XPath is a language for selecting nodes from an XML document. In addition, XPath may be used to compute values (strings, numbers, or boolean values) from the content of an XML document. The current version of the language is XPath 2.0, but version 1.0 is still more widely used.
The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as an XPath.
XPath was originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT. Subsets of the XPath query language are also used in other W3C specifications such as XML Schema and XForms.
Syntax and semantics
The most important kind of expression in XPath is a location path. A location path consists of a sequence of location steps. Each location step has three components:
- an axis
- a node test
- zero or more predicates.
An XPath expression is evaluated with respect to a context node. An axis specifier such as 'child' or 'descendant' specifies the direction to navigate from the context node. The node test and the predicates then filter the nodes selected by the axis specifier. For example, the node test 'A' requires that every node navigated to has the name 'A'. A predicate can be used to require that the selected nodes have certain properties, which are themselves specified by XPath expressions.
The XPath syntax comes in two flavours. The abbreviated syntax is more compact and allows XPaths to be written and read easily using intuitive and, in many cases, familiar characters and constructs. The full syntax is more verbose, but allows for more options to be specified, and is more descriptive if read carefully.
Abbreviated syntax
The compact notation allows many defaults and abbreviations for common cases. Given source XML containing at least
<A>
<B>
<C/>
</B>
</A>
the simplest XPath takes a form such as
/A/B/C
which selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. The XPath syntax is designed to mimic URI (Uniform Resource Identifier) and Unix-style file path syntax.
More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression
A//B/*[1]
selects the first element ('[1]'), whatever its name ('*'), that is a child ('/') of a B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). If there are several suitable B elements in the document, this actually returns a set of all their first children. ("(A//B/*)[1]" returns just the first such node.)
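The difference between A//B/*[1] and (A//B/*)[1] can be sketched in Python. This is only an illustration under stated assumptions: the sample document is hypothetical, the standard library's xml.etree.ElementTree supports just a subset of XPath, and its positional predicates do not follow the XPath rules for '*', so the positional part is spelled out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an A root with two B children.
doc = ET.fromstring("<A><B><C/></B><B><D/><C/></B></A>")

# /A/B/C: the parsed root is already the A element, so the rest of the
# path is evaluated relative to it.
c_elements = doc.findall("B/C")

# A//B/*[1]: for every B below A, keep its first child element.
first_child_of_each_b = [b[0] for b in doc.iter("B") if len(b)]

# (A//B/*)[1]: collect all children of all B elements, then keep only
# the first node of that combined list.
all_b_children = [child for b in doc.iter("B") for child in b]
first_overall = all_b_children[:1]

print([e.tag for e in c_elements])             # ['C', 'C']
print([e.tag for e in first_child_of_each_b])  # ['C', 'D']
print([e.tag for e in first_overall])          # ['C']
```

The predicate in A//B/*[1] is applied once per B element, whereas the parentheses in (A//B/*)[1] make it apply to the whole selected node-set.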
Expanded syntax
In the full, unabbreviated syntax, the two examples above would be written
- /child::A/child::B/child::C
- child::A/descendant-or-self::node()/child::B/child::*[position()=1]
Here, in each step of the XPath, the axis (e.g. child or descendant-or-self) is explicitly specified, followed by :: and then the node test, such as A or node() in the examples above.
Axis specifiers
The Axis Specifier indicates navigation direction within the tree representation of the XML document. The axes available are:
| Full Syntax | Abbreviated Syntax | Notes |
|---|---|---|
| ancestor | | |
| ancestor-or-self | | |
| attribute | @ | @abc is short for attribute::abc |
| child | | xyz is short for child::xyz |
| descendant | | |
| descendant-or-self | // | // is short for /descendant-or-self::node()/ |
| following | | |
| following-sibling | | |
| namespace | | |
| parent | .. | .. is short for parent::node() |
| preceding | | |
| preceding-sibling | | |
| self | . | . is short for self::node() |
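The abbreviations in the table can be applied mechanically. The following is a minimal sketch, not a real XPath parser: the helper names expand_step and expand_path are hypothetical, and the code assumes paths without '/' characters inside predicates or string literals.

```python
def expand_step(step: str) -> str:
    """Rewrite one abbreviated location step in full syntax."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        return step  # axis already written out
    return "child::" + step  # child is the default axis


def expand_path(path: str) -> str:
    """Expand a simple abbreviated location path (no '/' in predicates)."""
    path = path.replace("//", "/descendant-or-self::node()/")
    prefix = "/" if path.startswith("/") else ""
    steps = [s for s in path.split("/") if s]
    return prefix + "/".join(expand_step(s) for s in steps)


print(expand_path("/A/B/C"))     # /child::A/child::B/child::C
print(expand_path("//a/@href"))  # /descendant-or-self::node()/child::a/attribute::href
print(expand_path("../@id"))     # parent::node()/attribute::id
```

Predicates are passed through unchanged, so a step such as *[1] becomes child::*[1], where [1] is itself shorthand for [position()=1].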
As an example of using the attribute axis in abbreviated syntax, //a/@href selects the attribute called href in a elements anywhere in the document tree.
The expression . (an abbreviation for self::node()) is most commonly used within a predicate to refer to the currently selected node. For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.
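Both expressions can be approximated with Python's standard library, as a rough sketch. The sample page is hypothetical. xml.etree.ElementTree cannot return attribute nodes, so the final /@href step is replaced by reading the attribute from each selected element, and the [.='text'] predicate assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical page used only for illustration.
page = ET.fromstring(
    "<html><body>"
    "<h3>See also</h3>"
    "<h3>Notes</h3>"
    "<p><a href='help.php'>Help</a> <a>no link</a></p>"
    "</body></html>"
)

# //a/@href: select a elements that carry an href attribute anywhere in
# the tree, then read the attribute value from each of them.
hrefs = [a.get("href") for a in page.findall(".//a[@href]")]

# h3[.='See also'] with body as the context node.
body = page.find("body")
see_also = body.findall("h3[.='See also']")

print(hrefs)                       # ['help.php']
print([h.text for h in see_also])  # ['See also']
```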
Node tests
Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry elements in that namespace, and //gs:* will find all elements, regardless of local name, in that namespace.
Other node test formats are:
comment() : finds an XML comment node.
text() : finds a node of type text, such as the character data hello inside an element.
processing-instruction() : finds XML processing instructions. For an instruction whose target is php, processing-instruction('php') would match.
node() : finds any node at all.
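The two namespace-qualified node tests can be tried out with Python's xml.etree.ElementTree, as a hedged sketch. The namespace URI and the sample document are hypothetical, and the {namespace}* wildcard form assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

GS = "http://example.com/gs"  # hypothetical namespace URI bound to gs

doc = ET.fromstring(
    f'<root xmlns:gs="{GS}">'
    "<gs:enquiry>first</gs:enquiry>"
    "<gs:reply>second</gs:reply>"
    "<enquiry>no namespace</enquiry>"
    "</root>"
)

# //gs:enquiry: the prefix is resolved through an explicit mapping.
enquiries = doc.findall(".//gs:enquiry", {"gs": GS})

# //gs:*: every element in the namespace, whatever its local name.
in_namespace = doc.findall(f".//{{{GS}}}*")

print([e.text for e in enquiries])     # ['first']
print([e.text for e in in_namespace])  # ['first', 'second']
```

The unprefixed enquiry element is in no namespace, so neither expression selects it.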
Predicates
Predicates, written as expressions in square brackets, can be used to restrict a node-set to select only those nodes for which some condition is true. For example, a[@href='help.php'] will select those a elements (among the children of the context node) having an href attribute whose value is help.php.
There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.
When the value of the predicate is numeric, it is interpreted as a test on the position of the node. So p[1] selects the first p element child, while p[last()] selects the last.
In other cases, the value of the predicate is automatically converted to a boolean. When the predicate evaluates to a node-set, the result is true when the node-set is non-empty. Thus p[@x] selects those p elements that have an attribute named x.
A more complex example: the expression a[/html/@lang='en'][@href='help.php'][1]/@target selects the value of the target attribute of the first a element among the children of the context node that has its href attribute set to help.php, provided the document's html top-level element also has a lang attribute set to en. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.
Predicate order is significant if predicates test the position of a node. Each predicate 'filters' a location step's selected node-set in turn. So a[1][@href='help.php'] will find a match only if the first a child of the context node satisfies the condition @href='help.php', while a[@href='help.php'][1] will find the first a child that satisfies this condition.
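The effect of predicate order can be sketched by treating each predicate as a filter applied to the result of the previous one. The sample markup and the helper has_help_href are hypothetical. The filtering is written in plain Python because the standard library's positional predicates do not implement this chaining.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with three a children.
context = ET.fromstring(
    "<div>"
    "<a href='index.php'>Home</a>"
    "<a href='help.php'>Help</a>"
    "<a href='help.php'>More help</a>"
    "</div>"
)

a_children = context.findall("a")  # the location step: a


def has_help_href(elem):
    """The predicate [@href='help.php']."""
    return elem.get("href") == "help.php"


# a[1][@href='help.php']: take the first a, then test its href.
first_then_href = [e for e in a_children[:1] if has_help_href(e)]

# a[@href='help.php'][1]: keep matching a elements, then take the first.
href_then_first = [e for e in a_children if has_help_href(e)][:1]

print([e.text for e in first_then_href])  # []
print([e.text for e in href_then_first])  # ['Help']
```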
Functions and operators
XPath 1.0 defines four data types: node-sets (sets of nodes with no intrinsic order), strings, numbers and booleans.
The available operators are:
- The "/", "//" and "[...]" operators, used in path expressions, as described above.
- A union operator, "|", which forms the union of two node-sets.
- Boolean operators "and" and "or", and a function "not"
- Arithmetic operators "+", "-", "*", "div" (divide), and "mod"
- Comparison operators "=", "!=", "<", ">", "<=", ">="
The function library includes:
- Functions to manipulate strings: concat, substring, contains, substring-before, substring-after, translate, normalize-space, string-length
- Functions to manipulate numbers: sum, round, floor, ceiling
- Functions to get properties of nodes: name, local-name, namespace-uri
- Functions to get information about the processing context: position, last
- Type conversion functions: string, number, boolean
Some of the more commonly useful functions are detailed below. For a complete description, see the W3C Recommendation document.
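The conversion of the four XPath 1.0 data types to a boolean, as done by the boolean function and by non-numeric predicates, can be sketched as follows. The helper name xpath_boolean is hypothetical, and representing a node-set as a Python list is an assumption made only for this illustration.

```python
import math


def xpath_boolean(value):
    """Convert a Python stand-in for an XPath 1.0 value to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Numbers are true unless they are zero or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # Strings are true when their length is non-zero.
        return len(value) > 0
    if isinstance(value, list):
        # Node-sets (modelled as lists) are true when non-empty.
        return len(value) > 0
    raise TypeError(f"not an XPath 1.0 value: {value!r}")


print(xpath_boolean([]))            # False
print(xpath_boolean("false"))       # True
print(xpath_boolean(float("nan")))  # False
```

Note that the string "false" converts to true, because only the length of the string is tested.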
Node set functions
position() : returns a number representing the position of this node in the sequence of nodes currently being processed (for example, the nodes selected by an xsl:for-each instruction in XSLT).
count(node-set) : returns the number of nodes in the node-set supplied as its argument.
String functions
string(object?) : converts any of the four XPath data types into a string according to built-in rules. If the value of the argument is a node-set, the function returns the string-value of the first node in document order, ignoring any further nodes.
concat(string, string, string*) : concatenates two or more strings.
starts-with(s1, s2) : returns true if s1 starts with s2.
contains(s1, s2) : returns true if s1 contains s2.
substring(string, start, length?) : returns the substring of the given string that starts at position start (counting from 1) and is length characters long, or that runs to the end of the string if length is omitted. Example: substring("ABCDEF",2,3) returns "BCD".
substring-before(s1, s2) : example: substring-before("1999/04/01","/") returns 1999.
substring-after(s1, s2) : example: substring-after("1999/04/01","/") returns 04/01.
string-length(string?) : returns the number of characters in string.
normalize-space(string?) : all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been prettyprint formatted, which could make further string processing unreliable.
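The behaviour of several of these string functions can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the Python function names are hypothetical stand-ins, and substring only handles integer arguments, whereas XPath 1.0 also defines rounding rules for fractional ones.

```python
import re


def substring(s, start, length=None):
    """substring(): positions count from 1, unlike Python slices."""
    begin = max(start - 1, 0)
    if length is None:
        return s[begin:]
    end = max(start - 1 + length, 0)
    return s[begin:end]


def substring_before(s1, s2):
    """Part of s1 before the first occurrence of s2, else ''."""
    if not s2:
        return ""
    head, sep, _ = s1.partition(s2)
    return head if sep else ""


def substring_after(s1, s2):
    """Part of s1 after the first occurrence of s2, else ''."""
    if not s2:
        return s1
    _, sep, tail = s1.partition(s2)
    return tail if sep else ""


def normalize_space(s):
    """Trim and collapse runs of XML whitespace (space, tab, CR, LF)."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


print(substring("ABCDEF", 2, 3))            # BCD
print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(normalize_space("  pretty \n  printed  "))  # pretty printed
```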
Boolean functions
not(boolean) : negates any boolean expression.
true() : evaluates to true.
false() : evaluates to false.
Number functions
sum(node-set) : converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.
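The casting step inside sum can be sketched as below. The helper names xpath_number and xpath_sum and the sample order document are hypothetical. The regular expression follows the XPath 1.0 rule that a string converts to a number only if it is an optional minus sign followed by a plain decimal number, surrounded by optional whitespace; anything else becomes NaN.

```python
import math
import re
import xml.etree.ElementTree as ET

# Optional whitespace, optional minus sign, plain decimal number.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(text):
    """Cast a string value to a number, giving NaN when it is not numeric."""
    return float(text) if _NUMBER.fullmatch(text) else math.nan


def xpath_sum(string_values):
    """Sum the numeric casts of the given string values."""
    return sum(xpath_number(s) for s in string_values)


# Hypothetical document used only for illustration.
order = ET.fromstring(
    "<order><item price='2.50'/><item price='4'/><item price='n/a'/></order>"
)
prices = [item.get("price") for item in order.findall("item")]

print(len(prices))             # 3, as count(item/@price) would give
print(xpath_sum(prices[:2]))   # 6.5
print(xpath_sum(prices))       # nan, because 'n/a' is not numeric
```

A single non-numeric value therefore turns the whole sum into NaN.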
Usage examples
Expressions can be created inside predicates using the operators =, !=, <=, <, >= and >. Boolean expressions may be combined with parentheses and the boolean operators and and or as well as the not function described above. Numeric calculations can use *, +, -, div and mod. Strings can consist of any Unicode characters.
//item[@price > 2*@discount] selects items whose price attribute is greater than twice the numeric value of their discount attribute.
Entire node-sets can be combined ('unioned') using the vertical bar character |. Node sets that meet one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.
v[x or y] | w[z] will return a single node-set consisting of all the v elements that have x or y child-elements, as well as all the w elements that have z child-elements, that were found in the current context.
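The two expressions above, //item[@price > 2*@discount] and v[x or y] | w[z], can be imitated in Python as a rough sketch. Both sample documents and the id attributes are hypothetical. The arithmetic comparison and the union are written by hand because xml.etree.ElementTree does not support them.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring(
    "<catalog>"
    "<item id='a' price='10' discount='2'/>"
    "<item id='b' price='10' discount='6'/>"
    "<item id='c' price='10'/>"
    "</catalog>"
)


def price_exceeds_twice_discount(item):
    """The predicate [@price > 2*@discount]; a missing attribute fails."""
    price, discount = item.get("price"), item.get("discount")
    if price is None or discount is None:
        return False
    return float(price) > 2 * float(discount)


selected = [i.get("id") for i in catalog.iter("item")
            if price_exceeds_twice_discount(i)]

context = ET.fromstring(
    "<r>"
    "<v id='1'><x/></v>"
    "<w id='2'><z/></w>"
    "<v id='3'/>"
    "<v id='4'><y/></v>"
    "<w id='5'/>"
    "</r>"
)

# v[x or y] | w[z]: gather every match, then list each node once,
# in document order.
matches = (context.findall("v[x]") + context.findall("v[y]")
           + context.findall("w[z]"))
wanted = {id(e) for e in matches}
union = [e.get("id") for e in context if id(e) in wanted]

print(selected)  # ['a']
print(union)     # ['1', '2', '4']
```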
Examples
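Given a sample XML document with a wikimedia root element, in which project elements (each with a name attribute) contain edition elements (each with a language attribute) that hold the following addresses:
- en.wikipedia.org
- de.wikipedia.org
- fr.wikipedia.org
- pl.wikipedia.org
- es.wikipedia.org
- en.wiktionary.org
- fr.wiktionary.org
- vi.wiktionary.org
- tr.wiktionary.org
- es.wiktionary.org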
The XPath expression
/wikimedia/projects/project/@name
selects name attributes for all projects, and
/wikimedia//editions
selects all editions of all projects, and
/wikimedia/projects/project/editions/edition[@language="English"]/text()
selects addresses of all English Wikimedia projects (text of all edition elements where the language attribute is equal to English). And the following
/wikimedia/projects/project[@name="Wikipedia"]/editions/edition/text()
selects addresses of all Wikipedias (text of all edition elements that exist under a project element with a name attribute of Wikipedia).
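These four expressions can be tried with Python's xml.etree.ElementTree, as a sketch under loud assumptions. The document below is a shortened stand-in whose element nesting is inferred from the expressions; the project name Wiktionary and the language values German and French are guesses made for illustration. ElementTree has no attribute or text() steps, so the last step of each path is replaced by reading .get(...) or .text.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened sample document (see the caveats above).
SAMPLE = """<wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>"""

root = ET.fromstring(SAMPLE)  # root is the wikimedia element

# /wikimedia/projects/project/@name
names = [p.get("name") for p in root.findall("projects/project")]

# /wikimedia//editions
editions = root.findall(".//editions")

# /wikimedia/projects/project/editions/edition[@language="English"]/text()
english = [e.text for e in root.findall(
    "projects/project/editions/edition[@language='English']")]

# /wikimedia/projects/project[@name="Wikipedia"]/editions/edition/text()
wikipedias = [e.text for e in root.findall(
    "projects/project[@name='Wikipedia']/editions/edition")]

print(names)          # ['Wikipedia', 'Wiktionary']
print(len(editions))  # 2
print(english)        # ['en.wikipedia.org', 'en.wiktionary.org']
print(wikipedias)     # ['en.wikipedia.org', 'de.wikipedia.org']
```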
Implementations
Command Line Tools
- XMLStarlet
ActionScript
C/C++
- libxml2
- Pathan
- Sedna XML Database
- VTD-XML
Delphi
Implementations for Database Engines
- OpenLink Virtuoso
Java
- Saxon XSLT supports XPath 1.0 and XPath 2.0 (as well as XSLT 1.0, XSLT 2.0, and XQuery 1.0)
- BaseX (also supports XPath 2.0 and XQuery)
- VTD-XML
- Sedna XML Database (both XML:DB and proprietary)
The Java XPath package has been part of Java standard edition since Java 5. Technically this is an XPath API rather than an XPath implementation, and it allows the programmer to select a specific implementation that conforms to the interface.
JavaScript
- jQuery (basic support)
.NET Framework
- In the System.Xml and System.Xml.XPath namespaces
- Sedna XML Database
Perl
PHP
- Sedna XML Database
Python
- libxml2
- Amara
- Sedna XML Database
Ruby
Scheme
- Sedna XML Database
SQL
- MySQL supports a subset of XPath from version 5.1.5 onwards
- PostgreSQL supports XPath and XSLT from version 8.4 on
Use in schema languages
XPath is increasingly used to express constraints in schema languages for XML.
- The (now ISO standard) schema language Schematron pioneered the approach.
- A streaming subset of XPath is used in W3C XML Schema 1.0 for expressing uniqueness and key constraints. In XSD 1.1, the use of XPath is extended to support conditional type assignment based on attribute values, and to allow arbitrary boolean assertions to be evaluated against the content of elements.
- XForms uses XPath to bind types to values.
- The approach has even found use in non-XML applications, such as the constraint language for Java called PMD: the Java source is converted to a DOM-like parse tree, then XPath rules are defined over the tree.
See also
- XPath 2.0
- XML
- XSL, XSLT, XSL-FO
- XQuery
- XLink, XPointer
- XML Schema
- Schematron
- Navigational database
- XML database
External links