XOM 1.1 supports XPath 1.0 reasonably faithfully. However, there are some differences between the XPath data model and the XOM data model that you need to be aware of when using XPath. The main conceptual shift required to grok how XPath operates in XOM is to understand that an XPath data model is built from a XOM object, rather than being a XOM object. If you're not getting the results you expect when using XPath to query XOM objects, this may help explain why. Specific areas you need to worry about are:
Document type declarations
Adjacent text nodes
Empty text nodes
Namespace nodes and the namespace axis
Nodes that do not belong to a document
Node-set order
Queries that return non node-sets
The XPath data model does not include any representation of the document type declaration. Therefore no XPath expression will select a DocType object. Furthermore, the DocType object is not considered when counting the number or position of a document's children using position(), last(), count(), or similar functions in XPath.
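The difference between XOM's child count and XPath's can be checked directly. The following is a minimal sketch, assuming XOM 1.1 or later is on the classpath; the class name DocTypeInvisibleDemo and the element name root are made up for illustration.

```java
import nu.xom.DocType;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Nodes;

// Hypothetical demo class; requires XOM 1.1 or later on the classpath.
final class DocTypeInvisibleDemo {

    // Builds a document consisting of <!DOCTYPE root> followed by <root/>.
    static Document build() {
        Document doc = new Document(new Element("root"));
        doc.setDocType(new DocType("root"));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();

        // XOM counts both the DocType and the root element as children.
        System.out.println("XOM children:   " + doc.getChildCount());

        // XPath does not see the DocType, so only the root element is selected.
        Nodes visible = doc.query("/node()");
        System.out.println("XPath children: " + visible.size());

        // XPath position 1 is the root element, although the DocType precedes it in XOM.
        System.out.println(doc.query("/node()[1]").get(0) == doc.getRootElement());
    }
}
```

In other words, child positions used in XPath predicates cannot be assumed to match the indexes passed to getChild() when a document has a DocType.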
The XPath data model does not allow contiguous text nodes or empty text nodes. XOM does allow one Text object to immediately follow another. When XPath queries are made on XOM documents, all contiguous Text objects are treated as a single XPath text node. For example, consider this code fragment:
    Element parent = new Element("parent");
    Text t1 = new Text("1");
    Text t2 = new Text("2");
    Text t3 = new Text("3");
    Text t4 = new Text("4");
    parent.appendChild(t1);
    parent.appendChild(t2);
    parent.appendChild(t3);
    parent.appendChild(t4);
    Element child = new Element("child");
    parent.appendChild(child);
    Nodes result = parent.query("child::node()[2]");
result contains the child element because all four Text objects only count as one XPath text node. The function call parent.query("child::text()[1]") returns a Nodes object containing all four Text objects in order. It is not possible to use XPath to select a single Text object without selecting all adjacent Text objects.
Empty text nodes are a related issue. They do not exist in the XPath data model, and they cannot be individually selected by XPath expressions. For example, consider this:
    Element parent = new Element("parent");
    Element child1 = new Element("child1");
    parent.appendChild(child1);
    Text t1 = new Text("");
    parent.appendChild(t1);
    Element child2 = new Element("child2");
    parent.appendChild(child2);
    Nodes result = parent.query("child::node()");
result contains child1 and child2 but not t1 because the empty Text object is invisible to XPath. On the other hand, consider this:
    Element parent = new Element("parent");
    Element child1 = new Element("child1");
    parent.appendChild(child1);
    Text t1 = new Text("");
    Text t2 = new Text("2");
    Text t3 = new Text("3");
    parent.appendChild(t1);
    parent.appendChild(t2);
    parent.appendChild(t3);
    Element child2 = new Element("child2");
    parent.appendChild(child2);
    Nodes result = parent.query("child::node()");
In this case, result contains child1, t1, t2, t3, and child2. Even though t1 is empty, it is included because it is adjacent to non-empty Text objects.
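The rules for adjacent and empty Text objects described above can be modeled in plain Java, without XOM. This is only a sketch of the mapping rule, not XOM's actual implementation; the XPathChildModel class, its Child type, and the group method are hypothetical names introduced for illustration. Each element becomes its own XPath node, each run of adjacent text children becomes one XPath text node, and a run whose combined content is empty disappears.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how XOM children map onto XPath child nodes.
final class XPathChildModel {

    // Simplified stand-in for a XOM child: either an element or a text object.
    static final class Child {
        final boolean isText;
        final String value; // element name, or text content

        Child(boolean isText, String value) {
            this.isText = isText;
            this.value = value;
        }

        static Child element(String name) { return new Child(false, name); }
        static Child text(String content) { return new Child(true, content); }
    }

    // Returns one inner list per XPath node, in order.
    static List<List<Child>> group(List<Child> children) {
        List<List<Child>> result = new ArrayList<>();
        List<Child> run = new ArrayList<>(); // current run of adjacent text children
        for (Child c : children) {
            if (c.isText) {
                run.add(c);
            } else {
                flush(run, result);
                List<Child> single = new ArrayList<>();
                single.add(c);
                result.add(single);
            }
        }
        flush(run, result);
        return result;
    }

    // Emits the pending text run as one XPath text node unless its content is empty.
    private static void flush(List<Child> run, List<List<Child>> result) {
        StringBuilder content = new StringBuilder();
        for (Child c : run) {
            content.append(c.value);
        }
        if (content.length() > 0) {
            result.add(new ArrayList<>(run));
        }
        run.clear();
    }

    public static void main(String[] args) {
        List<Child> kids = new ArrayList<>();
        kids.add(Child.element("child1"));
        kids.add(Child.text(""));
        kids.add(Child.text("2"));
        kids.add(Child.text("3"));
        kids.add(Child.element("child2"));

        List<List<Child>> nodes = group(kids);
        System.out.println(nodes.size());        // 3 XPath nodes
        System.out.println(nodes.get(1).size()); // the text node spans 3 text children
    }
}
```

The index of an inner list plus one corresponds to the position an XPath predicate such as child::node()[2] would use, which is why that position can differ from the XOM child index.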
XPath defines a namespace node as a namespace in scope on an element. XOM mostly works with namespace declarations instead. When evaluating XPath expressions that use the namespace axis, XOM uses the XPath definition. However, there is now a Namespace subclass of Node which is used solely in XPath results. These objects are created on the fly as necessary, and are not accessible from the rest of XOM.
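A query along the namespace axis might look like the following sketch. It assumes XOM 1.1 or later on the classpath; the class name NamespaceAxisDemo, the prefix pre, and the URI http://www.example.com/ are placeholders chosen for illustration.

```java
import nu.xom.Element;
import nu.xom.Namespace;
import nu.xom.Nodes;

// Hypothetical demo class; requires XOM 1.1 or later on the classpath.
final class NamespaceAxisDemo {

    // Selects the namespace nodes in scope on the given element.
    static Nodes namespacesOf(Element element) {
        return element.query("namespace::*");
    }

    public static void main(String[] args) {
        Element root = new Element("pre:root", "http://www.example.com/");
        Nodes result = namespacesOf(root);
        for (int i = 0; i < result.size(); i++) {
            // Each result is a Namespace object created for this query.
            Namespace ns = (Namespace) result.get(i);
            System.out.println(ns.getPrefix() + " -> " + ns.getValue());
        }
    }
}
```

Because XPath treats every in-scope namespace as a node, the result may include bindings that were not declared on the element itself, such as the implicit xml prefix.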
XPath implicitly assumes that all nodes belong to a document. In XOM this is not necessarily true. Nonetheless, it is still useful to be able to execute queries on nodes (and particularly trees of nodes) that don't belong to any document. When an absolute XPath expression such as /root/child or //* is evaluated, XOM supplies a fictitious root node. The effect is the same as if the actual top-level node in the tree were contained in a document. For example, consider this query:
    Element test = new Element("test");
    Nodes result = test.query("/*[1]");
result contains the test element because it is the first child of this fictitious root. However, the query / throws an XPathException because it is attempting to return this fictitious root.
XPath node-sets are unordered (like any set) and do not contain duplicates. The Nodes object returned by the query method also does not contain duplicates. However, it orders all nodes in document order. Generally this is the order in which nodes would be encountered in a depth-first traversal of the tree. Attributes and namespaces appear after their parent element in this order and before any child elements. Beyond that, the relative order of an element's attributes and namespaces is not guaranteed.
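The ordering and de-duplication can be observed with a union expression that names its operands out of order. This is a minimal sketch assuming XOM 1.1 or later on the classpath; the class name DocumentOrderDemo and the element names first and second are invented for the example.

```java
import nu.xom.Element;
import nu.xom.Nodes;

// Hypothetical demo class; requires XOM 1.1 or later on the classpath.
final class DocumentOrderDemo {

    // Builds <root><first/><second/></root>.
    static Element build() {
        Element root = new Element("root");
        root.appendChild(new Element("first"));
        root.appendChild(new Element("second"));
        return root;
    }

    public static void main(String[] args) {
        Element root = build();

        // The union lists "second" before "first", but results follow document order.
        Nodes ordered = root.query("second | first");
        for (int i = 0; i < ordered.size(); i++) {
            System.out.println(((Element) ordered.get(i)).getLocalName());
        }

        // "first" matches both operands of the union but is returned only once.
        Nodes unique = root.query("* | first");
        System.out.println(unique.size());
    }
}
```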
XOM only supports expressions that return node-sets as queries. Expressions that return numbers, booleans, or strings throw an XPathException. Examples of such expressions include count(//*), 1 + 2 + 3, name(/*), and @id='p12'. Note that all of these expressions can be used in location path predicates. They just can't be the final result of evaluating an XPath expression.
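One way to work within this restriction is to select the nodes with XPath and compute the number, boolean, or string in Java. The sketch below assumes XOM 1.1 or later on the classpath; the class NonNodeSetDemo, its helper methods, and the element and attribute names are hypothetical.

```java
import nu.xom.Attribute;
import nu.xom.Element;
import nu.xom.XPathException;

// Hypothetical demo class; requires XOM 1.1 or later on the classpath.
final class NonNodeSetDemo {

    // Builds <root><item id="p12"/><item/></root>.
    static Element build() {
        Element root = new Element("root");
        Element item = new Element("item");
        item.addAttribute(new Attribute("id", "p12"));
        root.appendChild(item);
        root.appendChild(new Element("item"));
        return root;
    }

    // Stands in for count(//*): select the elements, then count them in Java.
    static int countElements(Element context) {
        return context.query("//*").size();
    }

    // Reports whether evaluating the expression raises an XPathException.
    static boolean throwsXPathException(Element context, String xpath) {
        try {
            context.query(xpath);
            return false;
        } catch (XPathException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Element root = build();
        System.out.println(throwsXPathException(root, "count(//*)")); // numeric result
        System.out.println(throwsXPathException(root, "@id='p12'"));  // boolean result
        System.out.println(countElements(root));
        // The same kinds of expression are fine inside a predicate.
        System.out.println(root.query("item[@id='p12']").size());
        System.out.println(root.query("self::*[count(item) = 2]").size());
    }
}
```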