XOM XPath Mapping

Elliotte Rusty Harold

XOM 1.1 supports XPath 1.0 reasonably faithfully. However, there are some differences between the XPath data model and the XOM data model that you need to be aware of when using XPath. The main conceptual shift required to grok how XPath operates in XOM is to understand that an XPath data model is built from a XOM object, rather than being a XOM object. If you're not getting the results you expect when using XPath to query XOM objects, this may help explain why. Specific areas you need to worry about are:

Document type declarations
Adjacent text nodes
Empty text nodes
Namespace nodes and the namespace axis
Nodes that do not belong to a document
Node-set order
Queries that return non-node-sets

Document Type Declaration

The XPath data model does not include any representation of the document type declaration. Therefore no XPath expression will select a DocType object. Furthermore, the DocType object is not considered when counting the number or position of a document's children using position(), last(), count(), or similar functions in XPath.
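The counting rule can be pictured with a toy model in plain Java. It uses no XOM classes: the DocTypeModel class, its Kind enum, and its helpers are hypothetical, and the model tracks only the kind of each child. Filtering out the DOCTYPE before numbering the children is what makes count(/node()) and positional predicates ignore it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not XOM's implementation): maps a document's XOM
// child list onto the child list XPath sees, where the DOCTYPE is invisible.
class DocTypeModel {
    enum Kind { COMMENT, DOCTYPE, ELEMENT, PROCESSING_INSTRUCTION }

    // The children XPath can see: everything except the document type declaration.
    static List<Kind> xpathChildren(List<Kind> xomChildren) {
        List<Kind> visible = new ArrayList<>();
        for (Kind k : xomChildren) {
            if (k != Kind.DOCTYPE) {
                visible.add(k);
            }
        }
        return visible;
    }

    // 1-based XPath position of the first visible child of the given kind, or -1.
    static int xpathPosition(List<Kind> xomChildren, Kind kind) {
        int index = xpathChildren(xomChildren).indexOf(kind);
        return index < 0 ? -1 : index + 1;
    }

    public static void main(String[] args) {
        // XOM children: a comment, the DOCTYPE, then the root element.
        List<Kind> doc = Arrays.asList(Kind.COMMENT, Kind.DOCTYPE, Kind.ELEMENT);
        System.out.println(xpathChildren(doc).size());        // 2: what count(/node()) sees
        System.out.println(xpathPosition(doc, Kind.ELEMENT)); // 2: the DOCTYPE is not counted
    }
}
```

In XOM itself the root element in this arrangement sits at child index 2, but XPath numbers it as the second child, not the third.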

Contiguous Text Nodes

The XPath data model does not allow contiguous text nodes or empty text nodes. XOM does allow one Text object to immediately follow another. When XPath queries are made on XOM documents, all contiguous Text objects are treated as a single XPath text node. For example, consider this code fragment:

  Element parent = new Element("parent");
  Text t1 = new Text("1");
  Text t2 = new Text("2");
  Text t3 = new Text("3");
  Text t4 = new Text("4");
  parent.appendChild(t1);
  parent.appendChild(t2);
  parent.appendChild(t3);
  parent.appendChild(t4);
  Element child = new Element("child");
  parent.appendChild(child);
  Nodes result = parent.query("child::node()[2]");

result contains the child element because all four Text objects count as only one XPath text node. The call parent.query("child::text()[1]") returns a Nodes object containing all four Text objects in order. It is not possible to use XPath to select a single Text object without also selecting all the Text objects adjacent to it.
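The folding can be sketched with a toy model in plain Java. The AdjacentTextModel class and its Item type are hypothetical, not part of XOM; the model groups each run of adjacent Text objects into one XPath node and deliberately ignores empty Text objects, which are covered in the next section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not XOM's code): folds runs of adjacent Text objects
// into the single text node that XPath sees.
class AdjacentTextModel {
    static final class Item {
        final boolean isText;
        final String value; // text content, or element name
        Item(boolean isText, String value) { this.isText = isText; this.value = value; }
        static Item text(String s) { return new Item(true, s); }
        static Item element(String name) { return new Item(false, name); }
        @Override public String toString() { return isText ? "\"" + value + "\"" : "<" + value + "/>"; }
    }

    // Each inner list is one XPath node: a lone element or a run of Text objects.
    static List<List<Item>> xpathChildren(List<Item> xomChildren) {
        List<List<Item>> nodes = new ArrayList<>();
        List<Item> run = null;
        for (Item item : xomChildren) {
            if (item.isText) {
                if (run == null) {      // first Text object of a new run
                    run = new ArrayList<>();
                    nodes.add(run);
                }
                run.add(item);
            } else {
                run = null;             // an element ends any text run
                List<Item> single = new ArrayList<>();
                single.add(item);
                nodes.add(single);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Item> parent = Arrays.asList(
            Item.text("1"), Item.text("2"), Item.text("3"), Item.text("4"),
            Item.element("child"));
        List<List<Item>> nodes = xpathChildren(parent);
        System.out.println(nodes.size());  // 2 XPath children
        System.out.println(nodes.get(1));  // child::node()[2]: [<child/>]
        System.out.println(nodes.get(0));  // child::text()[1]: ["1", "2", "3", "4"]
    }
}
```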

Empty Text Nodes

Empty text nodes are a related issue. They do not exist in the XPath data model, and they cannot be individually selected by XPath expressions. For example, consider this:

  Element parent = new Element("parent");
  Element child1 = new Element("child1");
  parent.appendChild(child1);
  Text t1 = new Text("");
  parent.appendChild(t1);
  Element child2 = new Element("child2");
  parent.appendChild(child2);
  Nodes result = parent.query("child::node()");

result contains child1 and child2 but not t1, because the empty Text object is invisible to XPath. On the other hand, consider this:

  Element parent = new Element("parent");
  Element child1 = new Element("child1");
  parent.appendChild(child1);
  Text t1 = new Text("");
  Text t2 = new Text("2");
  Text t3 = new Text("3");
  parent.appendChild(t1);
  parent.appendChild(t2);
  parent.appendChild(t3);
  Element child2 = new Element("child2");
  parent.appendChild(child2);
  Nodes result = parent.query("child::node()");

In this case, result contains child1, t1, t2, t3, and child2. Even though t1 is empty, it is included because it is adjacent to non-empty Text objects and is therefore part of the same XPath text node.
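Both examples can be reproduced with a toy model in plain Java that drops any run of adjacent Text objects whose combined value is empty. The EmptyTextModel class and its Item type are hypothetical, not XOM's implementation: a lone empty Text object vanishes, while one next to non-empty text stays inside the merged node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical): a run of adjacent Text objects becomes one XPath
// text node only if its combined value is non-empty; otherwise XPath never sees it.
class EmptyTextModel {
    static final class Item {
        final boolean isText;
        final String value; // text content, or element name
        Item(boolean isText, String value) { this.isText = isText; this.value = value; }
        static Item text(String s) { return new Item(true, s); }
        static Item element(String name) { return new Item(false, name); }
        @Override public String toString() { return isText ? "\"" + value + "\"" : value; }
    }

    static List<List<Item>> xpathChildren(List<Item> xomChildren) {
        List<List<Item>> nodes = new ArrayList<>();
        List<Item> run = new ArrayList<>();
        for (Item item : xomChildren) {
            if (item.isText) {
                run.add(item);           // keep collecting the current text run
            } else {
                flush(run, nodes);       // an element ends the run
                run = new ArrayList<>();
                nodes.add(Arrays.asList(item));
            }
        }
        flush(run, nodes);               // a run may end the child list
        return nodes;
    }

    // Emits a pending text run only if it contains at least one character.
    private static void flush(List<Item> run, List<List<Item>> nodes) {
        StringBuilder combined = new StringBuilder();
        for (Item t : run) {
            combined.append(t.value);
        }
        if (combined.length() > 0) {
            nodes.add(run);
        }
    }

    public static void main(String[] args) {
        // First example: the empty Text object disappears.
        System.out.println(xpathChildren(Arrays.asList(
            Item.element("child1"), Item.text(""), Item.element("child2"))));
        // prints [[child1], [child2]]

        // Second example: the empty Text object rides along with "2" and "3".
        System.out.println(xpathChildren(Arrays.asList(
            Item.element("child1"), Item.text(""), Item.text("2"), Item.text("3"),
            Item.element("child2"))));
        // prints [[child1], ["", "2", "3"], [child2]]
    }
}
```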

Namespace Nodes

XPath defines a namespace node as a namespace in scope on an element. XOM mostly works with namespace declarations instead. When evaluating XPath expressions that use the namespace axis, XOM uses the XPath definition. To represent these nodes, XOM now has a Namespace subclass of Node, which is used solely in XPath results. These objects are created on the fly as necessary and are not accessible from the rest of XOM.
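The difference between declarations and in-scope namespaces can be sketched with a toy model in plain Java. The InScopeNamespaces class and its El type are hypothetical; the model looks only at explicit prefix-to-URI declarations and ignores the namespaces of element and attribute names themselves. Each element inherits its parent's bindings, nearer declarations override farther ones, xmlns="" undeclares the default namespace, and the xml prefix is always bound, as XPath 1.0 requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not XOM's classes): computes the namespaces in scope
// on an element, which is what the XPath namespace axis returns, from the
// declarations on that element and its ancestors.
class InScopeNamespaces {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    static final class El {
        final El parent;
        // prefix -> URI; the empty prefix stands for the default namespace
        final Map<String, String> declarations = new LinkedHashMap<>();
        El(El parent) { this.parent = parent; }
        El declare(String prefix, String uri) { declarations.put(prefix, uri); return this; }
    }

    static Map<String, String> inScope(El element) {
        Map<String, String> result;
        if (element.parent == null) {
            result = new TreeMap<>();          // sorted only for readable output
            result.put("xml", XML_NS);         // the xml prefix is always in scope
        } else {
            result = inScope(element.parent);  // fresh copy of the parent's bindings
        }
        for (Map.Entry<String, String> d : element.declarations.entrySet()) {
            if (d.getKey().isEmpty() && d.getValue().isEmpty()) {
                result.remove("");             // xmlns="" undeclares the default namespace
            } else {
                result.put(d.getKey(), d.getValue()); // nearer declarations win
            }
        }
        return result;
    }

    public static void main(String[] args) {
        El root = new El(null).declare("a", "http://example.com/a");
        El child = new El(root).declare("", "http://example.com/default");
        System.out.println(child.declarations.size()); // 1 declaration on child
        System.out.println(inScope(child));            // default, a, and xml bindings
    }
}
```

So the child carries a single declaration, yet three namespace nodes are in scope on it; that gap is why XOM synthesizes Namespace objects for XPath rather than reusing its declarations.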

Documentless Trees

XPath implicitly assumes that all nodes belong to a document. In XOM this is not necessarily true. Nonetheless, it is still useful to be able to execute queries on nodes (and particularly trees of nodes) that don't belong to any document. When an absolute XPath expression such as /root/child or //* is evaluated, XOM supplies a fictitious root node. The effect is the same as if the actual top-level node in the tree were contained in a document. For example, consider this query:

  Element test = new Element("test");
  Nodes result = test.query("/*[1]");

result contains test because it is the first child of this fictitious root. However, the query / throws an XPathException because it attempts to return the fictitious root itself.
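The fictitious root can be pictured with a toy model in plain Java. The FictitiousRootModel class and its TreeNode type are hypothetical, and UnsupportedOperationException merely stands in for XOM's XPathException. Absolute paths are resolved by walking up to the topmost node and treating it as the only child of an imaginary root.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model (hypothetical): evaluates absolute paths on a tree with no Document
// by pretending its topmost node is the sole child of a fictitious root.
class FictitiousRootModel {
    static final class TreeNode {
        final String name;
        TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
        // Attaches child under this node and returns this node, for chaining.
        TreeNode append(TreeNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Walks up parent links to the topmost node of the documentless tree.
    static TreeNode top(TreeNode node) {
        while (node.parent != null) {
            node = node.parent;
        }
        return node;
    }

    // Models "/*[n]": the fictitious root has exactly one child, the top of the
    // tree, so only n == 1 selects anything; otherwise the node-set is empty.
    static List<TreeNode> selectAbsoluteChild(TreeNode context, int n) {
        List<TreeNode> rootChildren = Collections.singletonList(top(context));
        if (n < 1 || n > rootChildren.size()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(rootChildren.get(n - 1));
    }

    // Models "/": the fictitious root is not a real node, so it cannot be returned.
    static List<TreeNode> selectRoot(TreeNode context) {
        throw new UnsupportedOperationException("the fictitious root cannot be returned");
    }

    public static void main(String[] args) {
        TreeNode test = new TreeNode("test");
        TreeNode child = new TreeNode("child");
        test.append(child);
        // Even from a descendant, "/*[1]" reaches the top of the tree.
        System.out.println(selectAbsoluteChild(child, 1).get(0).name); // test
        System.out.println(selectAbsoluteChild(child, 2).isEmpty());   // true
        try {
            selectRoot(child);
        } catch (UnsupportedOperationException e) {
            System.out.println("\"/\" rejected: " + e.getMessage());
        }
    }
}
```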

Document Order

XPath node-sets are unordered (like any set) and do not contain duplicates. The Nodes object returned by the query method does not contain duplicates either. Unlike a node-set, however, it is ordered: all nodes appear in document order. Generally this is the order in which nodes would be encountered in a depth-first traversal of the tree. Attributes and namespaces appear after their parent element and before any of its children. Beyond that, their order is not guaranteed.
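Document order can be modeled as a pre-order walk. In this plain Java sketch (DocumentOrderModel and its El type are hypothetical, not XOM's implementation), each element is emitted first, then its namespace nodes, then its attributes, then its children recursively. Listing namespaces before attributes is a choice this model makes; XOM does not guarantee the relative order of an element's attributes and namespaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model (hypothetical) of document order: depth-first, pre-order, with each
// element's namespace and attribute nodes placed after it and before its children.
class DocumentOrderModel {
    static final class El {
        final String name;
        final List<String> namespaces; // prefixes of in-scope namespaces
        final List<String> attributes; // attribute names
        final List<El> children;
        El(String name, List<String> namespaces, List<String> attributes, El... children) {
            this.name = name;
            this.namespaces = namespaces;
            this.attributes = attributes;
            this.children = Arrays.asList(children);
        }
    }

    static List<String> documentOrder(El root) {
        List<String> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    private static void visit(El e, List<String> out) {
        out.add(e.name);                                    // the element itself
        for (String ns : e.namespaces) {
            out.add(e.name + "/namespace::" + ns);          // then its namespaces
        }
        for (String a : e.attributes) {
            out.add(e.name + "/@" + a);                     // then its attributes
        }
        for (El c : e.children) {
            visit(c, out);                                  // then each child subtree
        }
    }

    public static void main(String[] args) {
        El a = new El("a", Arrays.asList("xml"), Arrays.asList("id"),
            new El("b", Arrays.asList("xml"), Collections.<String>emptyList()),
            new El("c", Arrays.asList("xml"), Arrays.asList("lang")));
        System.out.println(documentOrder(a));
        // [a, a/namespace::xml, a/@id, b, b/namespace::xml, c, c/namespace::xml, c/@lang]
    }
}
```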

Expressions That Return Non-Node-Sets

XOM supports only expressions that return node-sets as queries. Expressions that return numbers, booleans, or strings throw an XPathException. Examples of such expressions include count(//*), 1 + 2 + 3, name(/*), and @id='p12'. Note that all of these expressions can be used in location path predicates. They just can't be the final result of evaluating an XPath expression.
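Other XPath 1.0 APIs can return numbers directly, which makes the contrast easy to see. The sketch below uses the JDK's own JAXP API (javax.xml.xpath over a DOM tree, not XOM) to evaluate count(//*) as a number, next to two node-set-only alternatives: selecting //* and counting the results in Java, and using the @id='p12' comparison inside a predicate. In XOM, the analogous workaround for a count would be along the lines of node.query("//*").size(). The NonNodeSetDemo class and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Uses JAXP (not XOM) to evaluate XPath both as a number and as node-sets.
class NonNodeSetDemo {
    // Parses a small XML string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // give nodes proper local names for name tests
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    private static Object eval(Document doc, String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath expression: " + expression, e);
        }
    }

    // A number-valued expression: the kind of query XOM's query() rejects.
    static double countAsNumber(Document doc) {
        return (Double) eval(doc, "count(//*)", XPathConstants.NUMBER);
    }

    // Node-set alternative: select the elements, then count them in Java.
    static int countAsNodeSet(Document doc) {
        return ((NodeList) eval(doc, "//*", XPathConstants.NODESET)).getLength();
    }

    // The boolean comparison @id='p12' used legally, inside a predicate.
    static int countWithPredicate(Document doc) {
        return ((NodeList) eval(doc, "//*[@id='p12']", XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><c id='p12'/></a>");
        System.out.println(countAsNumber(doc));      // 3.0
        System.out.println(countAsNodeSet(doc));     // 3
        System.out.println(countWithPredicate(doc)); // 1
    }
}
```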