XOM Release Notes

XOM is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness and simplicity.

1.2.10

Support the built-in Android parser.

1.2.9

Exclude org.w3c.dom from Jaxen files we copy in to avoid problems with some application servers.

Upgrade Jaxen to 1.1.6 to fix some IEEE-754 bugs involving -0.

1.2.8

Upgraded to Jaxen 1.1.4 to fix several XPath bugs involving function resolution and Java 7 compatibility.

1.2.7

Canonical XML 1.1

1.2.6

Fixes a bug that doubled query strings in base URLs.

Upgraded to Jaxen 1.1.3 to fix an XPath bug evaluating relational operators when one of the operands was a text, comment, or processing instruction node.

1.2.5

Throws NullPointerException instead of MalformedUriException when a null Reader is passed to Builder.build().

Maven 2 support

1.2.4

More automatic deploy process.

Fixed maven targets.

Slight optimization to XPath by combining two loops.

1.2.3

Bug fix for some obscure corner cases.

1.2.2

This release focuses on improved packaging with Maven and OSGI. Otherwise, no visible changes.

1.2.1

A very minor release that now prints the correct version number when you execute the JAR archive by typing java -jar xom.jar

1.2

The 1.2 release fixes a number of bugs, especially in canonicalization and XPath. However, there's at least one bug fix in the core, so I recommend all users upgrade. XOM 1.2 should be fully backwards compatible with code written to 1.0 and 1.1 APIs. 1.2 should also be somewhat easier to compile and edit due to various changes with UnicodeUtil and Jaxen. Actual new features in this release are fairly minor and include:

1.1

New features implemented since 1.0 include:

Memory usage has been reduced, and performance improved by up to 2-4 times for some common operations. In addition, some bugs have been fixed in XOMTestCase and in the handling of a few edge conditions in the internal DTD subset. Furthermore, 1.1 works around quite a few more bugs in Crimson.

1.0

Essentially the same as Beta 11. The README file was improved slightly and all version numbers in the JavaDoc have been upgraded to 1.0. A number of small edits have been made to the API documentation. The only API-level change is that the deprecated setNodeFactory method in XSLTransform has been removed.

1.0b11/RC5

Beta 11 is the fifth release candidate. It restores the three servlet samples (FibonacciServlet, FibonacciSOAPServlet, and FibonacciXMLRPCServlet) but uses Ant conditions to only compile these files if the servlet classes are present. It also adds README, LICENSE, and LGPL files to the core distribution rather than simply placing these on the web site. Finally, http://www.cafeconleche.org/XOM/ has been replaced by http://www.xom.nu/ in the source code and documentation. The core API has not changed at all.

1.0b10/RC4

Beta 10 is the fourth release candidate. It removes three samples (FibonacciServlet, FibonacciSOAPServlet, and FibonacciXMLRPCServlet) to avoid having to distribute servlet.jar with XOM. It also modifies the Ant build file so the tools package is not compiled except when generating the betterdoc target. This makes the complete distribution more self-contained and easier to build. The core API has not changed at all.

1.0b9/RC3

Beta 9 is the third release candidate. It adds a few more unit tests and fixes some packaging issues that were bedeviling Windows systems. (The zip and tar files no longer contain any test files whose names are legal on Unix but illegal on Windows.) Barring discovery of any last-minute bugs, this will be XOM 1.0. No further optimizations or fixes are planned before 1.0. All the changes are restricted to the tests package. The core API has not changed at all.

1.0b8/RC2

Beta 8 is the second release candidate. Barring discovery of any last-minute bugs, this will be XOM 1.0. No further optimizations or fixes are planned before 1.0. Changes in this release include:

1.0b7/RC1

Beta 7 is the first release candidate. There are still a few open issues with regard to error handling in XInclude that require clarification from the XInclude working group. If they decide that how XOM currently behaves is correct, then XOM 1.0 is essentially complete. If they decide to require different behavior a few changes may yet need to be made.

Changes in this release include:

1.0b6

Beta 6 is primarily a bug fix release. It also polishes off some rough edges in various corners of the API. Changes in this release include:

1.0b5

Beta 5 primarily focuses on fixing bugs in XInclude and improving performance of builders when reading from files. It also deprecates the setNodeFactory() method in XSLTransform which will be removed in the next release. In its place, there's a new constructor:

public XSLTransform(Document stylesheet, NodeFactory factory)

Finally, the four XSLTransform constructors deprecated in the last release have been removed.

1.0b4

1.0b4 primarily focuses on fixing bugs and improving performance in the converters and XSLT package. XSLT transformation can now work directly from a XOM Document without an intermediate step that serializes the Document as a string. Consequently, these four constructors in XSLTransform have been deprecated and will be removed in the next release:

public XSLTransform(InputStream stylesheet)
public XSLTransform(Reader stylesheet)
public XSLTransform(String URL)
public XSLTransform(File stylesheet)

Other changes include:

1.0b3

The primary impetus for beta 3 is fixing a few bugs in the DOMConverter. Also, Java encoding names like "8859_1" are now recognized when using the repackaged Xerces bundled with Java 1.5. I also spell checked the comments. :-)

1.0b2

The primary impetus for beta 2 is fixing some bugs that prevented the XOM-specific parsers from being loaded in Java 1.5 when the standard Xerces (as opposed to the Java 1.5 bundled Xerces) was not in the classpath.

This release also makes the JavaDoc well-formed (and possibly valid, I haven't checked) XHTML.

1.0b1

Beta 1 is feature and code complete. There are no known bugs in XOM. All that remains to be done is finishing the documentation and doing some minor code clean-ups. These include such housekeeping tasks as splitting long lines, spell checking the comments, and making sure the Javadoc is all valid XHTML. None of this should have any effect on client code. XOM is now believed to be ready for serious, production use.

Unless new bugs are uncovered, this may be the one and only beta release. Possibly I'll do some profiling runs to see if there are any more areas where I can save some memory or speed up some operations. Barring that, all that's needed before the final 1.0 release is finished documentation.

Beta 1 makes no backwards incompatible changes to the published API. Changes since the final alpha include:

1.0a5

1.0a5 makes no backwards incompatible changes to the published API. Changes since the previous release include:

1.0a4

1.0a4 makes no backwards incompatible changes to the published API. Changes since the previous release include:

1.0a3

1.0a3 makes no backwards incompatible changes to the published API. It adds one new protected method. Changes since the previous release include:

1.0a2

1.0a2 makes no changes to the published API. Behavioral changes since the previous release include:

1.0a1

1.0a1 is the first alpha release of XOM. The API is now considered to be reasonably stable and frozen. I may add to the API in the future, but the current API will not change without a very good reason. Most features should work pretty much as intended. There are no API changes since 1.0d25. Behavioral changes since the previous release include:

There appear to be some bugs in Sun's JDK 1.4.2_03 that break about 5 or 6 of the unit tests. All tests pass with JDK 1.4.2_02 and JDK 1.5.0a1. Ant 1.5.x is required to build XOM. I have been unable to get the tests to run with Ant 1.6, and the Ant developers seem actively hostile to any reports about this issue.

1.0d25

1.0d25 is the second last call release of XOM. I had planned for this to be alpha 1 and API freeze. However, enough changes since the last release were discovered to be necessary, that I decided to make this 1.0d25 instead. Anything that didn't change since the last release is probably pretty stable. However, there have been some new changes in this release that are worth reviewing and may change again:

There are also several changes that do not affect the API:

1.0d24

1.0d24 is a very fast release to fix a bug that prevented 1.0d23 from being used in multi-classloader environments like Tomcat. A couple of bugs that prevented some of the test cases from successfully completing on Windows have also been fixed, a bug in the FibonacciServlet sample was corrected, and some of the documentation has been improved. The API has not changed at all. XOM is still in "last call".

1.0d23

This is the last call, pre-alpha release of XOM. My plan is that the next release will be the official API freeze for 1.0. While nothing is written in stone, I do plan to strenuously resist any backwards incompatible changes in the API after the next release (1.0a1). If you have any concerns about the API, now is the time to get them in.

There are several backwards incompatible changes in this release. Most notably, the various makeNode() methods in the NodeFactory class all return Nodes objects. This means a factory can replace one node type with a different node type (e.g. changing elements into attributes and vice versa) or replace a single node with several nodes.

Other changes that may require code modifications include:

More or less backwards compatible changes in 1.0d23 include:

And of course numerous bugs have been fixed, especially in XInclude.

1.0d22

This release collects numerous small new features, refactorings, renamings, unit tests, sample programs, and bug fixes. Many programs will need minor modifications and recompilation to work against this release. Visible changes include:

This is probably the last version that will support the old, XInclude 2002 Candidate Recommendation syntax. The next release will likely support the new 2003 Working Draft syntax.

1.0d21

This release collects a number of small changes, refactorings, and bug fixes. Most programs should continue to work as they did previously without modification or recompilation. Visible changes include:

1.0d20

This release adds a workaround for Java's broken, non-conformant handling of file: URLs on Windows. The problem manifested itself as an inability to resolve relative URLs in documents built with the Builder.build(File) method. This caused the failure of a couple of dozen unit tests. Unix users were not affected (which is why I didn't notice the problem sooner). There are no API-level changes in this release.

The JAR archive is no longer compressed, which means a larger JAR archive but faster class loading on initial startup.

1.0d19

The major API level change in XOM 1.0d19 is in NodeFactory. makeElement has been renamed startMakingElement and endElement has been renamed finishMakingElement. startMakingElement behaves the same as the old makeElement. However, finishMakingElement now has a slightly different contract. If it returns null, the entire element is deleted from the tree. It is no longer necessary to explicitly call detach. If it returns a different element than the one passed to it, then the old element is deleted from the tree and the new one is inserted in its place. This is more consistent with the other methods in this class. Return the node you want added to the tree, or null for no node at all.

The second big change has no API-level impact. By default, the Serializer and toXML methods now use numeric character references to escape all tabs, carriage returns, and line feeds in attribute values and all carriage returns in text nodes. This helps make round tripping more reliable and robust. However, if the user indicates that white space is not significant by calling either setMaxLength or setIndent, then these characters may not be preserved. If the client calls setLineSeparator, then tabs will still be preserved but carriage returns and line feeds may not be.
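As a rough illustration of why these references matter, here is a minimal sketch of the escaping rule described above. It is not XOM's Serializer code; the class and method names (WhitespaceEscaper, escapeAttributeValue, escapeText) are hypothetical, and the handling of &, <, >, and the double quote is ordinary XML escaping added so the output is usable.

```java
final class WhitespaceEscaper {

    // Attribute values: a parser normalizes literal tabs, line feeds, and
    // carriage returns to spaces, so they are written as character references.
    static String escapeAttributeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\t': sb.append("&#x09;"); break;
                case '\n': sb.append("&#x0A;"); break;
                case '\r': sb.append("&#x0D;"); break;
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Text nodes: only the carriage return needs a reference, because a
    // parser would otherwise turn it into a line feed on input.
    static String escapeText(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\r': sb.append("&#x0D;"); break;
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Line feeds in text nodes are left alone in this sketch, since they survive parsing unchanged.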

There are also several minor improvements and bug fixes:

1.0d18

1.0d18 adds one minor new feature and one major new feature. The minor feature is that nu.xom.tests.XOMTestCase is now public. This class is very useful for comparing two documents or pieces thereof for deep equality. For example, I use it to compare the actual output of the XInclude test cases to the expected outputs. I'm still working on the API and detailed behavior, but I think it's solid enough to be useful for other people's unit testing.

Now the major feature, and this one's way cool: It is now possible to subclass NodeFactory in order to filter and/or stream your processing. XOM can now handle documents of effectively arbitrary size with only slightly more memory use than the underlying SAX parser! I really need to write an article about this style of mixed tree/stream processing, but in the meantime here are the key things you need to know:

  1. To enable filtering or streaming, install your own NodeFactory subclass with the Builder. I've added a couple of constructors to Builder to make this easier.
  2. NodeFactory has one makeNode method for each of XOM's node types. You must return a node of the requested type, but you can change its name, namespace, value, or other characteristics before doing so.
  3. You can eliminate a node from the document simply by returning null from the makeNode method. This saves both the memory needed to store the node and the time required to build it.
  4. To process one element at a time, override endElement() in NodeFactory. This supports streaming. Before the builder calls this method, it has completely built the element with all its content. The usual XOM methods all work on it. You do not have to process every element in order to process some. You can do a quick check on the name and namespace of the element (or other characteristics) to figure out what you want to do with it. If you don't want to process the element, just return. For example, an XHTML spider could easily look at each a element and ignore all the other elements in the document. Indeed it wouldn't even have had to build them or any of their content in the first place.
  5. If you only need to process an element once, put your processing in the endElement() method and detach() it when you're done. As long as you haven't stored a reference to it somewhere, the element can then be garbage collected as needed. This is how XOM processes documents larger than available memory. This is sort of like SAX callbacks, except it's much more convenient because you have the entire element to work with. You do not need to build a custom data structure to hold onto the content until you're ready to work with it. The element is its own data structure.
  6. Most importantly, if you don't care about all this, you can ignore it. It has no impact on the rest of the API. Adding this functionality just required two new protected methods in NodeFactory and two new constructors in Builder. The rest of the API is unchanged. You can forget about it until you need it.

More details are in the JavaDoc for NodeFactory, and I've written lots of new sample programs that you'll find in the nu.xom.samples package. Many of them are streaming versions of earlier, less memory efficient samples.

This developed from an idea proposed by John Cowan, based on Simon St. Laurent's work with MOE. There have been things like this before (DOMBuilderFilter in DOM3, MOE, ElementScanner in JDOM, and of course SAX filters), but I don't think any API has done quite as neat a job as XOM now does. This is really powerful stuff. Not only does it make programs faster and much, much smaller; it also makes them much easier to write. For instance, you can easily throw away all white-space-only nodes on build so you're left with only the real content of the document, with no more white space nodes getting in the way of your navigation. I urge you to check this out. It will radically change how you think about processing XML.

This release is API compatible with 1.0d17. All programs that compiled in 1.0d17 should still compile in 1.0d18 without any edits.

1.0d17

This is primarily a bug fix release. There are only very minor API changes, the most significant of which is that XSLTransform is final. Other fixes and improvements in this release include:

1.0d16

The primary focus of this release is adding unit tests for XSLT, and fixing the bugs they uncovered:

Other assorted improvements in this release include:

1.0d15

The primary focus of this release is XInclude. To my knowledge, XOM is now completely conformant with the XInclude candidate recommendation including:

I've also written 24 unit tests for XInclude and fixed numerous bugs including one in the Document and Element copy constructors that failed to preserve base URI.

Other changes in this release include:

This release should be completely compatible with code written against 1.0d14. You should not even need to recompile existing programs.

1.0d14

The primary focus of this release is speed. I've done extensive profiling of the CPU times used by XOM, and rearchitected classes to run faster by both macro and micro optimizations. One of the things I discovered was that parsing and serialization are dramatically slower than in-memory manipulations, typically by three orders of magnitude, even when all I/O is performed between byte arrays. Right now my belief is that any program that does any parsing or serialization (and it's hard to imagine what program wouldn't do at least one of those two) is going to spend so much time doing that, that nothing else is worth optimizing.

That said, I have optimized parsing/document building extensively in this release. It is much, much faster than in previous releases. It should now be competitive with any other tree-based API written in Java, though naturally it's still slower than a straightforward SAX parse because it sits on top of SAX. The biggest effects on speed now are I/O (don't forget to buffer your streams) and the speed of the underlying parser. I'm still recommending Xerces because it's the only one I've found that's almost correct, but you can speed XOM up by a factor of a third by switching to Crimson, and possibly more by switching to Piccolo. However, both of those have nasty bugs that prevent the XOM unit tests from completing successfully. Xerces has a couple of bugs too, but fortunately nothing I couldn't work around.

Contrary to popular belief, most of the optimizations improved both speed and memory use. There were few trade-offs between them. However, there was one notable exception. The Text class is now storing its data internally in UTF-8. This cuts memory usage for mostly ASCII text by about 10-20%. However, it has a noticeable 10% speed penalty. I'm not sure if I'm going to keep this strategy or not. Ideally, I'd like to provide some sort of runtime switch to select this behavior (or not) but I haven't yet figured out the right design to make this happen. The constraints on the design are:

There are no public API level changes in this release. However, the unit tests have been expanded dramatically, which resulted in the discovery and elimination of a number of bugs. Internal changes in 1.0d14 include:

1.0d13

The primary focus of this release is memory. I've done extensive profiling of the memory used by XOM, plugged memory leaks, and rearchitected classes to use less memory. The Element class has fewer fields than before and uses lazy initialization so many complex fields are null until and unless they're actually used. With this release XOM programs should use less than half the memory they used previously. I now have a rough estimate that for large (a hundred kilobytes or more), primarily ASCII-range XML documents encoded in UTF-8, the corresponding XOM Document object is five to six times the size of the input XML. Less complex documents without attributes or namespaces are likely to be smaller than documents of the same physical size with attributes and namespaces. If the original document is encoded in UTF-16, the size difference is likely to be more like 2 to 3 times.

Measurements are currently showing that almost all the space is taken up by strings and char arrays (mostly inside strings and string buffers). There might be a few places where I can make a nip here or a tuck there, but further large-scale memory optimization would have to look at using UTF-8 internally instead of UTF-16. (Possibly I can get away with doing this in just a couple of places like the Text class.) One area I can still explore is whether it might make sense to intern strings. Generally, the parser does this for anything read from a document, and the compiler does it for string literals; but there might still be a few opportunities here.
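To make the UTF-8 versus UTF-16 trade-off concrete, here is a minimal sketch of a text holder that keeps its data as UTF-8 bytes and decodes it on demand. This is an illustration only, not XOM's Text class; the name CompactText and its methods are hypothetical, and the UTF-16 figure simply counts two bytes per char and ignores object overhead.

```java
import java.nio.charset.StandardCharsets;

final class CompactText {

    // The character data, held as UTF-8 rather than as a char array.
    private final byte[] utf8;

    CompactText(String value) {
        this.utf8 = value.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding on every call is the speed cost of the smaller footprint.
    String getValue() {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Bytes actually stored for the character data.
    int storedBytes() {
        return utf8.length;
    }

    // Bytes the same data would occupy as UTF-16 code units.
    static int utf16Bytes(String value) {
        return value.length() * 2;
    }
}
```

For ASCII-range content the stored data is half the size of the UTF-16 form, while a character such as the euro sign (U+20AC) takes three bytes in UTF-8 against two in UTF-16, so the saving depends on the text.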

I've also done a little work on speed, though not nearly as extensive. Mostly I just picked off some low-hanging fruit the profiler made obvious. More serious work remains to be done. My initial measurements focused on document building. About 25-35% of the time was eaten by the parser. Another 25-35% went into verification, the biggest chunk of which was text content. The rest was divided up into dribs and drabs of actual document building. The single biggest time waster was this method:

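    private static boolean isXMLCharacter(int c) {

        // Everything below the surrogate block
        if (c <= 0xD7FF) {
            if (c >= 0x20) return true;
            // Of the C0 controls, only LF, CR, and tab are legal
            if (c == '\n') return true;
            if (c == '\r') return true;
            if (c == '\t') return true;
            return false;
        }

        // Surrogates (0xD800-0xDFFF) are not characters
        if (c < 0xE000) return false;
        if (c <= 0xFFFD) return true;
        // 0xFFFE and 0xFFFF are not legal either
        if (c < 0x10000) return false;
        // Supplementary planes
        if (c <= 0x10FFFF) return true;

        return false;
    }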

Even small optimizations here could have a large effect, so let me know if you see any. However, I'm probably going to redesign the XOMHandler class in 1.0d14 so it bypasses verification. The assumption is the parser will have already checked all this.
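For context, a check like this has to run once per code point of every text node, which is why it dominates verification time. The following sketch shows that calling pattern. The wrapper class XmlCharCheck and the method firstIllegalIndex are hypothetical names, and the range test is a condensed restatement of the method above, not the code XOM ships.

```java
final class XmlCharCheck {

    // Same ranges as the method above, written more compactly.
    static boolean isXMLCharacter(int c) {
        if (c <= 0xD7FF) {
            return c >= 0x20 || c == '\n' || c == '\r' || c == '\t';
        }
        if (c < 0xE000) return false;   // surrogate block
        if (c <= 0xFFFD) return true;
        if (c < 0x10000) return false;  // 0xFFFE and 0xFFFF
        return c <= 0x10FFFF;           // supplementary planes
    }

    // Returns the index of the first illegal character, or -1 if the
    // whole string is legal XML character data.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);  // an unpaired surrogate comes back as itself
            if (!isXMLCharacter(cp)) return i;
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Iterating by code point means a valid surrogate pair is tested once as a supplementary character, while a lone surrogate is rejected.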

There are a few API level changes in this release:

In addition, there are a number of small changes in behavior that don't change the API:

Finally, there were a number of small bug fixes, and lots of code cleanups throughout. The most significant bug fix involved setting or changing the namespace URI of XHTML elements (and other elements that use the default namespace).

1.0d12

This release removes the insertBefore and insertAfter methods from ParentNode because:

  1. They're redundant with other methods
  2. They don't really fit into XOM's index-based access style
  3. Experience has shown they're not commonly used
  4. I'd rather hit 1.0 with too few methods than too many. It's easier to add a method in the future than to take it away.

However, if anyone howls too loudly about this, I can probably be convinced to put them back in.

This release also fixes a bug that arose when removing the namespace from an element that had attributes, such as might occur when converting XHTML to plain vanilla HTML.

1.0d11

The new feature in this release is an ANT build file. This should make it much easier to compile XOM from source. ANT is not included though. You'll have to download and install it separately.

There are no API-level changes in this release. All code that ran before should still run. This release does fix three assorted bugs reported by users:

Not surprisingly these all appeared in the Builder and Serializer classes, which out of all the classes in XOM are the least well-covered by unit tests. I've expanded the unit tests to catch these and related bugs. The unit tests all pass, assuming you use a non-buggy SAX2 parser. However, if you run the JUnit GUI from the ANT build file, some confusing class loader issues cause the more-buggy Crimson to be loaded instead of the less-buggy Xerces. This breaks four unit tests. Everything should pass if you run the tests directly instead of from ANT. (That is, type "java -Xmx96m junit.swingui.TestRunner nu.xom.tests.XOMTests" instead of "ant testui".) If anyone can explain to me how I might fix this, I'd appreciate it.

1.0d10

This release fixes various bugs in namespaces, and makes one API change. The declareNamespace method is once again addNamespaceDeclaration.

Under the hood, however, there are much more significant changes in namespace handling, and these are likely to break some existing applications. In particular,

1.0d9

Removed vestigial getNextSibling() and getPreviousSibling() methods from Document. These should have been removed earlier.

In Comment:
  Renamed check to checkValue
  Renamed setData to setValue

In ProcessingInstruction:
  Renamed checkData to checkValue
  Renamed setData to setValue

In Text:
  Renamed check to checkValue
  Renamed setData to setValue

In ParentNode:
  Renamed checkRemove to checkRemoveChild for symmetry with checkInsertChild
  Moved these two methods down into Element:
    public final void appendChild(String text)
    public final void insertChild(String text, int position)

Fixed Builder bug that prevented parsing File objects whose filenames contained spaces and other non-URL legal characters

Fixed equals() method in Attribute.Type to work in multi-classloader environments

Corrected usage instructions in sample programs to include the package name

Added checks on values of xml:base attributes that they are legal IRIs. Mainly this involves checking the hex escaping.
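The hex escaping part of that check can be sketched as follows: every percent sign must be followed by exactly two hexadecimal digits. This is a simplified illustration and not XOM's verifier; the names PercentEscapes and hasValidHexEscapes are hypothetical, and a full IRI check involves more than this.

```java
final class PercentEscapes {

    // True if every '%' in the string starts a well-formed %XX escape.
    static boolean hasValidHexEscapes(String iri) {
        for (int i = 0; i < iri.length(); i++) {
            if (iri.charAt(i) == '%') {
                // Two more characters must follow the percent sign.
                if (i + 2 >= iri.length()) return false;
                if (!isHexDigit(iri.charAt(i + 1))) return false;
                if (!isHexDigit(iri.charAt(i + 2))) return false;
                i += 2; // skip the two digits just checked
            }
        }
        return true;
    }

    // ASCII hex digits only; Character.digit would also accept
    // non-ASCII digits, which are not legal in an escape.
    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }
}
```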

1.0d8

XSLT works (modulo some obscure bugs in handling the undeclaration of the default namespace; I need to get some clarification on the proper behavior of SAX processors to fix this). The TrAX XOMSource and XOMResult classes are not yet public because I'm still thinking about the proper API for these, but you can use the XSLTransform class for most use-cases. You'll need a TrAX compliant XSLT engine such as Saxon or Xalan-J 2.4 somewhere in your classpath to use this.

It is now possible to undeclare the default namespace on a prefixed element by passing the empty string as the prefix and URI to declareNamespace().

1.0d7

Added constraint that an element cannot have two attributes with the same local name and same namespace URI, but different prefixes.

Changed automatic attribute replacement to depend on local name and namespace URI and never on qualified name alone.
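One way to picture this rule is an attribute table keyed on the pair of namespace URI and local name, with the prefix playing no part in identity. The sketch below is an assumption about how such a table could look, not XOM's Element implementation; AttributeTable and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeTable {

    // Key is "{namespaceURI}localName"; value is {qualifiedName, value}.
    private final Map<String, String[]> attributes = new LinkedHashMap<>();

    private static String key(String localName, String namespaceUri) {
        return "{" + namespaceUri + "}" + localName;
    }

    // Adds an attribute, replacing any existing one with the same local
    // name and namespace URI regardless of the prefix either one uses.
    void add(String qualifiedName, String namespaceUri, String value) {
        String localName = qualifiedName.substring(qualifiedName.indexOf(':') + 1);
        attributes.put(key(localName, namespaceUri),
                       new String[] {qualifiedName, value});
    }

    // Returns the value, or null if there is no such attribute.
    String getValue(String localName, String namespaceUri) {
        String[] entry = attributes.get(key(localName, namespaceUri));
        return entry == null ? null : entry[1];
    }

    int size() {
        return attributes.size();
    }
}
```

With this keying, adding xlink:href and then x:href in the same namespace leaves a single attribute, while an unprefixed href in no namespace is a separate one.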

Removed the getFirstChild(), getPreviousSibling(), and getNextSibling() methods from Node. These really didn't fit the XOM model of indexed access, and were slower than the indexed equivalents.

Added indexOf() method to ParentNode that returns the position of a given node within its parent, or -1 if the node is not a child of this ParentNode. This is helpful for those few cases where you do need to identify a node's sibling.

public int indexOf(Node child)

Spell checked the API documentation

Moved XOMResult into the nu.xom.transform package. XSLT still doesn't work, but it's a little closer to working.

1.0d6

This release makes very limited backwards incompatible changes to the API. (A few formerly public methods in Serializer are now protected.) Almost all code that previously compiled and ran with 1.0d4 and 1.0d5, should still compile and run. New features in the API in this release include:

In addition, several bugs were fixed:

1.0d5

This release makes no backwards incompatible changes to the API. All code that previously compiled and ran with 1.0d4, should still compile and run. New features in the API in this release include:

1.0d4

The major additions in 1.0d4 are methods to get and set the base URI of a node. You can invoke getBaseURI from any Node object to retrieve the URL against which relative URLs in that Node should be resolved. This is calculated in keeping with XML Base. That is, if an xml:base attribute is in scope its value is used. Otherwise, the URI of the entity in which the Node appears is used. You can change the underlying URI of the entity using the setBaseURI method in ParentNode. When a document is built, the parser fills in the base URI for each node. This is stored separately from xml:base attributes, which are not treated differently than any other attribute. When a document is serialized, you may request that the serializer fill in extra xml:base attributes not present in the infoset to preserve the underlying base URIs. However, since this is a structural change to the document, this feature is turned off by default.
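The XML Base calculation can be sketched with java.net.URI alone. This is an illustration of the rule, not XOM's getBaseURI code; BaseUriSketch and its methods are hypothetical, and the sketch handles a single xml:base in scope rather than a chain of nested ones.

```java
import java.net.URI;

final class BaseUriSketch {

    // The base URI in effect: the xml:base value if one is in scope
    // (resolved against the entity URI in case it is relative),
    // otherwise the URI of the entity itself. Pass null for xmlBase
    // when no xml:base attribute is in scope.
    static URI effectiveBase(URI entityUri, String xmlBase) {
        return xmlBase == null ? entityUri : entityUri.resolve(xmlBase);
    }

    // Resolves a relative reference found in the node's content.
    static URI resolve(URI entityUri, String xmlBase, String relative) {
        return effectiveBase(entityUri, xmlBase).resolve(relative);
    }
}
```

For a document at http://www.example.com/docs/index.xml, the reference fig1.png resolves under /docs/ when no xml:base is present and under /docs/chapters/ when xml:base="chapters/" is in scope.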

Other API level changes include:

In addition several bugs were fixed, the JavaDoc was further cleaned up and improved, and more than a dozen new unit tests were added.

1.0d3

The major change in 1.0d3 is that the TreeNode class has been replaced by the ParentNode class. The only immediate subclasses of ParentNode are Element and Document. Attribute is the only immediate subclass of Node. The other four node types are subclasses of LeafNode which is a subclass of Node. All navigation methods (getChild, getNextSibling, getParent, etc.) are now in Node. All insertion and deletion methods (appendChild, insertChild, removeChild, etc.) are only available in ParentNode, that is, Document and Element. Other API-level changes since 1.0d2 include:

I also spent a lot of time improving the JavaDoc.

1.0d2

I've posted 1.0d2 to fix the first bugs discovered, clean up the source code, and make a few changes to method names that seemed wise. API-level changes since Tuesday night include:


Copyright 2002-2005, 2009, 2013 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified January 13, 2012