XOM is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness and simplicity.
Xalan is no longer bundled. XOM should work in all JDKs from the last decade+ without it, though it will still default to it if it's installed.
Purely internal changes that improve compatibility with Java 17+
More purely internal changes:
Changes to the use of com.sun classes to make XOM compatible with JDK 16.
Replaces StringBuffer with StringBuilder to slightly improve performance.
Some purely internal changes, mostly replacing StringBuffer with StringBuilder, that should slightly improve performance in modern VMs.
Adds an Automatic-Module-Name header to the jar file (this time the right one) for improved compatibility with the Java Platform Module System in Java 9+.
Tried to add an Automatic-Module-Name header to the jar file for improved compatibility with the Java Platform Module System in Java 9+, but in fact added it to the wrong jar.
Improves performance with applications that build many small documents frequently.
Uploaded to Maven Central as xom:xom:1.3.2.
Failed release.
Java 1.5 or later is now required.
The Nodes and Elements classes are iterable so you can use the enhanced for loop syntax on instances of these classes.
The copy() method is now covariant.
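The two language features named above can be sketched with stand-in classes. This is only an illustration under stated assumptions: SketchNode, SketchElement, and SketchElements are hypothetical types invented for this example, not XOM's classes, and they show only what "iterable" and "covariant copy()" mean for calling code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins (not XOM classes) that only illustrate the language features.
class SketchNode {
    final String name;

    SketchNode(String name) {
        this.name = name;
    }

    // The base declaration returns the general type.
    SketchNode copy() {
        return new SketchNode(name);
    }
}

class SketchElement extends SketchNode {
    SketchElement(String name) {
        super(name);
    }

    // Covariant override: the return type is narrowed, so callers need no cast.
    @Override
    SketchElement copy() {
        return new SketchElement(name);
    }
}

// Implementing Iterable is what makes the enhanced for loop available.
class SketchElements implements Iterable<SketchElement> {
    private final List<SketchElement> items = new ArrayList<>();

    void add(SketchElement element) {
        items.add(element);
    }

    @Override
    public Iterator<SketchElement> iterator() {
        return items.iterator();
    }
}

class IterableSketch {
    // Copies each element and joins the names with commas.
    static String joinNames(SketchElements elements) {
        StringBuilder sb = new StringBuilder();
        for (SketchElement element : elements) {
            SketchElement duplicate = element.copy(); // no cast required
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(duplicate.name);
        }
        return sb.toString();
    }
}
```

Before a covariant override, the same loop body would have needed a cast on the result of copy().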
Update and improve build system and website generation.
The binary is now compiled against Java 1.6 or later by default. It should still be source compatible with Java 1.4 but this has not been extensively tested.
Removes unthrown IOException from the Canonicalizer.setInclusiveNamespacePrefixList() method.
Fixes one bug in the verification of a leading plus sign in the IPv4 parts of IPv6 addresses.
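The notes do not say how the plus sign slipped through. One plausible way this class of bug arises in Java, offered only as an assumption, is relying on Integer.parseInt, which accepts a leading '+' in Java 7 and later. The sketch below contrasts that with a digits-only check. Ipv4PartCheck and both method names are hypothetical and are not taken from XOM's source.

```java
// Hypothetical helper, not XOM code: contrasts a lenient and a strict octet check.
class Ipv4PartCheck {

    // Lenient: Integer.parseInt accepts a leading '+' (Java 7 and later),
    // so "+1" is treated as a valid part.
    static boolean lenientOctet(String s) {
        try {
            int value = Integer.parseInt(s);
            return value >= 0 && value <= 255;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    // Strict: every character must be an ASCII digit before the range check.
    static boolean strictOctet(String s) {
        if (s.isEmpty() || s.length() > 3) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return Integer.parseInt(s) <= 255;
    }
}
```

With the strict version, a part such as +1 is rejected while 255 is still accepted.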
Support the built-in Android parser.
Exclude org.w3c.dom from Jaxen files we copy in to avoid problems with some application servers.
Upgrade Jaxen to 1.1.6 to fix some IEEE-754 bugs involving -0.
Upgraded to Jaxen 1.1.4 to fix several XPath bugs involving function resolution and Java 7 compatibility.
Canonical XML 1.1
Fixes a bug that doubled query strings in base URLs.
Upgraded to Jaxen 1.1.3 to fix an XPath bug evaluating relational operators when one of the operands was a text, comment, or processing instruction node.
Throws NullPointerException instead of MalformedUriException when a null Reader is passed to Builder.build().
Maven 2 support
More automatic deploy process.
Fixed maven targets.
Slight optimization to XPath by combining two loops.
Bug fix for some obscure corner cases.
This release focuses on improved packaging with Maven and OSGI. Otherwise, no visible changes.
A very minor release that now prints the correct version number when you execute the JAR archive by typing java -jar xom.jar
The 1.2 release fixes a number of bugs, especially in canonicalization and XPath. However there's at least one bug fix in the core so I recommend all users upgrade. XOM 1.2 should be fully backwards compatible with code written to 1.0 and 1.1 APIs. 1.2 should also be somewhat easier to compile and edit due to various changes with UnicodeUtil and Jaxen. Actual new features in this release are fairly minor and include:
DOMConverter can accept a NodeFactory to be used in creating the XOM document
A method in XPathContext that finds the namespace URI for a prefix.
New features implemented since 1.0 include:
setInternalDTDSubset method in DocType
xml:id support
Memory usage has been reduced, and performance improved by up to 2-4 times for some common operations. In addition, some bugs have been fixed in XOMTestCase and in the handling of a few edge conditions in the internal DTD subset. Furthermore, 1.1 works around quite a few more bugs in Crimson.
Essentially the same as Beta 11. The README file was improved slightly and all version numbers in the JavaDoc have been upgraded to 1.0. A number of small edits have been made to the API documentation. The only API-level change is that the deprecated setNodeFactory method in XSLTransform has been removed.
Beta 11 is the fifth release candidate. It restores the three servlet samples (FibonacciServlet, FibonacciSOAPServlet, and FibonacciXMLRPCServlet) but uses Ant conditions to only compile these files if the servlet classes are present. It also adds README, LICENSE, and LGPL files to the core distribution rather than simply placing these on the web site. Finally, http://www.cafeconleche.org/XOM/ has been replaced by http://www.xom.nu/ in the source code and documentation. The core API has not changed at all.
Beta 10 is the fourth release candidate. It removes three samples (FibonacciServlet, FibonacciSOAPServlet, and FibonacciXMLRPCServlet) to avoid having to distribute servlet.jar with XOM. It also modifies the Ant build file so the tools package is not compiled except when generating the betterdoc target. This makes the complete distribution more self-contained and easier to build. The core API has not changed at all.
Beta 9 is the third release candidate. It adds a few more unit tests and fixes some packaging issues that were bedeviling Windows systems. (The zip and tar files no longer contain any test files whose names are legal on Unix but illegal on Windows.) Barring discovery of any last-minute bugs, this will be XOM 1.0. No further optimizations or fixes are planned before 1.0. All the changes are restricted to the tests package. The core API has not changed at all.
Beta 8 is the second release candidate. Barring discovery of any last-minute bugs, this will be XOM 1.0. No further optimizations or fixes are planned before 1.0. Changes in this release include:
Beta 7 is the first release candidate. There are still a few open issues with regard to error handling in XInclude that require clarification from the XInclude working group. If they decide that how XOM currently behaves is correct, then XOM 1.0 is essentially complete. If they decide to require different behavior, a few changes may yet need to be made.
Changes in this release include:
Builder is considerably more robust against buggy parsers. It converts all runtime exceptions thrown by such a parser (including XOM XMLExceptions thrown by a NodeFactory) into ParsingExceptions. It uses a verifying factory for Saxon 7's AElfred derivative.
Speeds up getValue(), toXML(), DOM and SAX conversion, canonicalization, and XSL transformation by roughly a factor of two.
Beta 6 is primarily a bug fix release. It also polishes off some rough edges in various corners of the API. Changes in this release include:
The deprecated setNodeFactory() method in XSLTransform has been removed. This is the only API-level change in this release.
The strings returned by toString in Comment, ProcessingInstruction, Attribute, and Text are all now truncated if they get too long. Furthermore any embedded line breaks and tabs are escaped as \n, \r, and \t. This makes the objects easier to inspect in various debuggers and loggers.
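As a rough illustration of the truncate-and-escape behavior described above, here is a minimal sketch. It is not XOM's implementation: the class name ToStringPreview, the 40-character limit, and the trailing "..." marker are all assumptions made for the example, since the notes do not state the actual cutoff or format.

```java
// Hypothetical sketch, not XOM code: debugger-friendly preview of node content.
class ToStringPreview {

    // Assumed limit; the release notes do not give the real cutoff.
    static final int MAX_LENGTH = 40;

    static String preview(String data) {
        // Truncate first so an escape sequence is never cut in half.
        String shortened = data;
        if (shortened.length() > MAX_LENGTH) {
            shortened = shortened.substring(0, MAX_LENGTH - 3) + "...";
        }
        // Replace line breaks and tabs with visible two-character escapes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < shortened.length(); i++) {
            char c = shortened.charAt(i);
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The result always fits on one line, which is the property that makes such strings easier to read in a log file or a debugger's variable view.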
SAXConverter no longer converts XOM xml:base attributes into SAX attributes. Instead the xml:base attributes are used to determine the URI information the Locator reports. Providing xml:base attributes as well would risk double counting some relative URLs.
Fixed bug where carriage returns in internal entity replacement text in the internal DTD subset were not properly escaped on reserialization.
Fixed bug where carriage returns, less than signs, double quotes, and ampersands in attribute default values in the internal DTD subset were not properly escaped on reserialization.
Fixed a number of bugs in converting file names to base URIs
Improved compatibility with Turkish locales that do not see I as the upper case form of i or vice versa.
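The notes do not say which code paths were affected. The general pitfall, however, is easy to reproduce with the standard library: locale-sensitive case conversion maps I to a dotless ı under Turkish rules. The sketch below uses an encoding name purely as a hypothetical example of a string that must be compared case-insensitively. TurkishCaseSketch is not an XOM class.

```java
import java.util.Locale;

// Hypothetical sketch, not XOM code: locale-sensitive versus locale-neutral lowercasing.
class TurkishCaseSketch {

    // Uses the rules of the supplied locale, as String.toLowerCase() does
    // implicitly with the default locale.
    static String lowerWithLocale(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Locale.ROOT gives the same result on every machine.
    static String lowerLocaleNeutral(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Under Turkish rules the first letter becomes U+0131 (dotless i).
        System.out.println(lowerWithLocale("ISO-8859-1", turkish));
        System.out.println(lowerLocaleNeutral("ISO-8859-1"));
    }
}
```

Passing an explicit locale-neutral argument is one common way to keep such comparisons stable regardless of the user's default locale.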
Fixed a bug in Serializer that did not always properly trim whitespace
Hid the error messages logged by Xerces and Xalan on System.err when deliberately testing error conditions. Therefore, there should be no output from the test cases when all tests pass.
Added a junithtml build target to convert JUnit results to HTML.
The Ant build file now specifies that the input encoding of all .java files is UTF-8. Most files are pure ASCII, but there are a couple of places where non-ASCII characters are used.
Unit test coverage has been improved.
Beta 5 primarily focuses on fixing bugs in XInclude and improving performance of builders when reading from files. It also deprecates the setNodeFactory() method in XSLTransform, which will be removed in the next release. In its place, there's a new constructor:
public XSLTransform(Document stylesheet, NodeFactory factory)
Finally, the four XSLTransform constructors deprecated in the last release have been removed.
1.0b4 primarily focuses on fixing bugs and improving performance in the converters and XSLT package. XSLT transformation can now work directly from a XOM Document without an intermediate step that serializes the Document as a string. Consequently, these four constructors in XSLTransform have been deprecated and will be removed in the next release:
public XSLTransform(InputStream stylesheet)
public XSLTransform(Reader stylesheet)
public XSLTransform(String URL)
public XSLTransform(File stylesheet)
Other changes include:
SAXConverter can now convert Nodes lists as well as Documents.
SAXConverter now sets a Locator that provides system IDs for individual elements.
The toXML methods now use \n as the line separator, since this is more likely to match the contents of text nodes created by parsing an XML document. The goal is to minimize the number of documents with mixed line break strings.
Fixed a bug in DOMConverter that threw a NullPointerException when converting XOM documents with only a single element to DOM.
The primary impetus for beta 3 is fixing a few bugs in the DOMConverter. Also, Java encoding names like "8859_1" are now recognized when using the repackaged Xerces bundled with Java 1.5. I also spell checked the comments. :-)
The primary impetus for beta 2 is fixing some bugs that prevented the XOM-specific parsers from being loaded in Java 1.5 when the standard Xerces (as opposed to the Java 1.5 bundled Xerces) was not in the classpath.
This release also makes the JavaDoc well-formed (and possibly valid, I haven't checked) XHTML.
Beta 1 is feature and code complete. There are no known bugs in XOM. All that remains to be done is finishing the documentation and doing some minor code clean-ups. These include such housekeeping tasks as splitting long lines, spell checking the comments, and making sure the Javadoc is all valid XHTML. None of this should have any effect on client code. XOM is now believed to be ready for serious, production use.
Unless new bugs are uncovered, this may be the one and only beta release. Possibly I'll do some profiling runs to see if there are any more areas where I can save some memory or speed up some operations. Barring that, all that's needed before the final 1.0 release is finished documentation.
Beta 1 makes no backwards incompatible changes to the published API. Changes since the final alpha include:
The XInclude test suite is loaded and run from the W3C CVS server if it's not installed locally. Mistakes in the test suite (mostly involving document type declarations) are corrected on the fly.
Work-arounds for various JDK bugs that prevent round-tripping of some characters in Japanese encodings
Work-arounds for bugs in some versions of Xalan, as well as for bugs in the OASIS XSLT conformance test suite.
Improved compatibility with Java 1.5
1.0a5 makes no backwards incompatible changes to the published API. Changes since the previous release include:
The ParsingException and ValidityException classes now have a getURI() method that returns the URI of the document whose error caused the exception.
Test suite now runs OASIS Microsoft and Xalan XSLT tests
Improved compatibility with Java 1.2
Improved compatibility with recent releases of Xalan, including those bundled with JDK 1.4.2_03 and later
1.0a4 makes no backwards incompatible changes to the published API. Changes since the previous release include:
Nodes.remove(int) now returns the node removed.
The IBM virtual machine 1.4.1 is no longer special cased.
The API documentation has undergone extensive editing.
The unpublished nu.xom.xerces package has been removed.
1.0a3 makes no backwards incompatible changes to the published API. It adds one new protected method. Changes since the previous release include:
The Element copy constructor and copy methods are no longer recursive, so they shouldn't cause stack overflows in deep documents. This necessitated adding a protected shallowCopy() method that can be used to create an instance of a subclass of Element. Overriding this is preferred to overriding copy() when one wishes to maintain the objects' types after a copy.
The getBaseURI() method is also no longer recursive.
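The relationship between a non-recursive copy and a shallowCopy() hook can be sketched as follows. This is an assumption-laden illustration, not XOM's algorithm: SketchTreeNode is a hypothetical minimal tree type, and the explicit work stack is just one common way to replace recursion.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal tree node, not XOM's Element.
class SketchTreeNode {
    final String name;
    final List<SketchTreeNode> children = new ArrayList<>();

    SketchTreeNode(String name) {
        this.name = name;
    }

    // Copies this node only, without children. A subclass can override this
    // to return its own type, so copy() preserves subclass types.
    protected SketchTreeNode shallowCopy() {
        return new SketchTreeNode(name);
    }

    // Deep copy driven by an explicit stack, so depth is limited by heap, not call stack.
    SketchTreeNode copy() {
        SketchTreeNode rootCopy = shallowCopy();
        Deque<SketchTreeNode[]> work = new ArrayDeque<>();
        work.push(new SketchTreeNode[] { this, rootCopy });
        while (!work.isEmpty()) {
            SketchTreeNode[] pair = work.pop(); // pair[0] = source, pair[1] = its copy
            for (SketchTreeNode child : pair[0].children) {
                SketchTreeNode childCopy = child.shallowCopy();
                pair[1].children.add(childCopy); // child order is preserved
                work.push(new SketchTreeNode[] { child, childCopy });
            }
        }
        return rootCopy;
    }
}
```

Because copy() creates each new node through shallowCopy(), a subclass that overrides only shallowCopy() keeps its type throughout the copied tree, which matches the preference stated above.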
The W3C XML Schema Language and WML and HTML DOMs have been removed from the bundled version of Xerces to save space.
XOM now uses character references only when necessary for all encodings supported by the local virtual machine. However, this may be quite a bit slower than the explicitly supported encodings like UTF-8 and the ISO-8859 character sets. Measurements remain to be performed.
1.0a2 makes no changes to the published API. Behavioral changes since the previous release include:
java.net URI classes.
Builder no longer sets any Java system properties for improved compatibility with applets and multiclassloader environments.
DOMConverter was fixed
1.0a1 is the first alpha release of XOM. The API is now considered to be reasonably stable and frozen. I may add to the API in the future, but the current API will not change without a very good reason. Most features should work pretty much as intended. There are no API changes since 1.0d25. Behavioral changes since the previous release include:
XOM now fully supports the 2nd candidate recommendation syntax for XInclude, including preservation of xml:lang values.
The base URI handling has been modified as follows:
getBaseURI() always returns an absolute URI or the empty string if the base URI is not known. Other than the empty string it never returns a relative URI. It never returns null.
The setBaseURI() method only accepts an absolute URI. It throws a MalformedURIException if you attempt to pass it a relative URI, or a URI with a fragment identifier. (Relative URIs are still allowed in xml:base attributes.)
XOM will not double verify when being fed data through Norm Walsh's catalog filter, provided that the underlying parser is good.
Constraints on parentage are not checked when building with NonVerifyingFactory.
DOMConverter and several methods have been rewritten with non-recursive algorithms. Some work remains to be done in this area, however.
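The base URI rule stated above (absolute, with no fragment identifier) can be approximated with the standard java.net.URI class. This is only a sketch of the rule: BaseUriCheck and isAcceptableBaseUri are hypothetical names, XOM performs its own verification, and java.net.URI may differ from it in edge cases.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not XOM code: approximates the stated setBaseURI() rule.
class BaseUriCheck {

    // True only for a syntactically valid URI that has a scheme and no fragment.
    static boolean isAcceptableBaseUri(String candidate) {
        try {
            URI uri = new URI(candidate);
            return uri.isAbsolute() && uri.getRawFragment() == null;
        } catch (URISyntaxException ex) {
            return false;
        }
    }
}
```

Under this check, http://www.example.com/a/b.xml is accepted, while b.xml (relative) and http://www.example.com/a#frag (has a fragment) are rejected. The example.com addresses are placeholders.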
There appear to be some bugs in Sun's JDK 1.4.2_03 that break about 5 or 6 of the unit tests. All tests pass with JDK 1.4.2_02 and JDK 1.5.0a1. Ant 1.5.x is required to build XOM. I have been unable to get the tests to run with Ant 1.6, and the Ant developers seem actively hostile to any reports about this issue.
1.0d25 is the second last call release of XOM. I had planned for this to be alpha 1 and API freeze. However, enough changes since the last release were discovered to be necessary, that I decided to make this 1.0d25 instead. Anything that didn't change since the last release is probably pretty stable. However, there have been some new changes in this release that are worth reviewing and may change again:
All 21 protected checkFoo methods have been removed. Instead the various mutator methods (setters and other methods that change the state of an object) are now non-final so they can be overridden. The getter methods are still final and the fields are all private. Thus to change the state of an object, setter methods will still need to call the constraint-verifying superclass methods. This should give subclasses a lot more flexibility while not compromising on well-formedness.
The Serializer now throws UnavailableCharacterException, a subclass of XMLException, instead of a raw XMLException when it encounters a character it can neither write nor escape in the current encoding.
NodeFactory.makeDocument has been renamed startMakingDocument.
NodeFactory.endDocument has been renamed finishMakingDocument.
Added a method to DOMConverter that converts a DocumentFragment to a Nodes.
Added XSLTransform.toDocument() method that converts a Nodes to a Document.
Element.removeChildren() now returns a Nodes object containing the children removed.
The LeafNode class has been removed. DocType, Text, Comment, and ProcessingInstruction now directly extend Node.
Removed the hasChildren method from Element, Node, ParentNode, Attribute and Document.
Element.addAttribute is declared to throw the more specific MultipleParentException instead of IllegalAddException
There are also several changes that do not affect the API
ParentNode.replaceChild() will not remove the old child unless it can insert the new child. It can no longer do one but not the other.
Document.replaceChild now allows replacing of the DocType by another DocType or the root element by another element
Many methods including getValue and toXML have been rewritten using non-recursive algorithms so they are no longer limited by Java's stack size. The samples package includes an example of a non-recursive serializer.
Much better testing of canonicalizer. I am now fairly convinced it is correct in all or almost all cases.
Line breaks are now used between declarations in internal DTD subset
The JAR is compiled without debugging symbols to save space. (These can be turned on again easily enough in build.xml if anyone needs them.)
Added a XOMSamples.jar archive that includes all the sample code
The core JAR archive is sealed.
The API documentation has been thoroughly proof-read from start to finish.
1.0d24 is a very fast release to fix a bug that prevented 1.0d23 from being used in multi-classloader environments like Tomcat. A couple of bugs that prevented some of the test cases from successfully completing on Windows have also been fixed, a bug in the FibonacciServlet sample was corrected, and some of the documentation has been improved. The API has not changed at all. XOM is still in "last call".
This is the last call, pre-alpha release of XOM. My plan is that the next release will be the official API freeze for 1.0. While nothing is written in stone, I do plan to strenuously resist any backwards incompatible changes in the API after the next release (1.0a1). If you have any concerns about the API, now is the time to get them in.
There are several backwards incompatible changes in this release.
Most notably, the various makeNode() methods in the NodeFactory class all return Nodes objects. This means a factory can replace one node type with a different node type (e.g. changing elements into attributes and vice versa) or replace a single node with several nodes.
Other changes that may require code modifications include:
Attribute.Type.toXML is now Attribute.Type.getName(). This was necessary to be consistent with handling attributes of type ENUMERATION, which is not a DTD keyword though it is referenced in the Infoset.
Support for the November 2003 Working Draft syntax of XInclude, including the xpointer, accept, accept-charset, and accept-language attributes. Documents will need to be rewritten to use the new syntax. In keeping with the terminology in the new working draft, MissingHrefException has been renamed NoIncludeLocationException. CircularIncludeException has been renamed InclusionLoopException.
The methods that resolve Nodes objects have been marked private.
NamespaceException has been broken up. IllegalNameException is used for problems with a namespace prefix. MalformedURIException is used for problems with a namespace URI. NamespaceConflictException, a subclass of WellformednessException, is used for cases where attributes, elements, and/or additional namespace declarations have conflicting bindings for the same prefix.
Removed NodeFactory's makeWhiteSpaceInElementContent() method
Removed no-args constructors from the various exception classes.
More or less backwards compatible changes in 1.0d23 include
IllegalDataException and its subclasses have getData and setData methods to get and set the exact text that caused the exception. Subclasses include IllegalNameException, IllegalTargetException, and IllegalCharacterDataException.
IllegalCharacterDataException is now used where IllegalDataException was used previously.
XOMTestCase is part of the published API.
Factory methods are now invoked in document order. Previously this wasn't true for text nodes, which weren't flushed until after the next tag, processing instruction, etc. This was necessary to enable text nodes to be maximally contiguous, though in fact they might not be if the factory returned several text nodes in a row for non-text nodes. In any case, with the default factory, or with a custom factory that does not remove any nodes or change their base types (e.g. Comment to Text), text nodes still hold the maximum possible contiguous run of text after a build.
Added support for GB18030 (Chinese) and ISO-8859-11/TIS-620 (Thai) encoding on output (requires Java 1.4)
Verifier is now based on table lookup.
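The notes give no detail about the table layout. As a general illustration of table-driven character verification, the sketch below precomputes flags for the ASCII range only and checks a name with array lookups instead of chains of comparisons. Everything here is an assumption: NameCharTable is hypothetical, the real XML name rules extend well beyond ASCII, and XOM's actual tables are not reproduced.

```java
// Hypothetical sketch, not XOM's Verifier: ASCII-only table lookup for XML name characters.
class NameCharTable {
    private static final byte NAME_CHAR = 1;  // may appear anywhere after the first character
    private static final byte NAME_START = 2; // may appear as the first character

    // One flag byte per ASCII code point, filled once at class load time.
    private static final byte[] FLAGS = new byte[128];

    static {
        byte both = (byte) (NAME_CHAR | NAME_START);
        for (char c = 'a'; c <= 'z'; c++) FLAGS[c] = both;
        for (char c = 'A'; c <= 'Z'; c++) FLAGS[c] = both;
        FLAGS['_'] = both;
        FLAGS[':'] = both;
        for (char c = '0'; c <= '9'; c++) FLAGS[c] = NAME_CHAR;
        FLAGS['-'] = NAME_CHAR;
        FLAGS['.'] = NAME_CHAR;
    }

    static boolean isNameStart(char c) {
        return c < 128 && (FLAGS[c] & NAME_START) != 0;
    }

    static boolean isNameChar(char c) {
        return c < 128 && (FLAGS[c] & NAME_CHAR) != 0;
    }

    // Simplification: non-ASCII characters are rejected, unlike in real XML.
    static boolean isAsciiName(String s) {
        if (s.isEmpty() || !isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The appeal of this design is that each character costs one array read and one bit test, however many character classes there are.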
All JDOM code has been removed.
Serialization speed-ups for Non-Unicode, non-Latin-1 encodings
It is now possible to supply a NodeFactory to XSLTransform to be used for constructing nodes in the result tree
Improved support for IBM JVM 1.4.1
The Nodes class now has insert and remove methods, in addition to append.
Added NoSuchAttributeException for parallelism with NoSuchChildException
Unit tests have been dramatically expanded. There are now over 700 separate test methods, many of which perform several tests.
No longer allow the namespace URI http://www.w3.org/XML/1998/namespace to have any prefix other than xml, per conformance with the namespaces erratum
Allow the xml: prefix (with the right URI) to be used on elements per conformance with the namespaces recommendation
Better exception messages when name and namespace arguments are swapped
getBaseURI returns null if the base URI can't be determined due to a malformed xml:base attribute.
And of course numerous bugs have been fixed, especially in XInclude.
This release collects numerous small new features, refactorings, renamings, unit tests, sample programs, and bug fixes. Many programs will need minor modifications and recompilation to work against this release. Visible changes include:
NodeList has been renamed Nodes.
ParseException has been renamed ParsingException to avoid a conflict with java.text.ParseException
The preserveBaseURI() method in Serializer has been renamed setPreserveBaseURI() in keeping with JavaBeans naming conventions.
The translate methods in DOMConverter have been renamed convert()
DOMConverter can now convert individual DOM nodes into XOM objects. It is no longer limited to converting entire documents.
ValidityException now has a getDocument() method which returns the complete well-formed but invalid document. It also has getValidityError(int n), getLineNumber(int n), and getColumnNumber(int n) methods which return information about the successive validity errors in the document.
In Serializer, writeMarkup has been renamed writeRaw and writeText has been renamed writeEscaped since in subclasses these may not actually be writing markup.
writeXMLDeclaration(), writeStartTag(), and writeEmptyElementTag().
Added a getColumnNumber() method to Serializer to assist subclasses that want to implement their own line breaking strategies.
Builder to be used when XIncluding
XIncludeException (and its subclasses) can now report the URI of the document where the problem was detected
SAXConverter
DatabaseBuilder sample based on Example 8-13 from Processing XML with Java
SourceCodeGenerator sample program that converts a well-formed XML document into the XOM statements necessary to create the document.
This is probably the last version that will support the old, XInclude 2002 Candidate Recommendation syntax. The next release will likely support the new 2003 Working Draft syntax.
This release collects a number of small changes, refactorings, and bug fixes. Most programs should continue to work as they did previously without modification or recompilation. Visible changes include:
Added protected checkDetach method in Node which subclasses can override to prevent or track nodes being detached.
The copy method is no longer final in the various node classes such as Element. Subclasses should override this method to return an instance of the specific subclass.
Cycles (an element acting as its own parent or ancestor) are no longer allowed. Attempting to create one throws a CycleException.
NodeFactory.makeDocument() no longer takes an Element as an argument. It is the responsibility of the NodeFactory to construct a suitable root element. However, when parsing this will quickly be replaced by the actual root element.
Serializer.setIndent throws an IllegalArgumentException for negative values
Fixed bug where line breaks would be added if indenting, even in elements for which xml:space="preserve"
XInclude now consistently treats XPointers that don't match any subresource as resource errors, rather than including nothing.
xml:base attributes added to XIncluded elements no longer have fragment IDs
A couple more XPointer syntax errors are now detected when XIncluding
In XIncludeException the getRootCause and setRootCause() methods have been replaced by initCause() and getCause().
The initCause method in the various exception classes now behaves much more consistently with its definition in Java 1.4.
XSLException no longer extends XMLException. This means it is now a checked exception instead of a runtime exception.
Xalan 2.5.1 has replaced Saxon 6.5.2 as the bundled XSLT processor due to a bug in SAXON that incorrectly reported document fragments resulting from XSL transforms
Minor usability improvements and code cleanups in the build.xml file
Added an overview page to the API docs
This release adds a workaround for Java's broken, non-conformant handling of file: URLs on Windows. The problem manifested itself as an inability to resolve relative URLs in documents built with the Builder.build(File) method. This caused the failure of a couple of dozen unit tests. Unix users were not affected (which is why I didn't notice the problem sooner). There are no API-level changes in this release.
The JAR archive is no longer compressed, which means a larger JAR archive but faster class loading on initial startup.
The major API level change in XOM 1.0d19 is in NodeFactory. makeElement has been renamed startMakingElement and endElement has been renamed finishMakingElement. startMakingElement behaves the same as the old makeElement. However, finishMakingElement now has a slightly different contract. If it returns null, the entire element is deleted from the tree. It is no longer necessary to explicitly call detach. If it returns a different element than the one passed to it, then the old element is deleted from the tree and the new one is inserted in its place. This is more consistent with the other methods in this class. Return the node you want added to the tree, or null for no node at all.
The second big change has no API-level impact. By default, the Serializer and toXML methods now use numeric character references to escape all tabs, carriage returns, and line feeds in attribute values and all carriage returns in text nodes. This helps make round tripping more reliable and robust. However, if the user indicates that white space is not significant by calling either setMaxLength or setIndent, then these characters may not be preserved. If the client calls setLineSeparator, then tabs will still be preserved but carriage returns and line feeds may not be.
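The reason this escaping matters is that an XML parser normalizes literal tabs and line breaks in attribute values to spaces, so writing them as character references is the only way they survive a round trip. The sketch below illustrates the idea. It is not XOM's Serializer: AttributeEscapeSketch is hypothetical, and the exact reference form (&#x09; and so on) is an assumption.

```java
// Hypothetical sketch, not XOM code: escapes an attribute value for round tripping.
class AttributeEscapeSketch {

    static String escapeAttributeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                // White space that a parser would otherwise normalize to a space.
                case '\t': sb.append("&#x09;"); break;
                case '\n': sb.append("&#x0A;"); break;
                case '\r': sb.append("&#x0D;"); break;
                // Characters that are never allowed literally in a quoted value.
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;
                case '&': sb.append("&amp;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A value containing a tab followed by a carriage return and line feed comes out as character references rather than raw control characters, so a parser reading the output reports the original characters instead of spaces.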
There are also several minor improvements and bug fixes:
The Node.equals() method now executes in about half the time it took in previous releases.
1.0d18 adds one minor new feature and one major new feature. The minor feature is that nu.xom.tests.XOMTestCase is now public. This class is very useful for comparing two documents or pieces thereof for deep equality. For example, I use it to compare the actual output of the XInclude test cases to the expected outputs. I'm still working on the API and detailed behavior, but I think it's solid enough to be useful for other people's unit testing.
Now the major feature, and this one's way cool: It is now possible to subclass NodeFactory in order to filter and/or stream your processing. XOM can now handle documents of effectively arbitrary size with only slightly more memory use than the underlying SAX parser! I really need to write an article about this style of mixed tree/stream processing, but in the meantime here are the key things you need to know:
NodeFactory subclass with the Builder. I've added a couple of constructors to Builder to make this easier.
NodeFactory has one makeNode method for each of XOM's node types. You must return a node of the requested type, but you can change its name, namespace, value, or other characteristics before doing so.
makeNode method. This saves both the memory needed to store the node and the time required to build it.
endElement() in NodeFactory. This supports streaming. Before the builder calls this method, it has completely built the element with all its content. The usual XOM methods all work on it. You do not have to process every element in order to process some. You can do a quick check on the name and namespace of the element (or other characteristics) to figure out what you want to do with it. If you don't want to process the element, just return. For example, an XHTML spider could easily look at each a element and ignore all the other elements in the document. Indeed it wouldn't even have had to build them or any of their content in the first place.
endElement() method and detach() it when you're done. As long as you haven't stored a reference to it somewhere, the element can then be garbage collected as needed. This is how XOM processes documents larger than available memory. This is sort of like SAX callbacks, except it's much more convenient because you have the entire element to work with. You do not need to build a custom data structure to hold onto the content until you're ready to work with it. The element is its own data structure.
NodeFactory and two new constructors in Builder. The rest of the API is unchanged. You can forget about it until you need it.
More details are in the JavaDoc for NodeFactory, and I've written lots of new sample programs that you'll find in the nu.xom.samples package. Many of them are streaming versions of earlier, less memory efficient samples.
This developed from an idea proposed by John Cowan, based on Simon St. Laurent's work with MOE. There have been things like this before, (DOMBuilderFilter in DOM3, MOE, ElementScanner in JDOM, and of course SAX filters) but I don't think any API has done quite as neat a job as XOM now does. This is really powerful stuff. Not only does it make programs faster and much, much smaller. It makes them much easier to write. For instance, you can easily throw away all white space only nodes on build so you're left with only the real content of the document, no more white space nodes getting in the way of your navigation. I urge you to check this out. It will radically change how you think about processing XML.
This release is API compatible with 1.0d17. All programs that compiled in 1.0d17 should still compile in 1.0d18 without any edits.
This is primarily a bug fix release. There are only very minor API changes, the most significant of which is that XSLTransform is final.
Other fixes and improvements in this release include:
toString methods and fixed various bugs thereby uncovered
xpointer() scheme or unparsed entities. In a couple of cases, it's actually conformant to the as yet unpublished XInclude proposed recommendation rather than the published candidate recommendation.
Builder and Verifier thanks to
Same
The primary focus of this release is adding unit tests for XSLT, and fixing the bugs they uncovered:
More accurate exception messages from the XSLTransform constructors
XSLT unit tests
The distribution now includes the SAXON jar archive so that XSLT works with Java 1.2 and 1.3 VMs.
Fixed a nasty bug in Element.toXML that was making XSLT transforms fail when elements were in the default namespace
You can now transform a NodeList as well as a complete document
Other assorted improvements in this release include:
The standard jar file no longer includes the samples, tests, and benchmarks packages. You can compile these from source if you need them, but omitting them makes the jar file smaller for developers who want to bundle XOM with their own applications.
The jar file is indexed to improve class loading speed.
I moved SAXConverter and DOMConverter out of the core package into a new nu.xom.converters package. They're fairly special purpose.
Improved compatibility with Java 1.2.
SAX filters can no longer bypass well-formedness checks
Worked around a Xerces and Crimson bug that inhibits relative URL resolution from pathless base URLs such as http://www.cafeconleche.org
The FibonacciSOAPClient sample program works now
Document.insertChild(DocType, position) now throws an IllegalAddException if the Document already has a DocType, rather than silently replacing it.
The primary focus of this release is XInclude. To my knowledge, XOM is now completely conformant with the XInclude candidate recommendation including:
element() schemes
xml:base attributes are added to included elements as necessary to preserve base URI information
I've also written 24 unit tests for XInclude and fixed numerous bugs including one in the Document and Element copy constructors that failed to preserve base URI.
Other changes in this release include:
Element.getChildElements(String name, String namespaceURI) now allows a null or empty string local name to stand for any local name, so you can use this method to get all elements in a certain namespace.
Serializer no longer wraps and indents text when xml:space="preserve", regardless of the setting of indents and maxlength.
This release should be completely compatible with code written against 1.0d14. You should not even need to recompile existing programs.
The primary focus of this release is speed. I've done extensive profiling of the CPU times used by XOM, and rearchitected classes to run faster by both macro and micro optimizations. One of the things I discovered was that parsing and serialization are dramatically slower than in-memory manipulations, typically by three orders of magnitude, even when all I/O is performed between byte arrays. Right now my belief is that any program that does any parsing or serialization (and it's hard to imagine what program wouldn't do at least one of those two) is going to spend so much time doing that, that there's simply no point to optimizing anything else.
That said, I have optimized parsing/document building extensively in this release. It is much, much faster than in previous releases. It should now be competitive with any other tree-based API written in Java, though naturally it's still slower than a straightforward SAX parse because it sits on top of SAX. The biggest effects on speed now are I/O (don't forget to buffer your streams) and the speed of the underlying parser. I'm still recommending Xerces because it's the only one I've found that's almost correct, but you can speed XOM up by about a third by switching to Crimson, and possibly more by switching to Piccolo. However, both of those have nasty bugs that prevent the XOM unit tests from completing successfully. Xerces has a couple of bugs too, but fortunately nothing I couldn't work around.
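As a small illustration of the buffering advice, the following sketch wraps a raw stream in a java.io.BufferedInputStream before reading it byte by byte. The class BufferedRead and its methods are hypothetical names made up for this example, and the ByteArrayInputStream merely stands in for a file or network stream; in a real program the buffered stream is what you would hand to the parser or builder.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of XOM.
final class BufferedRead {

    // Wraps a raw stream so single-byte reads are served from an in-memory buffer.
    static InputStream buffered(InputStream raw) {
        return new BufferedInputStream(raw, 8192);
    }

    // Reads the document one byte at a time, the access pattern that benefits
    // most from buffering, and returns the number of bytes seen.
    static long countBytes(byte[] document) {
        try (InputStream in = buffered(new ByteArrayInputStream(document))) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

How much this helps depends on the underlying stream; an in-memory array gains nothing, while an unbuffered file or socket stream usually gains a great deal.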
Contrary to popular belief, most of the optimizations improved both speed and memory use. There were few trade-offs between them. However, there was one notable exception. The Text class is now storing its data internally in UTF-8. This cuts memory usage for mostly ASCII text by about 10-20%. However, it has a noticeable 10% speed penalty. I'm not sure if I'm going to keep this strategy or not. Ideally, I'd like to provide some sort of runtime switch to select this behavior (or not) but I haven't yet figured out the right design to make this happen. The constraints on the design are:
Text class
setValue() method in the Text class
nu.xom package cannot bypass verification
getValue() and setValue() should not use instanceof. (Profiling has shown this is a performance killer.)
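To make the memory versus speed trade-off concrete, here is a minimal sketch of a text holder that keeps its content as UTF-8 bytes. The class name Utf8Text and its methods are hypothetical and are not XOM's actual Text implementation; the sketch performs no XML verification and does not attempt the runtime switch discussed above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a text node; not XOM's Text class.
final class Utf8Text {

    private byte[] data;

    Utf8Text(String value) {
        setValue(value);
    }

    // The encoding cost is paid on every write.
    void setValue(String value) {
        data = value.getBytes(StandardCharsets.UTF_8);
    }

    // The decoding cost is paid on every read.
    String getValue() {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Bytes actually held by this object for the character data.
    int storedBytes() {
        return data.length;
    }

    // Bytes the same character data would occupy as UTF-16 code units.
    static int utf16Bytes(String value) {
        return value.length() * 2;
    }
}
```

For ASCII-only content the stored character data is half the size of its UTF-16 form, though the saving for a whole document is smaller because element and attribute overhead is unchanged. Every getValue() call pays for a decode, which is the kind of cost a speed penalty like the one described would come from.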
There are no public API level changes in this release. However, the unit tests have been expanded dramatically, which resulted in the discovery and elimination of a number of bugs. Internal changes in 1.0d14 include:
Element.insertChild(String, int) now throws a NullPointerException if the first argument is null
Document's copy constructor that caused the prolog and epilog of a document not to be copied.
nu.xom.samples package is no longer bundled with the main JavaDoc, to indicate that this is not really a part of the public API. For the moment, if you want this you'll have to build it yourself, though it's not very useful. I have not spent a lot of effort on the comments in the samples package.
The primary focus of this release is memory. I've done extensive profiling of the memory used by XOM, plugged memory leaks, and rearchitected classes to use less memory. The Element class has fewer fields than before and uses lazy initialization so many complex fields are null until and unless they're actually used. With this release XOM programs should use less than half the memory they used previously. I now have a rough estimate that for large (a hundred kilobytes or more), primarily ASCII-range XML documents encoded in UTF-8, the corresponding XOM Document object is five to six times the size of the input XML. Less complex documents without attributes or namespaces are likely to be smaller than documents of the same physical size with attributes and namespaces. If the original document is encoded in UTF-16, the size difference is likely to be more like 2 to 3 times.
Measurements are currently showing that almost
all the space is taken up by strings and char arrays (mostly inside strings and
string buffers). There might be a few places where I can make a nip here
or a tuck there, but further large-scale memory optimization would have to look at
using UTF-8 internally instead of UTF-16. (Possibly I can get away with
doing this in just a couple of places like the Text class.)
One area I can still explore is whether it might make sense to intern strings.
Generally, the parser does this for anything read from a document, and
the compiler does it for string literals; but there might still be a few
opportunities here.
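For readers unfamiliar with interning, the following sketch shows the effect in plain Java: two equal strings built at run time are distinct objects until String.intern() maps them to one shared instance. InternDemo is a hypothetical class written for this illustration and says nothing about where XOM itself does or does not intern.

```java
// Hypothetical demonstration class; not part of XOM.
final class InternDemo {

    // Builds a fresh String at run time so it is not a compile-time constant
    // and therefore is not interned automatically.
    static String runtimeCopy(String s) {
        return new String(s.toCharArray());
    }

    // True only when both references point at the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }
}
```

Two results of runtimeCopy("para") are equal but not the same object; after calling intern() on each, both references point to the single instance that the literal "para" also uses, so the duplicate character data can be garbage collected.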
I've also done a little work on speed as well, though not nearly as extensive. Mostly I just picked off some low-hanging fruit the profiler made obvious. More serious work remains to be done. My initial measurements focused on document building. About 25-35% of the time was eaten by the parser. Another 25-35% went into verification, the biggest chunk of which was text content. The rest was divided up into dribs and drabs of actual document building. The single biggest time waster was this method:
private static boolean isXMLCharacter(int c) {
    if (c <= 0xD7FF) {
        if (c >= 0x20) return true;
        else {
            if (c == '\n') return true;
            if (c == '\r') return true;
            if (c == '\t') return true;
            return false;
        }
    }
    if (c < 0xE000) return false;
    if (c <= 0xFFFD) return true;
    if (c < 0x10000) return false;
    if (c <= 0x10FFFF) return true;
    return false;
}
Even small optimizations here could have a large effect, so let me know if you see any. However, I'm probably going to redesign the XOMHandler class in 1.0d14 so it bypasses verification. The assumption is the parser will have already checked all this.
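One candidate micro-optimization, offered purely as a sketch, is to replace the chain of comparisons with a precomputed lookup table for 16-bit values. The class XmlCharCheck and the method isXMLCharacterTable are hypothetical names invented here; the reference method restates the logic of the listing above, and the harness checks that both versions agree on every input from -1 through 0x110000. Whether the table is actually faster would have to be measured with a profiler, since it trades comparisons for a 64K-entry array lookup.

```java
// Hypothetical comparison harness; not part of XOM.
final class XmlCharCheck {

    // Reference version, behaviorally equivalent to the method in the release notes.
    static boolean isXMLCharacter(int c) {
        if (c <= 0xD7FF) {
            if (c >= 0x20) return true;
            return c == '\n' || c == '\r' || c == '\t';
        }
        if (c < 0xE000) return false;
        if (c <= 0xFFFD) return true;
        if (c < 0x10000) return false;
        return c <= 0x10FFFF;
    }

    // One flag per 16-bit value, filled in once from the reference version.
    private static final boolean[] BMP = new boolean[0x10000];

    static {
        for (int c = 0; c < BMP.length; c++) {
            BMP[c] = isXMLCharacter(c);
        }
    }

    // Table-driven variant: a single array read for 16-bit values,
    // a plain range check for supplementary code points.
    static boolean isXMLCharacterTable(int c) {
        if (c >= 0 && c < 0x10000) return BMP[c];
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    // Counts inputs on which the two versions disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int c = -1; c <= 0x110000; c++) {
            if (isXMLCharacter(c) != isXMLCharacterTable(c)) mismatches++;
        }
        return mismatches;
    }
}
```

Keeping the original method around as the reference makes any further experiment easy to validate: a new variant is acceptable only if countMismatches() stays at zero.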
There are a few API level changes in this release:
The arguments to insertChild (and checkInsertChild and checkRemoveChild) have been reversed. These methods are now:
public void insertChild(Node child, int position)
protected void checkInsertChild(Node child, int position)
protected void checkRemoveChild(Node child, int position)
The previous order just didn't feel natural to me.
The removeChild methods now return the Node they remove:
public Node removeChild(int position)
public Node removeChild(Node child)
The Builder method
public Document build(String document, String baseURI)
is now declared to throw an IOException like the other build() methods because an IOException can occur while parsing the external DTD subset.
The equals() and hashCode() methods were removed from the XSLTransform class. They're probably not necessary, and their behavior was underspecified.
Several methods in Element were marked final: getAttributeCount(), getNamespacePrefix(int index), removeChildren(), and getAttribute(int). Their previous non-finality was an oversight.
In addition, there are a number of small changes in behavior that don't change the API:
Serializer transparently uses a custom subclass of OutputStreamWriter that does handle EBCDIC correctly. EBCDIC input is still broken, at least for parsers that rely on Java to do EBCDIC-Unicode conversions.
Serializer and Builder
DocType can now be the empty string, in conformance with the XML spec.
Finally, there were a number of small bug fixes, and lots of code cleanups throughout. The most significant bug fix involved setting or changing the namespace URI of XHTML elements (and other elements that use the default namespace).
This release removes the insertBefore and insertAfter methods from ParentNode because:
However, if anyone howls too loudly about this, I can probably be convinced to put them back in.
This release also fixes a bug that arose when removing the namespace from an element that had attributes, such as might occur when converting XHTML to plain vanilla HTML.
The new feature in this release is an ANT build file. This should make it much easier to compile XOM from source. ANT is not included though. You'll have to download and install it separately.
There are no API-level changes in this release. All code that ran before should still run. This release does fix three assorted bugs reported by users:
Not surprisingly these all appeared in the Builder and Serializer classes, which out of all the classes in XOM
are the least well-covered by unit tests.
I've expanded the unit tests to catch these and related bugs.
The unit tests all pass, assuming you use a non-buggy SAX2 parser.
However, if you run the JUnit GUI from the ANT build file,
some confusing class loader issues cause the more-buggy Crimson
to be loaded instead of the less-buggy Xerces. This breaks four
unit tests. Everything should pass if you run
the tests directly instead of from ANT.
(That is, type "java -Xmx96m junit.swingui.TestRunner nu.xom.tests.XOMTests"
instead of "ant testui".)
If anyone can explain to me how I might fix this,
I'd appreciate it.
This release fixes various bugs in namespaces, and makes one API change. The declareNamespace method is once again addNamespaceDeclaration.
Under the hood, however, there are much more significant changes in namespace handling, and these are likely to break some existing applications. In particular,
getNamespaceDeclarationCount now counts all the local namespaces of the element, not just additional namespace declarations. It has at least one entry for the namespace of the element (even if the element is in no namespace), one namespace for each attribute in a namespace, and one namespace for each additional namespace declaration. However, namespaces used multiple times are only counted once. Namespaces in-scope from an ancestor but not directly used on the element are not included.
getNamespacePrefix(int i) iterates across this list of local namespaces. Chances are all code that calls either of these two methods will need to be rewritten.
getNamespacePrefix("") should now always return the default namespace in scope. If no default namespace is in scope it returns the empty string, not null.
Removed vestigial getNextSibling() and getPreviousSibling() methods from Document. These should have been removed earlier.
Comment: check to checkValue; setData to setValue
ProcessingInstruction class: checkData to checkValue; setData to setValue
Text: check to checkValue; setData to setValue
ParentNode: checkRemove to checkRemoveChild for symmetry with checkInsertChild
Element:
public final void appendChild(String text)
public final void insertChild(String text, int position)
Fixed Builder bug that prevented parsing File objects whose filenames contained spaces and other non-URL legal characters
Fixed equals() method in Attribute.Type to work in multi-classloader environments
Corrected usage instructions in sample programs to include the package name
Added checks on values of xml:base attributes that they are legal IRIs. Mainly this involves checking the hex escaping.
XSLT works (modulo some obscure bugs in handling the undeclaration of the default namespace. I need to get some clarification on the proper behavior of SAX processors to fix this.) The TrAX XOMSource and XOMResult classes are not yet public because I'm still thinking about the proper API for these, but you can use the XSLTransform class for most use-cases. You'll need a TrAX compliant XSLT engine such as Saxon or Xalan-J 2.4 somewhere in your classpath to use this.
It is now possible to undeclare the default namespace on a prefixed element by passing the empty string as the prefix and URI to declareNamespace().
Added constraint that an element cannot have two attributes with the same local name and same namespace URI, but different prefixes.
Changed automatic attribute replacement to depend on local name and namespace URI and never on qualified name alone.
Removed the getFirstChild(), getPreviousSibling(), and getNextSibling() methods from Node. These really didn't fit the XOM model of indexed access, and were slower than the indexed equivalents.
Added indexOf() method to ParentNode that returns the position of a given node within its parent, or -1 if the node is not a child of this ParentNode. This is helpful for those few cases where you do need to identify a node's sibling.
public int indexOf(Node child)
Spell checked the API documentation
Moved XOMResult into the nu.xom.transform package.
XSLT still doesn't work, but it's a little closer to working.
This release makes very limited backwards incompatible changes to the API. (A few formerly public methods in Serializer are now protected.) Almost all code that previously compiled and ran with 1.0d4 and 1.0d5 should still compile and run.
New features in the API in this release include:
Namespace URIs must now be absolute URI references
Element.toXML now generates empty-element tags for empty elements
Added a nu.xom.xincluder package to provide XInclude support. The samples package includes a driver program that uses this to resolve XIncludes in existing documents.
Added a nu.xom.canonical package to provide Canonical XML serialization. The samples package includes a driver program that can canonicalize documents.
Serializer has four new protected methods to provide subclasses with more access to the underlying OutputStream:
protected final void writePCDATA(java.lang.String text) throws IOException
protected final void writeAttributeValue(java.lang.String value) throws IOException
protected final void writeMarkup(java.lang.String text) throws IOException
protected final void breakLine() throws IOException
In addition, several bugs were fixed:
Fixed TextWriter bug that prevented the line separator from being changed
Fixed a bug that allowed the namespace URI of a prefixed element to be changed to the empty string.
Fixed a bug that allowed the prefix of an element to be changed to something that conflicts with one of its attributes or additional namespace declarations
Fixed a bug that prevented the detach() method from working on leaf nodes
Fixed a bug pointed out by Laurent Bihanic in getNamespaceURI(String prefix) that failed to return namespace URIs from more than one level up in the hierarchy
Fixed a cosmetic bug in the handling of nbsp in ISO-8859-11 Thai
Relative URLs in system identifiers for DTDs are now resolved against the base URI of the document specified in the builder instead of the current working directory.
This release makes no backwards incompatible changes to the API. All code that previously compiled and ran with 1.0d4, should still compile and run. New features in the API in this release include:
I've added getName(), equals(), hashCode(), and toString() methods to the Attribute.Type inner class. Environments with multiple class loaders should use the equals() method instead of direct equality comparison.
I added a new build method to Builder that builds a XOM Document from a java.io.File.
I added two more build methods to Builder that allow the base URI to be specified when building from a Reader or an InputStream.
I added an experimental build method to Builder that builds a XOM Document directly from a String containing well-formed XML.
I cleaned up the internal code in Builder substantially by refactoring duplicate code into private methods.
I fixed a bug that was preventing the default XMLReader from being loaded in some circumstances
Serializer now supports all defined ISO-8859 character sets, including:
Note that although XOM supports them, not all Java virtual machines do.
Serializer now matches character set names case-insensitively as suggested by the XML specification.
Fixed a bug in UnicodeWriter that was preventing reserved characters such as & and < from being escaped when the encoding was some variant of Unicode. (This is more evidence that premature optimization is the root of all evil. I just couldn't resist an obvious optimization in the UnicodeWriter class, and it came back to bite me in the ass.)
Fixed a cosmetic bug that added unnecessary xmlns="" declarations on root elements by Serializer and toXML in Element
Fixed incorrect hexadecimal escape sequences generated by TextWriter
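The two escaping fixes in the list above concern different cases: reserved characters must be escaped whatever the encoding, while hexadecimal character references are only needed for characters the output encoding cannot represent. The following is a minimal sketch of that distinction using only the JDK. EscapeSketch and escapeText are hypothetical names and this is not the code of XOM's TextWriter or UnicodeWriter; the input is assumed to be well-formed XML text with no unpaired surrogates.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration; not XOM's TextWriter or UnicodeWriter.
final class EscapeSketch {

    // Escapes element content for output in the given character set.
    static String escapeText(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                // Reserved characters are escaped even in Unicode encodings.
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (encoder.canEncode(new String(Character.toChars(cp)))) {
                out.appendCodePoint(cp);
            } else {
                // Unencodable characters become hexadecimal character references.
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        }
        return out.toString();
    }
}
```

Checking the reserved characters before consulting the encoder is the point of the ordering: a shortcut that skips all escaping for Unicode encodings, on the grounds that every character is encodable, would let & and < through unescaped.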
The major additions in 1.0d4 are methods to get and set the base URI of a node. You can invoke getBaseURI from any Node object to retrieve the URL against which relative URLs in that Node should be resolved. This is calculated in keeping with XML Base. That is, if an xml:base attribute is in scope its value is used. Otherwise, the URI of the entity in which the Node appears is used. You can change the underlying URI of the entity using the setBaseURI method in ParentNode.
When a document is built, the parser fills in the base URI for each node. This is stored separately from xml:base attributes, which are not treated differently than any other attribute.
When a document is serialized, you may request that the serializer fill in extra xml:base attributes not present in the infoset to preserve the underlying base URIs. However, since this is a structural change to the document, this feature is turned off by default.
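The lookup order described above (an xml:base value if one is in scope, otherwise the URI of the containing entity) can be sketched with java.net.URI alone. BaseUriDemo, effectiveBase, and resolve are hypothetical names, the example.com URLs are placeholders, and the sketch simplifies XML Base to a single xml:base value resolved against the entity URI; it is not how XOM's getBaseURI is implemented.

```java
import java.net.URI;

// Hypothetical illustration of XML Base style resolution; not XOM code.
final class BaseUriDemo {

    // Returns the base URI to use for a node. xmlBase is the value of the
    // xml:base attribute in scope, or null when there is none.
    static String effectiveBase(String entityUri, String xmlBase) {
        if (xmlBase == null) {
            return entityUri;
        }
        // A relative xml:base value is itself resolved against the entity URI.
        return URI.create(entityUri).resolve(xmlBase).toString();
    }

    // Resolves a relative URL found in a node against that node's base URI.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }
}
```

With an entity URI of http://www.example.com/docs/index.xml and xml:base="chapters/", the effective base is http://www.example.com/docs/chapters/, so a relative reference to one.xml resolves to http://www.example.com/docs/chapters/one.xml.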
Other API level changes include:
The Attributes and Namespaces classes are no longer part of the public API. Instead the Element class has these four public methods:
public Attribute getAttribute(int index)
public int getAttributeCount()
public int getNamespaceDeclarationCount()
public String getNamespacePrefix(int i)
getStringForm has been renamed toXML
readAttribute has been renamed getAttributeValue
addAdditionalNamespace has been renamed declareNamespace
The removeChildren method has been moved from ParentNode into Element because it's impossible to remove all the children of a Document.
The following protected methods allow subclasses to monitor insertions and deletions from subclasses of Element and Document:
protected void checkInsertChild()
protected void checkRemoveChild()
The following protected methods allow subclasses of Element to monitor namespace declarations:
protected void checkAddNamespaceDeclaration()
protected void checkRemoveNamespaceDeclaration()
The following protected methods allow subclasses of Element to monitor changes of local name, namespace prefix, and namespace URI:
protected void checkLocalName()
protected void checkNamespacePrefix()
protected void checkNamespaceURI()
The missing write(DocType) method has been added to Serializer. This fixes a nasty infinite recursion when serializing documents with document type declarations.
In addition several bugs were fixed, the JavaDoc was further cleaned up and improved, and more than a dozen new unit tests were added.
The major change in 1.0d3 is that the TreeNode class has been replaced by the ParentNode class. The only immediate subclasses of ParentNode are Element and Document. Attribute is the only immediate subclass of Node. The other four node types are subclasses of LeafNode which is a subclass of Node. All navigation methods (getChild, getNextSibling, getParent, etc.) are now in Node. All insertion and deletion methods (appendChild, insertChild, removeChild, etc.) are only available in ParentNode, that is, Document and Element.
Other API-level changes since 1.0d2 include:
add is now addAttribute
LeafNode is public
I also spent a lot of time improving the JavaDoc.
I've posted 1.0d2 to fix the first bugs discovered, clean up the source code, and make a few changes to method names that seemed wise. API-level changes since Tuesday night include:
readAttribute is now getAttributeValue
howManyChildren is now getChildCount
Copyright 2002-2005, 2009, 2013, 2018-2023 Elliotte Rusty Harold
elharo@ibiblio.org
Last Modified January 22, 2023