Saturday, October 14, 2006

A Custom Digester rule

I recently took another look at Apache Commons Digester, a popular toolkit for parsing XML files into JavaBeans. The last time I used it, it was to parse an XML configuration file for an application I was building. Nowadays I would use Spring, but that was a long time ago.

The Digester is built on top of a standard SAX parser. A SAX parser reacts to events fired when the various opening and closing tags are encountered. Since there is a single event handler per parser, the code to handle various types of elements for a single event can get pretty messy. Unlike the standard SAX parser, the Digester matches Rules to element patterns. The element patterns look like XPath expressions, and the Rules are objects which operate on a Stack maintained by the Digester.
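To make the "single event handler" problem concrete, here is a minimal sketch of a plain SAX handler using only the JDK. The Person bean, its fields, the email element and the class name PlainSaxPersonParser are hypothetical and exist only for this illustration. Note how the handler has to track the current element path itself and branch on it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PlainSaxPersonParser {

  /** Hypothetical bean used only for this illustration. */
  public static class Person {
    public String nomDePlume;
    public String email;
  }

  /** One handler sees every element, so it must track the current path itself. */
  static class PersonHandler extends DefaultHandler {
    final Deque<String> path = new ArrayDeque<>();
    final StringBuilder text = new StringBuilder();
    Person person;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      path.addLast(qName);
      text.setLength(0);
      if (String.join("/", path).equals("person")) {
        person = new Person();
      }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      String current = String.join("/", path);
      // Every new element type adds another branch to this chain.
      if (current.equals("person/name")) {
        person.nomDePlume = text.toString().trim();
      } else if (current.equals("person/email")) {
        person.email = text.toString().trim();
      }
      path.removeLast();
    }
  }

  public static Person parse(String xml) throws Exception {
    PersonHandler handler = new PersonHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.person;
  }

  public static void main(String[] args) throws Exception {
    Person p = parse("<person><name>Mark Twain</name><email>mt@example.org</email></person>");
    System.out.println(p.nomDePlume + " / " + p.email);
  }
}
```

With Digester, each branch of that if/else chain would instead become a Rule registered against its own pattern.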

Because the code for handling the various tags (as indicated by the XPath-like expressions) is all encapsulated in the Rules, and because the Digester package comes with a small but comprehensive set of generic Rules, parser code written using the Digester is very readable and easy to maintain.

The last time I used Digester, I had pretty much copied and pasted code that I found somewhere, and magically it all worked, so I did not really bother to understand how it actually worked. This time around, my requirements went beyond what was addressed by the basic examples, so I was forced to read up a bit on it. These web pages served as excellent introductions:

In addition, there is the Reference Manual buried in the API Javadocs. It is also always a good idea to download the source distribution and look at the examples. Finally, the Jakarta Commons Cookbook has some very interesting recipes on Digester use as well.

My first take was that you could use Digester only if you built your JavaBean to conform really closely to the XML input file. What was not clear to me was how to map an attribute or element named "foo" to a JavaBean property named "bar". It is actually fairly easy with the CallMethodRule. The basic pattern of parsing an XML file with a Digester is as follows:

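Digester digester = new Digester();
// Patterns are written without a leading slash, e.g. "person/name".
digester.addRule("person", new ObjectCreateRule(Person.class));
// Pass the body of <name> to the differently named setter on Person.
digester.addRule("person/name", new CallMethodRule("setNomDePlume", 1));
digester.addRule("person/name", new CallParamRule(0));
...
// parse() returns Object, so the result has to be cast.
Person person = (Person) digester.parse(xmlFileName);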
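Conceptually, the CallMethodRule and CallParamRule pair above ends up invoking the named method on the object at the top of the stack, with the element body as its argument. The following is a rough, JDK-only sketch of that idea and not Digester's actual implementation. Person and setNomDePlume come from the snippet above, while the class name ReflectiveSetterSketch and the helper callOnTop are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.Deque;

public class ReflectiveSetterSketch {

  /** Bean whose property name differs from the XML element name ("name"). */
  public static class Person {
    private String nomDePlume;
    public void setNomDePlume(String nomDePlume) { this.nomDePlume = nomDePlume; }
    public String getNomDePlume() { return nomDePlume; }
  }

  /** Hypothetical helper: invoke a one-String-argument method on the top of the stack. */
  public static void callOnTop(Deque<Object> stack, String methodName, String arg)
      throws Exception {
    Object top = stack.peek();
    Method m = top.getClass().getMethod(methodName, String.class);
    m.invoke(top, arg);
  }

  public static void main(String[] args) throws Exception {
    Deque<Object> stack = new ArrayDeque<>();
    // Roughly what an object-creating rule does when <person> opens.
    stack.push(new Person());
    // Roughly the effect of the method-calling rules when </name> closes.
    callOnTop(stack, "setNomDePlume", "Mark Twain");
    Person person = (Person) stack.pop();
    System.out.println(person.getNomDePlume());
  }
}
```

The point of the sketch is only that the element name and the bean property name are decoupled: the rule carries the method name, so the XML can say "name" while the bean says "nomDePlume".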

While the above pattern sufficed for most of my requirements, the one thing I could not get from the built-in Rules was a way to set the serialized contents of a Node into a String bean property. Digester offers the NodeCreateRule, which can build a Node object for the element matched by a pattern so that it can be set into a Node member variable, but since I wanted to set a String variable directly, I created a very simple custom rule which I called the SetSerializedNodeRule, and which is shown below.

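import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.commons.beanutils.MethodUtils;
import org.apache.commons.digester.NodeCreateRule;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SetSerializedNodeRule extends NodeCreateRule {

  // Name of the method to call on the object at the top of the stack.
  private String method;

  public SetSerializedNodeRule() throws ParserConfigurationException {
    super(Node.ELEMENT_NODE);
  }

  public SetSerializedNodeRule(String method) throws ParserConfigurationException {
    this();
    this.method = method;
  }

  public void end() throws Exception {
    // Pop the DOM node built by NodeCreateRule off the Digester stack.
    Element nodeToSerialize = (Element) digester.pop();
    String serializedNode = serializeNode(nodeToSerialize);
    invokeMethodOnTopOfStack(method, serializedNode);
  }

  // Serializes the element (including its children) to an XML String.
  protected String serializeNode(Element nodeToSerialize) throws Exception {
    StringWriter writer = new StringWriter();
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.newDocument();
    OutputFormat format = new OutputFormat(doc);
    XMLSerializer serializer = new XMLSerializer(writer, format);
    serializer.serialize(nodeToSerialize);
    return writer.getBuffer().toString();
  }

  // Calls methodName(param) on the object now at the top of the stack.
  protected void invokeMethodOnTopOfStack(String methodName, String param) throws Exception {
    Object objOnTopOfStack = digester.peek();
    MethodUtils.invokeExactMethod(objOnTopOfStack, methodName, param);
  }
}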

To help write this Rule, I looked at the source for NodeCreateRule and some of the other Rule objects in the Apache Commons Digester source distribution. The rule uses the org.apache.xml.serialize.XMLSerializer to convert the Node object to a String, then calls the specified method on the object at the top of the Stack. Calling this rule is similar to calling the CallMethodRule: you pass in the name of the method that should be invoked with the serialized contents of the Node object we are pointing to. Here is an example:

digester.addRule("foo/bar", new SetSerializedNodeRule("setBody"));
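If the Xerces serializer classes are not on the classpath, the serialization step can probably be done with the JDK's javax.xml.transform API instead. This is a swapped-in alternative, not what the rule above uses, and the class name NodeToStringSketch and the sample bar element are assumptions made for the illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NodeToStringSketch {

  /** Serializes a DOM element and its children to a String, without an XML declaration. */
  public static String serializeNode(Element node) throws Exception {
    // An identity transformer copies the DOM source to the output unchanged.
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(node), new StreamResult(writer));
    return writer.toString();
  }

  public static void main(String[] args) throws Exception {
    // Build a small element by hand, standing in for the node Digester would hand us.
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element bar = doc.createElement("bar");
    Element b = doc.createElement("b");
    b.setTextContent("hello");
    bar.appendChild(b);
    System.out.println(serializeNode(bar));
  }
}
```

A serializeNode method like this could stand in for the one in SetSerializedNodeRule, since both take an Element and return a String.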

I found Digester to be quite simple to use, thanks to its clean design and the Rules that come bundled with the distribution. The readability and maintainability of the code also go up enormously when switching from SAX or DOM parser implementations. I haven't run any performance tests, but my guess is that Digester would be slower than an equivalent hand-written SAX parser but much faster than a DOM parser, and that its memory footprint would be comparable to a SAX parser's.

8 comments (moderated to prevent spam):

M.W.Park said...

wow!! it really helped me.
thanks a lot.

Sujit Pal said...

Hi manywaypark, thanks for the feedback and I am glad it helped.

Anonymous said...

Does this work with JDK 1.5? I just get null for the popped node

Sujit Pal said...

Hi Rob, its been a while (3 yrs) since the post, and I honestly don't remember if I was using JDK 1.5 at the time, but I suspect I was...I don't have the code handy to rerun it anymore (died in a disk crash). Sorry...

Anonymous said...

Digester uses reflection which is slow. The Digester object is not thread safe and a new instance must be created every time. This is therefore useful when loading custom one off xml configurations (e.g. struts 1.x uses it to process the struts-config.xml ) but not good when parsing xml at run time.

Sujit Pal said...

Yes, agreed about the speed issues, its also quite a lot of work to build, currently I just use XPath for this sort of stuff.

Victor said...

Thank you for sharing your findings, this saved me at least half a day's work. You are soo cool.

Sujit Pal said...

You are welcome Victor :-).