Saturday, October 06, 2007

Converting XML to Badgerfish JSON

The Badgerfish convention defines a standard way to convert an XML document to a JSON object. The Badgerfish website lists tools written in PHP and Ruby, and even a web service, but I needed one for Java. Since the conversion rules are clearly enumerated on the site, writing one myself did not seem terribly difficult, so I did. The converter is modeled after the JDOM XMLOutputter, and can output either the compact format (for computer programs) or the pretty format (for humans). Unlike the JDOM XMLOutputter, however, it only provides methods that work with a JDOM Document object. An additional convenience outputString() method accepts an XML string and converts it to a JDOM Document internally. The code is shown below:

package com.mycompany.myapp.converters;

import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.Writer;
import java.util.List;

import net.sf.json.JSONObject;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.jdom.Attribute;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.Namespace;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;

/**
 * Provides methods to convert an XML string into an equivalent JSON string 
 * using the Badgerfish convention described in http://badgerfish.ning.com.
 *  
 * Conversion Rules copied from the website are enumerated below:
 * 
 * 1. Element names become object properties
 * 2. Text content of elements goes in the $ property of an object.
 *    <alice>bob</alice>
 *    becomes
 *    { "alice": { "$" : "bob" } }
 * 3. Nested elements become nested properties
 *    <alice><bob>charlie</bob><david>edgar</david></alice>
 *    becomes
 *    { "alice": { "bob" : { "$": "charlie" }, "david": { "$": "edgar"} } }
 * 4. Multiple elements at the same level become array elements.
 *    <alice><bob>charlie</bob><bob>david</bob></alice>
 *    becomes
 *    { "alice": { "bob" : [{"$": "charlie" }, {"$": "david" }] } }
 * 5. Attributes go in properties whose names begin with @.
 *    <alice charlie="david">bob</alice>
 *    becomes
 *    { "alice": { "$" : "bob", "@charlie" : "david" } }
 * 6. Active namespaces for an element go in the element's @xmlns property.
 * 7. The default namespace URI goes in @xmlns.$.
 *    <alice xmlns="http://some-namespace">bob</alice>
 *    becomes
 *    { "alice": { "$" : "bob", "@xmlns": { "$" : "http:\/\/some-namespace"} } }
 * 8. Other namespaces go in other properties of @xmlns.
 *    <alice xmlns="http://some-namespace" xmlns:charlie="http://some-other-namespace">bob</alice>
 *    becomes
 *    { "alice": { "$" : "bob", "@xmlns": { "$" : "http:\/\/some-namespace", "charlie" : "http:\/\/some-other-namespace" } } }
 * 9. Elements with namespace prefixes become object properties, too.
 *    <alice xmlns="http://some-namespace" xmlns:charlie="http://some-other-namespace"> <bob>david</bob> <charlie:edgar>frank</charlie:edgar> </alice>
 *    becomes
 *    { "alice" : { "bob" : { "$" : "david" , "@xmlns" : {"charlie" : "http:\/\/some-other-namespace" , "$" : "http:\/\/some-namespace"} } , "charlie:edgar" : { "$" : "frank" , "@xmlns" : {"charlie":"http:\/\/some-other-namespace", "$" : "http:\/\/some-namespace"} }, "@xmlns" : { "charlie" : "http:\/\/some-other-namespace", "$" : "http:\/\/some-namespace"} } }
 *    
 * @author Sujit Pal
 */
public class JsonOutputter {

  private final Log log = LogFactory.getLog(getClass());

  private int indent = 0;
  
  /**
   * Set the format for the outputter. Default is compact format.
   * @param format the format to set.
   */
  public void setFormat(Format format) {
    // A null indent (compact format) means no pretty-printing, so reset to 0.
    String indentString = format.getIndent();
    indent = (indentString == null) ? 0 : indentString.length();
  }
  
  /**
   * Converts a JDOM Document into a JSON string and writes the result into
   * the specified OutputStream.
   * @param document the JDOM Document.
   * @param ostream the OutputStream.
   * @throws IOException if one is thrown.
   */
  public void output(Document document, OutputStream ostream) throws IOException {
    // Encode explicitly as UTF-8 rather than relying on the platform default charset.
    ostream.write(outputString(document).getBytes("UTF-8"));
  }
  
  /**
   * Converts the JDOM Document into a JSON string and writes the result into 
   * the specified Writer.
   * @param document the JDOM Document.
   * @param writer the Writer.
   * @throws IOException if one is thrown.
   */
  public void output(Document document, Writer writer) throws IOException {
    writer.write(outputString(document));
  }
  
  /**
   * Convenience method that accepts an XML string and returns a String 
   * representing the converted JSON Object.
   * @param xml the input XML string.
   * @return the String representation of the converted JSON object.
   * @throws IOException if one is thrown.
   * @throws JDOMException if one is thrown.
   */
  public String outputString(String xml) throws IOException, JDOMException {
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build(new StringReader(xml));
    return outputString(doc);
  }
  
  /**
   * Converts the JDOM Document into a JSON String and returns it.
   * @param document the JDOM Document.
   * @return the JSON String representing the JDOM Document.
   */
  public String outputString(Document document) {
    Element rootElement = document.getRootElement();
    JSONObject jsonObject = new JSONObject();
    JSONObject namespaceJsonObject = getNamespaceJsonObject(rootElement);
    processElement(rootElement, jsonObject, namespaceJsonObject);
    processChildren(rootElement, jsonObject, namespaceJsonObject);
    if (indent == 0) {
      return StringUtils.replace(jsonObject.toString(), "/", "\\/");
    } else {
      return StringUtils.replace(jsonObject.toString(indent), "/", "\\/");
    }
  }

  /**
   * Process the children of the specified JDOM element. This method is recursive.
   * The children for the given element are found, and the method is called for
   * each child.
   * @param element the element whose children need to be processed.
   * @param jsonObject the reference to the JSON Object to update.
   * @param namespaceJsonObject the reference to the root Namespace JSON object.
   */
  private void processChildren(Element element, JSONObject jsonObject, JSONObject namespaceJsonObject) {
    List<Element> children = element.getChildren();
    JSONObject properties;
    if (jsonObject.has(getQName(element))) {
      properties = jsonObject.getJSONObject(getQName(element));
    } else {
      properties = new JSONObject();
    }
    for (Element child : children) {
      // Rule 1: Element names become object properties
      // Rule 9: Elements with namespace prefixes become object properties, too.
      JSONObject childJsonObject = new JSONObject();
      processElement(child, childJsonObject, namespaceJsonObject);
      processChildren(child, childJsonObject, namespaceJsonObject);
      if (! childJsonObject.isEmpty()) {
        properties.accumulate(getQName(child), childJsonObject.getJSONObject(getQName(child)));
      }
    }
    if (! properties.isEmpty()) {
      jsonObject.put(getQName(element), properties);
    }
  }

  /**
   * Process the text content and attributes of a JDOM element into a JSON object.
   * @param element the element to parse.
   * @param jsonObject the JSONObject to update with the element's properties.
   * @param namespaceJsonObject the reference to the root Namespace JSON object.
   */
  private void processElement(Element element, JSONObject jsonObject, JSONObject namespaceJsonObject) {
    JSONObject properties = new JSONObject();
    // Rule 2: Text content of elements goes in the $ property of an object.
    if (StringUtils.isNotBlank(element.getTextTrim())) {
      properties.accumulate("$", element.getTextTrim());
    }
    // Rule 5: Attributes go in properties whose names begin with @.
    List<Attribute> attributes = element.getAttributes();
    for (Attribute attribute : attributes) {
      properties.accumulate("@" + attribute.getName(), attribute.getValue());
    }
    if (! namespaceJsonObject.isEmpty()) {
      properties.accumulate("@xmlns", namespaceJsonObject);
    }
    if (! properties.isEmpty()) {
      jsonObject.accumulate(getQName(element), properties);
    }
  }
  
  /**
   * Return a JSON Object containing the default and additional namespace
   * properties of the Element. 
   * @param element the element whose namespace properties are to be extracted.
   * @return the JSON Object with the namespace properties.
   */
  private JSONObject getNamespaceJsonObject(Element element) {
    // Rule 6: Active namespaces for an element go in the element's @xmlns property.
    // Rule 7: The default namespace URI goes in @xmlns.$.
    JSONObject namespaceProps = new JSONObject();
    Namespace defaultNamespace = element.getNamespace();
    if (StringUtils.isNotBlank(defaultNamespace.getURI())) {
      namespaceProps.accumulate("$", defaultNamespace.getURI());
    }
    // Rule 8: Other namespaces go in other properties of @xmlns.
    List<Namespace> additionalNamespaces = element.getAdditionalNamespaces();
    for (Namespace additionalNamespace : additionalNamespaces) {
      if (StringUtils.isNotBlank(additionalNamespace.getURI())) {
        namespaceProps.accumulate(additionalNamespace.getPrefix(), additionalNamespace.getURI());
      }
    }
    return namespaceProps;
  }

  /**
   * Return the qualified name (prefix:elementname) of the element.
   * @param element the element whose qualified name is needed.
   * @return the element name qualified with its namespace prefix, if any.
   */
  private String getQName(Element element) {
    if (StringUtils.isNotBlank(element.getNamespacePrefix())) {
      return element.getNamespacePrefix() + ":" + element.getName();
    } else {
      return element.getName();
    }
  }
}

The only dependencies for this code are commons-lang, commons-logging, JDOM and json-lib. I guess I could have just used the methods built into String, but I have gotten too used to StringUtils doing null-safe operations for me. JDOM happens to be my favorite XML parsing and generation toolkit by far, even though other toolkits are more popular because they are faster. I also prefer json-lib to the more popular org.json module for JSON work because of the way json-lib is architected.
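For comparison, here is a minimal sketch of what rules 1 through 5 might look like with no third-party libraries at all, using only the DOM parser that ships with the JDK. It is not the converter above: the class name MiniBadgerfish and its helper methods are made up for illustration, namespaces (rules 6 through 9) are deliberately left out, and the JSON quoting only handles backslashes and double quotes. The accumulate() helper imitates the behavior the converter relies on from json-lib, where a second value under the same key turns the property into an array.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

/** Hypothetical JDK-only sketch of Badgerfish rules 1-5 (no namespace support). */
final class MiniBadgerfish {

  /** Parses the XML string and returns compact Badgerfish-style JSON. */
  static String convert(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
      Map<String, Object> top = new LinkedHashMap<>();
      top.put(root.getNodeName(), toMap(root));
      return toJson(top);
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  private static Map<String, Object> toMap(Element element) {
    Map<String, Object> props = new LinkedHashMap<>();
    // Rule 2: trimmed text content goes in the "$" property.
    StringBuilder text = new StringBuilder();
    for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Text) {
        text.append(n.getNodeValue());
      }
    }
    String trimmed = text.toString().trim();
    if (!trimmed.isEmpty()) {
      props.put("$", trimmed);
    }
    // Rule 5: attributes become "@name" properties (xmlns declarations are skipped).
    NamedNodeMap attrs = element.getAttributes();
    for (int i = 0; i < attrs.getLength(); i++) {
      String name = attrs.item(i).getNodeName();
      if (!name.equals("xmlns") && !name.startsWith("xmlns:")) {
        props.put("@" + name, attrs.item(i).getNodeValue());
      }
    }
    // Rules 1, 3 and 4: child elements become (possibly repeated) properties.
    for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() == Node.ELEMENT_NODE) {
        accumulate(props, n.getNodeName(), toMap((Element) n));
      }
    }
    return props;
  }

  /** Stores the first value as-is; a second value under the same key makes it a list. */
  @SuppressWarnings("unchecked")
  private static void accumulate(Map<String, Object> props, String key, Object value) {
    Object existing = props.get(key);
    if (existing == null) {
      props.put(key, value);
    } else if (existing instanceof List) {
      ((List<Object>) existing).add(value);
    } else {
      List<Object> list = new ArrayList<>();
      list.add(existing);
      list.add(value);
      props.put(key, list);
    }
  }

  private static String toJson(Object value) {
    StringBuilder sb = new StringBuilder();
    if (value instanceof Map) {
      sb.append('{');
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (sb.length() > 1) sb.append(',');
        sb.append(quote(e.getKey().toString())).append(':').append(toJson(e.getValue()));
      }
      return sb.append('}').toString();
    }
    if (value instanceof List) {
      sb.append('[');
      for (Object item : (List<?>) value) {
        if (sb.length() > 1) sb.append(',');
        sb.append(toJson(item));
      }
      return sb.append(']').toString();
    }
    return quote(value.toString());
  }

  /** Minimal string quoting: escapes backslashes and double quotes only. */
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  public static void main(String[] args) {
    // prints {"alice":{"bob":[{"$":"charlie"},{"$":"david"}]}}
    System.out.println(convert("<alice><bob>charlie</bob><bob>david</bob></alice>"));
  }
}
```

Keeping the intermediate structure as LinkedHashMap preserves insertion order, which is what makes a plain string comparison of the output workable in a sketch like this. A real JSON library remains the safer choice for anything beyond a demonstration.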

Most of the rules come with an expected output for a given input, so testing the converter was simply a matter of writing a JUnit test case and checking that each input produced the expected output. Here is the JUnit test I wrote to test the converter.

package com.mycompany.myapp.converters;

import java.io.StringReader;
import java.io.StringWriter;

import junit.framework.Assert;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;
import org.junit.Before;
import org.junit.Test;

/**
 * Test for XML to JSON conversion tool.
 * @author Sujit Pal
 */
public class JsonOutputterTest {

  private final Log log = LogFactory.getLog(getClass());
  
  private JsonOutputter jsonOutputter;
  
  @Before
  public void setUp() throws Exception {
    jsonOutputter = new JsonOutputter();
    // this call is redundant, really
    jsonOutputter.setFormat(Format.getCompactFormat());
  }
  
  /**
   * Rule 1: Element names become object properties
   * <foo><bar><baz>baztext</baz></bar></foo>
   * becomes:
   * {"foo":{"bar":{"baz":{"$":"baztext"}}}}
   */
  @Test
  public void testBadgerfishRule1() throws Exception {
    String xml = "<foo><bar><baz>baztext</baz></bar></foo>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 1:" + json);
    Assert.assertEquals("{\"foo\":{\"bar\":{\"baz\":{\"$\":\"baztext\"}}}}", json);
  }

  /**
   * Rule 2: Text content of elements goes in the $ property of an object.
   * <alice>bob</alice>
   * becomes
   * {"alice":{"$":"bob"}}
   */
  @Test
  public void testBadgerfishRule2() throws Exception {
    String xml = "<alice>bob</alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 2:" + json);
    Assert.assertEquals("{\"alice\":{\"$\":\"bob\"}}", json);
  }

  /**
   * Rule 3: Nested elements become nested properties
   * <alice><bob>charlie</bob><david>edgar</david></alice>
   * becomes
   * {"alice":{"bob":{"$":"charlie"},"david":{"$":"edgar"}}}
   */
  @Test
  public void testBadgerfishRule3() throws Exception {
    String xml = "<alice><bob>charlie</bob><david>edgar</david></alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 3:" + json);
    Assert.assertEquals("{\"alice\":{\"bob\":{\"$\":\"charlie\"},\"david\":{\"$\":\"edgar\"}}}", json);
  }

  /**
   * Rule 4: Multiple elements at the same level become array elements.
   * <alice><bob>charlie</bob><bob>david</bob></alice>
   * becomes
   * {"alice":{"bob":[{"$":"charlie"},{"$":"david"}]}}
   */
  @Test
  public void testBadgerfishRule4() throws Exception {
    String xml = "<alice><bob>charlie</bob><bob>david</bob></alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 4:" + json);
    Assert.assertEquals("{\"alice\":{\"bob\":[{\"$\":\"charlie\"},{\"$\":\"david\"}]}}", json);
  }
  
  /**
   * Rule 5: Attributes go in properties whose names begin with @.
   * <alice charlie="david">bob</alice>
   * becomes
   * {"alice":{"$":"bob","@charlie":"david"}}
   */
  @Test
  public void testBadgerfishRule5() throws Exception {
    String xml = "<alice charlie=\"david\">bob</alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 5:" + json);
    Assert.assertEquals("{\"alice\":{\"$\":\"bob\",\"@charlie\":\"david\"}}", json);
  }
  
  /**
   * Rule 6: Active namespaces for an element go in the element's @xmlns property.
   * Rule 7: The default namespace URI goes in @xmlns.$.
   * <alice xmlns="http://some-namespace">bob</alice>
   * becomes
   * {"alice":{"$":"bob","@xmlns":{"$":"http:\/\/some-namespace"}}}
   */
  @Test
  public void testBadgerfishRule6And7() throws Exception {
    String xml = "<alice xmlns=\"http://some-namespace\">bob</alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 6+7:" + json);
    Assert.assertEquals("{\"alice\":{\"$\":\"bob\",\"@xmlns\":{\"$\":\"http:\\/\\/some-namespace\"}}}", json);
  }

  /**
   * Rule 8: Other namespaces go in other properties of @xmlns.
   * <alice xmlns="http://some-namespace" xmlns:charlie="http://some-other-namespace">bob</alice>
   * becomes
   * {"alice":{"$":"bob","@xmlns":{"$":"http:\/\/some-namespace","charlie":"http:\/\/some-other-namespace"}}}
   */
  @Test
  public void testBadgerfishRule8() throws Exception {
    String xml = "<alice xmlns=\"http://some-namespace\" xmlns:charlie=\"http://some-other-namespace\">bob</alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 8:" + json);
    Assert.assertEquals("{\"alice\":{\"$\":\"bob\",\"@xmlns\":{\"$\":\"http:\\/\\/some-namespace\",\"charlie\":\"http:\\/\\/some-other-namespace\"}}}", json);
  }

  /**
   * Rule 9: Elements with namespace prefixes become object properties, too.
   * <alice xmlns="http://some-namespace" xmlns:charlie="http://some-other-namespace"> <bob>david</bob> <charlie:edgar>frank</charlie:edgar> </alice>
   * becomes
   * {"alice":{"bob":{"$":"david","@xmlns":{"$":"http:\/\/some-namespace","charlie":"http:\/\/some-other-namespace"}},"charlie:edgar":{"$":"frank","@xmlns":{"$":"http:\/\/some-namespace","charlie":"http:\/\/some-other-namespace"}},"@xmlns":{"$":"http:\/\/some-namespace","charlie":"http:\/\/some-other-namespace"}}}
   */
  @Test
  public void testBadgerfishRule9() throws Exception {
    String xml = "<alice xmlns=\"http://some-namespace\" xmlns:charlie=\"http://some-other-namespace\"> <bob>david</bob> <charlie:edgar>frank</charlie:edgar> </alice>";
    String json = jsonOutputter.outputString(xml);
    log.debug("Rule 9:" + json);
    Assert.assertEquals("{\"alice\":{\"bob\":{\"$\":\"david\",\"@xmlns\":{\"$\":\"http:\\/\\/some-namespace\",\"charlie\":\"http:\\/\\/some-other-namespace\"}},\"charlie:edgar\":{\"$\":\"frank\",\"@xmlns\":{\"$\":\"http:\\/\\/some-namespace\",\"charlie\":\"http:\\/\\/some-other-namespace\"}},\"@xmlns\":{\"$\":\"http:\\/\\/some-namespace\",\"charlie\":\"http:\\/\\/some-other-namespace\"}}}", json);
  }
}
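
The expected strings in the namespace tests above contain sequences such as \\/\\/, which can look like typos at first glance. They are there because the converter replaces every / in its output with \/ before returning it, and each backslash has to be doubled inside a Java string literal. JSON permits a forward slash to be written as \/, so the escaped and unescaped forms decode to the same value. A small sketch of just that step, using plain String.replace instead of StringUtils (the class name SlashEscapeDemo and the helper escapeSlashes are made up for illustration):

```java
final class SlashEscapeDemo {

  /** Escapes each forward slash as backslash-slash, as outputString() does. */
  static String escapeSlashes(String json) {
    // "\\/" in Java source is the two-character sequence: backslash, slash.
    return json.replace("/", "\\/");
  }

  public static void main(String[] args) {
    String raw = "{\"$\":\"http://some-namespace\"}";
    // prints {"$":"http:\/\/some-namespace"}
    System.out.println(escapeSlashes(raw));
  }
}
```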

I did not know much about the Badgerfish convention until quite recently. This move towards converting XML into a standard JSON format seems really cool, and I wonder if it is widely used. Frankly, given that so many Java applications use XML and JSON, I was hoping to just snag the code from the net rather than have to write it myself. If you use Java and generate JSON using the Badgerfish convention, I would love to know of alternative approaches you may be using for the conversion.

11 comments:

  1. I use the following style sheet with XSLT to translate XML to JSON in java. Works pretty well:

    http://www.bramstein.nl/xsltjson/

  2. Thanks very much, Erik, I'll check out the link.

  3. Could you please clarify the licensing terms for this code? I am about to write similar code, and I would love to avoid rewriting it since this code already exists.

  4. Hi Krokodil, you are welcome to use the code if it is of use to you. There are no licensing terms attached, except maybe attribution if you are using this as part of some open source project.

  5. Thank you very much, and congratulations on this class (and its test class): this is exactly what I was looking for for my project.

  6. Great work, Sujit. This is what I'm looking for. Do you have similar code for JSON to XML conversion?

    Again thanks for this.

    Regards,
    Siva

  7. Thanks for the kind words Siva. I haven't built one for JSON to XML (I don't need one), but it should be simple to do using Jackson to convert the JSON to a Java object and then XStream from Java object to XML.

  8. This is working perfectly fine.
    I used it to convert normal XML to Badgerfish-style JSON:
    the text content of XML tags is denoted with $
    XML attributes are denoted with @

  9. Very useful for converting XML to Badgerfish JSON.

  10. Thank you Arshad for the confirmation!

