I recently needed to view an XML file I had generated, to verify that the code worked right. Normally I would just view the file with Firefox using the file:// protocol, and Firefox would show me the document tree with its default XML rendering. But this file was quite large (about 800MB) and Firefox was not able to complete loading the file. It did not crash, but the keyboard and mouse became unresponsive and I had to manually kill Firefox from the command prompt.
So I figured that since Firefox is a general purpose browser, it was probably expecting too much to ask it to render such large files, and I would have better luck with software that was optimized to only render XML - in other words, an XML viewer. So I did a quick search for "XMLViewer" on Google, but came up empty in terms of software I could actually use. There appear to be many more XML viewers for Windows than for Linux. The only ones I came up with were KXMLViewer and gxmlviewer.
The screenshots for KXMLViewer look nice but the functionality was not exactly what I wanted. Gxmlviewer was tested on RedHat 7.1 and has probably not been updated since. I was unable to either install the RPM or build from source using the downloads on Fedora Core 3. I could probably have done it if I had tried a little harder, but I decided to pass.
Having failed to find tools to do what I wanted, I started wondering if I should just build one myself. Since I wanted to parse large XML files, a SAX parser seemed to be the obvious choice. As a matter of fact, one of the likely reasons Firefox hung is that it was trying to slurp in all 800MB of data before rendering it. It needs to do that because it gives you additional controls to expand and collapse elements. That functionality would be a nice-to-have for me, but I definitely did not need it. All I wanted was something that would format the XML (which was written for compactness) into something that I could read. Writing a SAX parser that formats its output into a nicely indented document tree is practically one of the first examples you encounter when you read about SAX parsing. So building such a tool would be trivial.
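To illustrate why SAX suits large files, here is a minimal sketch of incremental parsing with Python's standard xml.sax module. The ElementCounter class and count_elements function are hypothetical names made up for this illustration; they are not part of xmlcat.py. The parser is fed one chunk at a time and fires events as it goes, so the whole document never has to sit in memory at once.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts start tags as they stream past."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(chunks):
    """Feed an iterable of byte chunks to an incremental SAX parser."""
    parser = xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    for chunk in chunks:
        # Only the current chunk is handed to the parser; events fire as
        # soon as enough input has arrived to recognise a tag.
        parser.feed(chunk)
    parser.close()
    return handler.count


# A tiny document split at arbitrary points, standing in for successive
# reads from a large file.
chunks = [b"<root><item id='1'>a</it", b"em><item id='2'>b</item>", b"</root>"]
print(count_elements(chunks))  # 3 (root plus two item elements)
```

For a real file, the chunks could come from reading the file in fixed-size blocks; parser.parse(filename), as used in xmlcat.py, does the equivalent internally.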
Navigation, i.e. moving up or down one or more pages, is handled simply by piping the output through the Linux utility less. The [SPACE] key moves the formatted output forward a page, and [CTRL-B] moves it back. To call the tool, specify the file name and the tabstop indent.
$ xmlcat.py my_really_large_xml_file.xml 4 | less
I wrote xmlcat.py in Python. Here is the code.
#!/usr/bin/python
# A simple SAX Parser to view large XML files as a nicely formatted XML
# document tree. Pipe the output through less and move forward and backward
# using [SPACE] and [CTRL-B] respectively. Standard less keyboard commands
# will also work.
#
import sys

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class PrettyPrintingContentHandler(ContentHandler):
    """ Subclass of the SAX ContentHandler to print document tree """

    def __init__(self, indent):
        """ Ctor """
        ContentHandler.__init__(self)
        self.indent = int(indent)
        self.level = 0
        self.chars = ''

    def startElement(self, name, attrs):
        """ Set the level and print opening tag with attributes """
        self.level = self.level + 1
        attrString = ""
        for qname in attrs.getQNames():
            attrString = attrString + " " + qname + "=\"" + attrs.getValueByQName(qname) + "\""
        print(self.tab(self.level) + "<" + name.rstrip() + attrString + ">")

    def endElement(self, name):
        """ Print the characters and the closing tag """
        if self.chars.strip():
            print(self.tab(self.level + 1) + self.chars.rstrip())
        self.chars = ''
        print(self.tab(self.level) + "</" + name.rstrip() + ">")
        self.level = self.level - 1

    def characters(self, c):
        """ Accumulate characters, ignore whitespace """
        if c.strip():
            self.chars = self.chars + c

    def tab(self, n):
        """ Return the indentation for an element at level n """
        # The root element (level 1) is not indented; each deeper level
        # adds self.indent spaces.
        return " " * (self.indent * (n - 1))

def usage():
    """ Print the usage """
    print("Usage: xmlcat.py xml_file indent | less")
    print("Use [SPACE] and [CTRL-B] to move forward and backward")
    sys.exit(-1)

def main():
    """ Check the arguments, instantiate the parser and parse """
    if len(sys.argv) != 3:
        usage()
    file = sys.argv[1]    # path of the XML file to format
    indent = sys.argv[2]  # number of spaces per nesting level
    parser = make_parser()
    prettyPrintingContentHandler = PrettyPrintingContentHandler(indent)
    parser.setContentHandler(prettyPrintingContentHandler)
    parser.parse(file)

if __name__ == "__main__":
    main()
As you can see, the code is quite trivial and based in large part on this DevShed article, but I am including it here anyway. It took me all of half an hour to write and test, so it's not rocket science, but it may save you half an hour if you are looking for a similar tool and stumble upon this page.
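For a quick check of the formatting logic before pointing it at a very large file, here is a small self-contained sketch that applies the same idea to an in-memory document. LineCollectingHandler and format_xml are hypothetical names introduced only for this example. It collects the output lines in a list instead of printing them, and it assumes each nesting level is indented by exactly the given number of spaces.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineCollectingHandler(ContentHandler):
    """Hypothetical variant of the pretty-printer that stores lines in a list."""

    def __init__(self, indent):
        ContentHandler.__init__(self)
        self.indent = int(indent)
        self.level = 0
        self.chars = ""
        self.lines = []

    def pad(self, level):
        # Assumption: the root (level 1) gets no indentation.
        return " " * (self.indent * (level - 1))

    def startElement(self, name, attrs):
        self.level += 1
        attr_string = "".join(
            ' %s="%s"' % (qname, attrs.getValueByQName(qname))
            for qname in attrs.getQNames()
        )
        self.lines.append(self.pad(self.level) + "<" + name + attr_string + ">")

    def endElement(self, name):
        if self.chars.strip():
            self.lines.append(self.pad(self.level + 1) + self.chars.rstrip())
        self.chars = ""
        self.lines.append(self.pad(self.level) + "</" + name + ">")
        self.level -= 1

    def characters(self, content):
        # Skip whitespace-only chunks, as the original handler does.
        if content.strip():
            self.chars += content


def format_xml(xml_bytes, indent=2):
    """Parse an in-memory XML document and return the formatted lines."""
    handler = LineCollectingHandler(indent)
    xml.sax.parseString(xml_bytes, handler)
    return handler.lines


for line in format_xml(b'<feed><entry id="1"><title>Hello</title></entry></feed>'):
    print(line)
# <feed>
#   <entry id="1">
#     <title>
#       Hello
#     </title>
#   </entry>
# </feed>
```

Note that, like xmlcat.py, this sketch does not re-escape characters such as & or <, so its output is meant for reading rather than for feeding back into a parser.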