Saturday, August 02, 2008

Parsing custom modules with ROME Fetcher

Last week, I described a fairly basic feed fetcher written using ROME's Fetcher library. My intent is to provide clients of our RSS 2.0-based API with a convenient way to access the XML as a Java object, removing the need for XML parsing on their end. Our API does return standard RSS 2.0, but we add extra information using a custom module at both the feed and the entry level, which the implementation described in the last post could not parse out as a module. Instead, it treated it as foreign markup, which a client could parse out using the following snippet.

  SyndFeed feed = client.execute(...);
  List<SyndEntry> entries = feed.getEntries();
  for (SyndEntry entry : entries) {
    ...
    List<Element> foreignMarkups = (List<Element>) entry.getForeignMarkup();
    for (Element foreignMarkup : foreignMarkups) {
      if (foreignMarkup.getNamespaceURI().equals(MyModule.URI)) {
        // we got our custom module, now parse it
        if (foreignMarkup.getName().equals("score")) {
          // extract and populate the value of score
          float score = Float.valueOf(foreignMarkup.getValue());
        }
        ...
      }
    }
  }

This is a workable solution, but not ideal. What I would like is for clients to be able to get a reference to the custom module by URI, then use the getters and setters defined on the module to populate their objects. This post describes the changes I had to make to get this functionality to work.

ROME depends on a plug-in mechanism that is driven by the rome.properties file. ROME comes with one built in, but it can be overridden by placing one's custom rome.properties file at the root of the classpath. So here is my rome.properties file for reference.

# rome.properties

rss_2.0my.feed.ModuleGenerator.classes=\
com.sun.syndication.io.impl.DCModuleGenerator \
com.sun.syndication.io.impl.SyModuleGenerator \
com.sun.syndication.feed.module.opensearch.impl.OpenSearchModuleGenerator \
com.mycompany.myapp.mymodule.MyModuleGenerator

rss_2.0my.feed.ModuleParser.classes=\
com.sun.syndication.io.impl.DCModuleParser \
com.sun.syndication.io.impl.SyModuleParser \
com.sun.syndication.feed.module.opensearch.impl.OpenSearchModuleParser \
com.mycompany.myapp.mymodule.MyModuleParser

rss_2.0my.item.ModuleParser.classes=\
com.mycompany.myapp.mymodule.MyModuleParser

rss_2.0my.item.ModuleGenerator.classes=\
com.mycompany.myapp.mymodule.MyModuleGenerator

WireFeedParser.classes=\
com.sun.syndication.io.impl.RSS090Parser \
com.sun.syndication.io.impl.RSS091NetscapeParser \
com.sun.syndication.io.impl.RSS091UserlandParser \
com.sun.syndication.io.impl.RSS092Parser \
com.sun.syndication.io.impl.RSS093Parser \
com.sun.syndication.io.impl.RSS094Parser \
com.sun.syndication.io.impl.RSS10Parser  \
com.sun.syndication.io.impl.RSS20wNSParser  \
com.sun.syndication.io.impl.RSS20Parser  \
com.sun.syndication.io.impl.Atom10Parser \
com.sun.syndication.io.impl.Atom03Parser \
com.mycompany.myapp.mymodule.MyRss20Parser

WireFeedGenerator.classes=\
com.sun.syndication.io.impl.RSS090Generator \
com.sun.syndication.io.impl.RSS091NetscapeGenerator \
com.sun.syndication.io.impl.RSS091UserlandGenerator \
com.sun.syndication.io.impl.RSS092Generator \
com.sun.syndication.io.impl.RSS093Generator \
com.sun.syndication.io.impl.RSS094Generator \
com.sun.syndication.io.impl.RSS10Generator  \
com.sun.syndication.io.impl.RSS20Generator  \
com.sun.syndication.io.impl.Atom10Generator \
com.sun.syndication.io.impl.Atom03Generator \
com.mycompany.myapp.mymodule.MyRss20Generator

Converter.classes=\
com.sun.syndication.feed.synd.impl.ConverterForAtom10 \
com.sun.syndication.feed.synd.impl.ConverterForAtom03 \
com.sun.syndication.feed.synd.impl.ConverterForRSS090 \
com.sun.syndication.feed.synd.impl.ConverterForRSS091Netscape \
com.sun.syndication.feed.synd.impl.ConverterForRSS091Userland \
com.sun.syndication.feed.synd.impl.ConverterForRSS092 \
com.sun.syndication.feed.synd.impl.ConverterForRSS093 \
com.sun.syndication.feed.synd.impl.ConverterForRSS094 \
com.sun.syndication.feed.synd.impl.ConverterForRSS10  \
com.sun.syndication.feed.synd.impl.ConverterForRSS20 \
com.mycompany.myapp.mymodule.ConverterForMyRss20

You will notice that I am using a custom version of RSS 2.0 called rss_2.0my. The reason for this is that ROME's SyndEntry does not have a way to set the rss/channel/item/source element, which I handled by extending ROME. Additionally, we use a custom module, MyModule, that carries information that cannot be sent using standard RSS. We also use the Amazon OpenSearch and the Content modules.
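
To make the shape of these entries concrete, here is a minimal sketch of how a value such as the one for rss_2.0my.feed.ModuleParser.classes breaks down into individual class names. It uses plain java.util.Properties and is not ROME's own loader; the class name PluginListDemo and its helper methods are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustration only: reads a rome.properties-style entry with java.util.Properties.
// PluginListDemo is a hypothetical helper, not part of ROME.
class PluginListDemo {

    // Splits the whitespace-separated value of a "*.classes" key into class names.
    static List<String> classNames(Properties props, String key) {
        List<String> names = new ArrayList<>();
        String value = props.getProperty(key);
        if (value == null) {
            return names; // key not configured
        }
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                names.add(token);
            }
        }
        return names;
    }

    // Parses properties text; a trailing backslash continues the value on the next line.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text =
            "rss_2.0my.feed.ModuleParser.classes=\\\n"
          + "com.sun.syndication.io.impl.DCModuleParser \\\n"
          + "com.mycompany.myapp.mymodule.MyModuleParser\n";
        // Prints the two class names registered for the custom feed type.
        System.out.println(classNames(load(text), "rss_2.0my.feed.ModuleParser.classes"));
    }
}
```

The point of the sketch is only that the order of the class names in the file is preserved, which matters for the parser selection discussed next.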

We have used ROME for a while to successfully generate the feeds, but this was the first time we were looking at eating our own dog food, as it were. I am not sure if our setup is that uncommon, but either there are gaps in the ROME parsing code or we are doing something wrong. If you have been through this and solved this more cleanly, your comments and suggestions would be appreciated.

Change to ROME: FeedParsers.java

Currently, the code for FeedParsers.getParserFor(Document) loops through the parsers defined for WireFeedParser.classes in rome.properties. Each parser's isMyType() method is invoked in turn, and the first parser for which it returns true is selected.

The problem is that RSS20Parser.isMyType(Document) is too loose: all it does is verify that the root element is "rss" and that the value of rss.@version starts with "2.0". So RSS20Parser is what gets selected, which is not what I want, since all my modules are registered to the feed type rss_2.0my, which is the type supported by MyRss20Parser.

I needed it to keep going and return the last matching parser instead, so I made the selection sticky. An alternative approach would be to traverse the list backwards, since the selection would then go from the most specific parser to the least specific. Here is my code for FeedParsers.getParserFor(Document).

    public WireFeedParser getParserFor(Document document) {
        List parsers = getPlugins();
        WireFeedParser selectedParser = null;
        for (int i=0;i < parsers.size();i++) {
            WireFeedParser parser = (WireFeedParser) parsers.get(i);
            if (parser.isMyType(document)) {
                selectedParser = parser;
            }
        }
        return selectedParser;
    }
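
For comparison, the backwards traversal mentioned above could look roughly like the sketch below. It is not ROME code: FeedParser is a hypothetical stand-in for WireFeedParser, a document is reduced to the list of namespace URIs on its root element, and the URIs are placeholders.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: FeedParser is a hypothetical stand-in for ROME's WireFeedParser,
// and a "document" is reduced to the namespace URIs declared on its root element.
class ParserSelectionDemo {

    static class FeedParser {
        final String type;
        final String requiredNamespace; // null means "accept any document of this format"

        FeedParser(String type, String requiredNamespace) {
            this.type = type;
            this.requiredNamespace = requiredNamespace;
        }

        boolean isMyType(List<String> namespaceUris) {
            return requiredNamespace == null || namespaceUris.contains(requiredNamespace);
        }
    }

    // Walks the registration order backwards and returns the first match,
    // so parsers registered later (the more specific ones) win.
    static FeedParser selectFromEnd(List<FeedParser> parsers, List<String> namespaceUris) {
        for (int i = parsers.size() - 1; i >= 0; i--) {
            FeedParser parser = parsers.get(i);
            if (parser.isMyType(namespaceUris)) {
                return parser;
            }
        }
        return null; // no registered parser recognises the document
    }

    public static void main(String[] args) {
        List<FeedParser> parsers = Arrays.asList(
            new FeedParser("rss_2.0", null),
            new FeedParser("rss_2.0my", "http://example.com/mymodule"));
        System.out.println(
            selectFromEnd(parsers, Arrays.asList("http://example.com/mymodule")).type);
        System.out.println(
            selectFromEnd(parsers, Arrays.asList("http://purl.org/dc/elements/1.1/")).type);
    }
}
```

With the two stand-in parsers registered in that order, a document that declares the module namespace selects rss_2.0my, and any other document falls back to rss_2.0. The result is the same as with the sticky loop, but the search stops at the first hit instead of always visiting every parser.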

Change to MyRss20Parser.java

I added an isMyType(Document) method to my custom RSS20Parser subclass so that it does not rely solely on the superclass's isMyType(). It piggybacks on RSS20Parser.isMyType() to figure out whether the document is RSS 2.0 and, if so, checks whether it declares the namespace for MyModule.

  public boolean isMyType(Document document) {
    boolean isValidRss20 = super.isMyType(document);
    boolean isValidMyRss20 = false;
    if (isValidRss20) {
      Element rssRoot = document.getRootElement();
      List<Namespace> namespaces = rssRoot.getAdditionalNamespaces();
      for (Namespace namespace : namespaces) {
        // only accept the document if it declares the MyModule namespace
        if (namespace.getURI().equals(MyModule.URI)) {
          isValidMyRss20 = true;
          break;
        }
      }
    }
    return isValidRss20 && isValidMyRss20;
  }

The changes to ROME were made against the CVS version downloaded on 2008-07-23. I plan on submitting a patch for the changes, so hopefully it will be available in future releases of the software. Unless, of course, there is a workaround, which I would be happy to use.

Additionally, I am using the ROME Fetcher code from CVS. Unlike the 0.9 version available at the time of this writing, the CVS version has support for configurable connection and read timeouts, which I wanted to provide. But I already talked about that in the previous post.

With these changes, I am finally able to get results using the following code on my client application.
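
As a rough illustration of the lookup-by-URI pattern this enables, here is a self-contained sketch. It does not use ROME: Entry and ScoreModule are hypothetical stand-ins, the URI is a placeholder, and getScore() is an assumed getter based on the "score" element in the foreign-markup snippet earlier. In ROME itself the corresponding call is SyndEntry.getModule(String uri).

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: Entry and ScoreModule are hypothetical stand-ins, not ROME classes.
class ModuleLookupDemo {

    static final String MY_MODULE_URI = "http://example.com/mymodule"; // placeholder URI

    // Stand-in for a parsed custom module exposing a typed getter.
    static class ScoreModule {
        private final float score;

        ScoreModule(float score) {
            this.score = score;
        }

        float getScore() {
            return score;
        }
    }

    // Stand-in for a feed entry that keeps its parsed modules keyed by namespace URI.
    static class Entry {
        private final Map<String, Object> modules = new HashMap<>();

        void addModule(String uri, Object module) {
            modules.put(uri, module);
        }

        Object getModule(String uri) {
            return modules.get(uri);
        }
    }

    // Client-side read: look the module up by URI, then use its getter.
    static float scoreOf(Entry entry, float defaultScore) {
        Object module = entry.getModule(MY_MODULE_URI);
        if (module instanceof ScoreModule) {
            return ((ScoreModule) module).getScore();
        }
        return defaultScore; // the entry carried no custom module
    }

    public static void main(String[] args) {
        Entry entry = new Entry();
        entry.addModule(MY_MODULE_URI, new ScoreModule(0.75f));
        System.out.println(scoreOf(entry, 0.0f));
    }
}
```

Compared with the foreign-markup loop at the top of the post, the client no longer touches element names or string-to-float conversion; that work moves into the module parser.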

4 comments (moderated to prevent spam):

niklas2k2 said...

Hi, Sujit
I have read your posts on the Rome API and found them very helpful. I ran into the same problem as you did, not being able to parse my own custom modules. I can parse item custom modules but not at feed level. This seems very strange, the API seems to be built in the same way for feed modules and item modules.

This is how I configured feed module parsing in the rome.properties file:

# Parsers for RSS 2.0 (w/NS) feed modules
#
rss_2.0wNS.feed.ModuleParser.classes=com.sun.syndication.io.impl.DCModuleParser \
com.creuna.calendar.syndication.CalendarFeedModuleParser


and


# Parsers for RSS 2.0 feed modules
#
rss_2.0.item.ModuleParser.classes=com.sun.syndication.io.impl.DCModuleParser \
com.creuna.calendar.syndication.CalendarFeedModuleParser

Have you solved this problem or is the only way to use a whole new custom FeedParser?

/
Best Regards
Niklas

Sujit Pal said...

Hi niklas2k2, glad the posts helped. To answer your question, though, no, I did not have a better solution, the solution I ended up with was the custom WireFeedParser as I described in the blog. Fortunately, once you set up the infrastructure to direct the WireFeedParser to the feed ModuleParser classes for your custom variant, you are unlikely to have to touch it (so you can pretty much forget about it), since new custom tags will be handled in the feed.ModuleParser class.

Anonymous said...

Can u pls post the MyModule classes so that i can take a look.

Sujit Pal said...

Hi Anonymous, you can find the code for my custom modules here and here.