Saturday, October 13, 2007

Custom Modules with ROME

I have been looking at ROME lately. ROME is a very popular open-source, Java-based RSS/Atom feed parsing and generation library, originally developed by a group of Sun developers. I first looked at ROME as a way to parse external feeds. Although the feeds were all either RSS or Atom, there are several versions of each, and they differ from each other in subtle ways. ROME abstracts away all these differences, allowing you to treat feeds as high-level Java objects. Behind the scenes, ROME uses JDOM, my favorite XML parsing library, to parse and build the XML.
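To make the "subtle differences" concrete, here is a rough, stdlib-only Java sketch (not ROME code) that guesses a document's dialect from its root element, which is the kind of decision a library like ROME has to make before it can hand you a uniform object model. The class name FeedDialectSniffer and the returned labels are made up for illustration; the namespace URIs are the published ones for Atom 1.0, Atom 0.3, RSS 1.0 and RSS 0.90.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Rough sketch: guess which RSS/Atom dialect a document uses from its root element.
// The labels are descriptive strings, not ROME's feed type identifiers.
class FeedDialectSniffer {

    static final String ATOM_10_NS = "http://www.w3.org/2005/Atom";
    static final String ATOM_03_NS = "http://purl.org/atom/ns#";
    static final String RSS_10_NS = "http://purl.org/rss/1.0/";
    static final String RSS_090_NS = "http://my.netscape.com/rdf/simple/0.9/";

    static String detect(String xml) {
        Element root = parse(xml).getDocumentElement();
        String name = root.getLocalName();
        String ns = root.getNamespaceURI();
        if ("rss".equals(name)) {
            // RSS 0.91 through 2.0 share the <rss> root; only the version attribute differs.
            return "RSS " + root.getAttribute("version");
        }
        if ("feed".equals(name) && ATOM_10_NS.equals(ns)) {
            return "Atom 1.0";
        }
        if ("feed".equals(name) && ATOM_03_NS.equals(ns)) {
            return "Atom 0.3";
        }
        if ("RDF".equals(name)) {
            // RSS 1.0 and RSS 0.90 are both RDF documents; the channel's namespace tells them apart.
            if (root.getElementsByTagNameNS(RSS_10_NS, "channel").getLength() > 0) {
                return "RSS 1.0";
            }
            if (root.getElementsByTagNameNS(RSS_090_NS, "channel").getLength() > 0) {
                return "RSS 0.90";
            }
        }
        return "unknown";
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getLocalName()/getNamespaceURI()
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

Note that all the <rss>-rooted versions are distinguished only by their version attribute, while RSS 1.0 and 0.90 are RDF documents and Atom uses its own namespace.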

My first application using ROME was to parse and aggregate a bunch of external feeds, and it was ridiculously simple, about half a page of Java code. As Mark Woodman says in his article "Rome in a Day: Parse and Publish Feeds in Java" on XML.com, the sheer variety of RSS and Atom flavors is enough to make a grown programmer cry. However, the way ROME abstracts away all these variations, and the simplicity of my resulting code, almost made me cry... tears of joy.

However, most of the "smarts" of that application would be in the selection of the feeds themselves, which required domain expertise I did not have. So I put that project aside for a while and explored the other part of ROME instead: using it to build a feed. In retrospect, I would probably use the rome-fetcher subproject to build the aggregator, since it already provides code for building "well-behaved" feed fetchers.

The feed I chose to rebuild was an existing RSS 2.0 feed, generated using JSP XML templates powered by Java services. It contained many custom tags that were not part of any of the standard modules ROME supports, so I had to build a custom extension module for ROME to handle them. This post describes the (pretty simple) process.

I could not find this process described anywhere in sufficient newbie detail, so after a quick look through the RSS/Atom/ROME books on Amazon, I settled on "RSS and Atom in Action" (RAIA) by Dave Johnson. One reason for my choice was that it is published by Manning, who provide a free downloadable PDF version with the printed book, so I had my book about 15 minutes after ordering it from my home computer. As it turned out, I had finished reading the PDF by the time the print edition arrived about 5 days later. The print copy sits unopened on my desk for now, but I am sure I will need it some day.

But enough idle chatter. Let's get right down to building a custom module that supports three of my custom tags: my:tag, my:tagDate and my:isTagged. Their values are a String, a Date and a Boolean respectively.

Each module has four components that need to be built and hooked up with ROME: an interface that defines the namespace URI for the custom tags along with a getter and setter for each supported tag, an implementation of that interface, a parser and a generator. They are shown below:

// MyModule.java
package com.mycompany.feeds.modules.mymodule;

import java.util.Date;

import com.sun.syndication.feed.module.Module;

public interface MyModule extends Module {
  
  public static final String URI = "http://www.my.com/spec";

  public String getTag();
  public void setTag(String tag);
  public Date getTagDate();
  public void setTagDate(Date tagDate);
  public Boolean getIsTagged();
  public void setIsTagged(Boolean isTagged);
}
// MyModuleImpl.java
package com.mycompany.feeds.modules.mymodule;

import java.util.Date;

import com.sun.syndication.feed.module.ModuleImpl;

public class MyModuleImpl extends ModuleImpl implements MyModule {

  private static final long serialVersionUID = -8275118704842545845L;

  private Boolean isTagged;
  private Date tagDate;
  private String tag;
  
  // Boilerplate code. Eclipse can generate everything except this
  // constructor; ModuleImpl has no no-argument constructor, so the class
  // will not compile until you add one that passes the interface and URI up.
  public MyModuleImpl() {
    super(MyModule.class, MyModule.URI);
  }

  public void copyFrom(Object obj) {
    MyModule module = (MyModule) obj;
    setTag(module.getTag());
    setTagDate(module.getTagDate());
    setIsTagged(module.getIsTagged());
  }

  public Class getInterface() {
    return MyModule.class;
  }

  // getter and setter impls for MyModule interface
  public Boolean getIsTagged() {
    return isTagged;
  }

  public String getTag() {
    return tag;
  }

  public Date getTagDate() {
    return tagDate;
  }

  public void setIsTagged(Boolean isTagged) {
    this.isTagged = isTagged;
  }

  public void setTag(String tag) {
    this.tag = tag;
  }

  public void setTagDate(Date tagDate) {
    this.tagDate = tagDate;
  }
}
// MyModuleGenerator.java
package com.mycompany.feeds.modules.mymodule;

import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.jdom.Element;
import org.jdom.Namespace;

import com.sun.syndication.feed.module.Module;
import com.sun.syndication.io.ModuleGenerator;

public class MyModuleGenerator implements ModuleGenerator {

  // boilerplate code
  private static final Namespace NAMESPACE = Namespace.getNamespace("my", MyModule.URI);
  private static final Set NAMESPACES;
  static {
    Set<Namespace> namespaces = new HashSet<Namespace>();
    namespaces.add(NAMESPACE);
    NAMESPACES = Collections.unmodifiableSet(namespaces);
  }

  public String getNamespaceUri() {
    return MyModule.URI;
  }

  public Set getNamespaces() {
    return NAMESPACES;
  }

  // Implements the module generation logic
  private final SimpleDateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy");

  public void generate(Module module, Element element) {
    MyModule myModule = (MyModule) module;
    if (myModule.getTag() != null) {
      Element myElement = new Element("tag", NAMESPACE);
      myElement.setText(myModule.getTag());
      element.addContent(myElement);
    }
    if (myModule.getTagDate() != null) {
      Element myElement = new Element("tagDate", NAMESPACE);
      myElement.setText(dateFormat.format(myModule.getTagDate()));
      element.addContent(myElement);
    }
    if (myModule.getIsTagged() != null) {
      Element myElement = new Element("isTagged", NAMESPACE);
      myElement.setText(String.valueOf(myModule.getIsTagged()));
      element.addContent(myElement);
    }
  }
}
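To see the markup the generator above is meant to produce without wiring up ROME, here is a stdlib-only sketch using StAX. The class MyItemMarkupSketch is hypothetical; the prefix, namespace URI, date pattern and skip-if-null behavior are copied from MyModuleGenerator. Where ROME actually places the xmlns:my declaration in a full feed may differ from this sketch, which declares it on the item.

```java
import java.io.StringWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Stdlib-only sketch of the <my:...> markup MyModuleGenerator is meant to emit.
// Does not use ROME or JDOM; prefix, URI and date pattern mirror the generator.
class MyItemMarkupSketch {

    static final String URI = "http://www.my.com/spec";

    static String render(String tag, Date tagDate, Boolean isTagged) {
        // A fresh formatter per call, since SimpleDateFormat is not thread-safe.
        SimpleDateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy");
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("item");
            w.writeNamespace("my", URI); // declare the my: prefix once, on the item
            // Like the generator, write nothing for a null value.
            if (tag != null) {
                writeText(w, "tag", tag);
            }
            if (tagDate != null) {
                writeText(w, "tagDate", dateFormat.format(tagDate));
            }
            if (isTagged != null) {
                writeText(w, "isTagged", String.valueOf(isTagged));
            }
            w.writeEndElement();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    private static void writeText(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement("my", name, URI);
        w.writeCharacters(text);
        w.writeEndElement();
    }

    public static void main(String[] args) {
        System.out.println(render("tagValue", new Date(), Boolean.TRUE));
    }
}
```

Running main prints a single item element carrying my:tag, my:tagDate (as month/day/year only) and my:isTagged children.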
// MyModuleParser.java
package com.mycompany.feeds.modules.mymodule;

import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.jdom.Element;
import org.jdom.Namespace;

import com.sun.syndication.feed.module.Module;
import com.sun.syndication.io.ModuleParser;

public class MyModuleParser implements ModuleParser {

  // boilerplate
  private static final Namespace NAMESPACE = Namespace.getNamespace("my", MyModule.URI);

  public String getNamespaceUri() {
    return MyModule.URI;
  }

  // implements the parsing for MyModule
  private final SimpleDateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy");

  // ROME passes in the enclosing <item> element, not the custom tags
  // themselves, so look our tags up as its children in our namespace.
  public Module parse(Element element) {
    MyModule myModule = new MyModuleImpl();
    Element tag = element.getChild("tag", NAMESPACE);
    if (tag != null) {
      myModule.setTag(tag.getTextTrim());
    }
    Element tagDate = element.getChild("tagDate", NAMESPACE);
    if (tagDate != null) {
      try {
        myModule.setTagDate(dateFormat.parse(tagDate.getTextTrim()));
      } catch (ParseException e) {
        // don't set it if bad date format
      }
    }
    Element isTagged = element.getChild("isTagged", NAMESPACE);
    if (isTagged != null) {
      myModule.setIsTagged(Boolean.valueOf(isTagged.getTextTrim()));
    }
    return myModule;
  }
}
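The detail that trips people up in the comments below is that a ROME module parser is handed the enclosing item element, not each custom tag. Here is a stdlib-only sketch (DOM, no ROME or JDOM; the class MyItemParseSketch is hypothetical) of the same lookup: find direct children of the item by namespace URI and local name, so any prefix bound to http://www.my.com/spec works.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Stdlib-only sketch: read the my:* tags from a whole <item> element,
// matching children by namespace URI + local name, never by prefix.
class MyItemParseSketch {

    static final String URI = "http://www.my.com/spec";

    String tag;
    Date tagDate;
    Boolean isTagged;

    static MyItemParseSketch parseItem(String itemXml) {
        Element item = parse(itemXml).getDocumentElement();
        MyItemParseSketch result = new MyItemParseSketch();
        result.tag = childText(item, "tag");
        String date = childText(item, "tagDate");
        if (date != null) {
            try {
                result.tagDate = new SimpleDateFormat("MM/dd/yyyy").parse(date);
            } catch (ParseException e) {
                // leave tagDate unset on a bad date, as the parser above does
            }
        }
        String flag = childText(item, "isTagged");
        if (flag != null) {
            result.isTagged = Boolean.valueOf(flag);
        }
        return result;
    }

    // Trimmed text of the first direct child named localName in our namespace, or null.
    private static String childText(Element parent, String localName) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element
                    && URI.equals(node.getNamespaceURI())
                    && localName.equals(node.getLocalName())) {
                return node.getTextContent().trim();
            }
        }
        return null;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

Matching on the URI rather than the literal my: prefix matters because a feed publisher is free to bind any prefix to the namespace. Also note that Boolean.valueOf only yields true for the string "true" (ignoring case), so values like "yes" or "1" come back as false.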

To let ROME know about these classes, we need to create a rome.properties file in our classpath. The ROME JAR already contains its own rome.properties that controls its default configuration, and ROME reads our rome.properties in addition to its own. All three of our tags are item-level tags, so we only need to register the parser and generator at the item level. The RAIA book shows how to register feed-level modules as well; the process is quite similar. Here is my rome.properties file (in a Maven2 project it would live in src/main/resources). Since I am building an RSS 2.0 feed, I register the module for that dialect.

# rome.properties
rss_2.0.item.ModuleParser.classes=\
com.mycompany.feeds.modules.mymodule.MyModuleParser

rss_2.0.item.ModuleGenerator.classes=\
com.mycompany.feeds.modules.mymodule.MyModuleGenerator
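Several of the comments below come down to whether ROME actually sees a custom rome.properties. As a debugging aid, here is a hypothetical helper (RomePropertiesInspector, not part of ROME) that lists every rome.properties at the root of the classpath and the item-level parser and generator classes each one registers for a feed type. It only reads the files; it does not reproduce how ROME merges them with the configuration packaged in its own JAR, which in the 0.9/1.0 JARs appears to sit under com/sun/syndication/ rather than at the JAR root, so it likely won't show up here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

class RomePropertiesInspector {

    // Class names registered under e.g. "rss_2.0.item.ModuleParser.classes", or an empty list.
    static List<String> classesFor(Properties props, String feedType, String kind) {
        String value = props.getProperty(feedType + ".item." + kind + ".classes");
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        // Assumption: entries are separated by commas and/or whitespace.
        return Arrays.asList(value.trim().split("[,\\s]+"));
    }

    // One line per rome.properties file visible to the loader that registers item-level modules.
    static List<String> describe(ClassLoader loader, String feedType) {
        List<String> lines = new ArrayList<>();
        try {
            for (URL url : Collections.list(loader.getResources("rome.properties"))) {
                Properties props = new Properties();
                try (InputStream in = url.openStream()) {
                    props.load(in); // handles the trailing-backslash line continuations
                }
                for (String kind : new String[] {"ModuleParser", "ModuleGenerator"}) {
                    List<String> classes = classesFor(props, feedType, kind);
                    if (!classes.isEmpty()) {
                        lines.add(url + " -> " + kind + ": " + classes);
                    }
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not read rome.properties", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String line : describe(loader, "rss_2.0")) {
            System.out.println(line);
        }
    }
}
```

If the file above is on the classpath, running main from the same environment as the feed code should print one ModuleParser line and one ModuleGenerator line pointing at it; no output suggests the file is not being seen.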

Finally, here is how I would call it from within my code:

  @Test
  public void testFeedWithMyModule() throws Exception {
    // create the feed object
    SyndFeed feed = new SyndFeedImpl();
    feed.setFeedType("rss_2.0");
    feed.setTitle("My test feed");
    ...
    // add the MyModule namespace to the feed
    feed.getModules().add(new MyModuleImpl());

    // create the item
    SyndEntry entry = new SyndEntryImpl();
    ...
    // create the module, populate and add to the item
    MyModule myModule = new MyModuleImpl();
    myModule.setTag("tagValue");
    myModule.setTagDate(new Date());
    myModule.setIsTagged(true);
    entry.getModules().add(myModule);
    ...
    // add entry(s) to the feed
    feed.getEntries().add(entry);
    // print it out
    WireFeedOutput output = new WireFeedOutput();
    WireFeed wireFeed = feed.createWireFeed("rss_2.0");
    log.debug(output.outputString(wireFeed));
  }
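One consequence of the MM/dd/yyyy pattern used by both MyModuleGenerator and MyModuleParser is that the new Date() set in the test above survives a generate/parse round trip only to day precision. The stdlib sketch below (the class TagDateRoundTrip is hypothetical) demonstrates this; if the time of day matters, a fuller pattern would be needed on both sides.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// Sketch: what happens to a full timestamp when it is written with the
// MM/dd/yyyy pattern used by the generator and read back by the parser.
class TagDateRoundTrip {

    static Date roundTrip(Date original) {
        SimpleDateFormat format = new SimpleDateFormat("MM/dd/yyyy");
        try {
            return format.parse(format.format(original));
        } catch (ParseException e) {
            throw new IllegalStateException(e); // our own output always parses
        }
    }

    // Midnight (default time zone) of the day containing the given date.
    static Date startOfDay(Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println(now + " -> " + roundTrip(now));
    }
}
```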

As you can see, adding a custom module to ROME involves a few pieces, but each one is quite simple. Before I started on these two projects, I did not know much about the various flavors of RSS and Atom, or about these custom extensions. In fact, this is the first time I have used namespaces in JDOM. However, Dave Johnson's book provides a lot of background information and many nice examples in Java and C#. I would highly recommend it to anyone who needs to get up to speed quickly with ROME and RSS/Atom. Another very informative article is "ROME: Escape syndication hell, a Java developer's perspective", written by two of the original developers of ROME.

33 comments (moderated to prevent spam):

Pranav said...

Hi,

I found your article really helpful. I'm working on a research project wherein I need to generate an RSS feed as one of the outputs.

I recently ran into a problem though. I created the module, generator & parser files and stored their references in the rome.properties file as mentioned. It works fine for one project.

But when I try to run simple code to generate RSS in a different project (in the same application server environment), it still searches for the generator file of the other project and the program fails as a result.

Here's what I'm trying to do:

SyndFeed feed = new SyndFeedImpl();
feed.setFeedType(FEED_TYPE);
feed.setTitle("RESTful Interface");
feed.setDescription("This feed uses Rome to return the XML in ATOM or RSS");

List entries = new ArrayList();
SyndEntry entry = null;
SyndContent description = null;

for (String s : arr) {
  entry = new SyndEntryImpl();
  description = new SyndContentImpl();
  description.setType("text/plain");
  description.setValue(s);
  entry.setDescription(description);
  entries.add(entry);
}

feed.setEntries(entries);

SyndFeedOutput output = new SyndFeedOutput();

// Here's where the program fails
return output.outputString(feed);

It gives the following error:
13:02:33,495 ERROR [STDERR] Caused by: java.lang.RuntimeException: could not instantiate plugin com.sun.syndication.io.impl.RSS090Generator

This may look weird, but I'm not able to resolve this problem. It might be a classpath problem, but I'm not sure.

Any inputs will be valuable.

Thanks,
Pranav

Sujit Pal said...

Hi Pranav, I am looking at the code you provided, and I don't see any references to your custom module. If you have no custom modules, then you don't need your own rome.properties, you can use the one that comes packaged with rome.jar (jar tvf rome.jar | grep rome.properties). If you do supply it, for it to be used, it should be on your classpath /ahead/ of rome.properties. I do this by putting my rome.properties in src/main/resources which comes ahead of the ~/.m2/repository libs. So I am guessing that may be the problem you are seeing.

It's actually a good idea to take a look at the packaged rome.properties, it provides you insight about the ROME properties overriding mechanism. I found it helpful when I did another customization (don't remember details, but it's described in one of my posts).

Sujit Pal said...

Also in your code:
SyndFeedOutput output = new SyndFeedOutput();
// Here's where the program fails
return output.outputString(feed);

I generally use this idiom (not saying your code is wrong, just that it's different from what I use):

WireFeedOutput output = new WireFeedOutput();
WireFeed wirefeed = feed.createWireFeed(FEED_TYPE);
return output.outputString(wirefeed);

Anonymous said...

Good work,

your examples helped me out a lot.

regards
Jon

Sujit Pal said...

Thanks, Jon.

Anonymous said...

You may want to look at vtd-xml, the latest and most advanced parsing/indexing/XPath engine, which offers a lot of cool features.

vtd-xml

Sujit Pal said...

Thanks for the pointer. I guess vtd-xml is getting quite popular, going by the number of comments about it on a number of my posts (unless you are doing this multiple times - if you are, please stop, since you are doing a disservice to both our sites).

I did look at it, although not really in depth, in connection with another XML parsing task (of a very large file). I ultimately decided to use StAX, only because I am more familiar with it, and because my XML contains namespaces. vtd-xml seems to be optimized for very large but simple XML files, but I could be wrong. In any case, vtd-xml, at least for the moment, is not for me.

Regarding using it here, that wouldn't work at all, since ROME is built on JDOM as its internal parser, and any customization would necessarily have to be done using it.

Vetrik said...

Thanks for the code. I tried it, but when executing it I see the error below. Any help is appreciated.

Exception in thread "main" java.lang.NullPointerException
at com.sun.syndication.feed.synd.impl.ConverterForRSS092.createRSSItem(ConverterForRSS092.java:126)
at com.sun.syndication.feed.synd.impl.ConverterForRSS093.createRSSItem(ConverterForRSS093.java:46)
at com.sun.syndication.feed.synd.impl.ConverterForRSS094.createRSSItem(ConverterForRSS094.java:105)
at com.sun.syndication.feed.synd.impl.ConverterForRSS090.createRSSItems(ConverterForRSS090.java:149)
at com.sun.syndication.feed.synd.impl.ConverterForRSS090.createRealFeed(ConverterForRSS090.java:129)
at com.sun.syndication.feed.synd.impl.ConverterForRSS091Userland.createRealFeed(ConverterForRSS091Userland.java:106)
at com.sun.syndication.feed.synd.impl.ConverterForRSS094.createRealFeed(ConverterForRSS094.java:90)
at com.sun.syndication.feed.synd.impl.ConverterForRSS090.createRealFeed(ConverterForRSS090.java:104)
at com.sun.syndication.feed.synd.SyndFeedImpl.createWireFeed(SyndFeedImpl.java:231)
at TestRssFeed.main(TestRssFeed.java:43)

Sujit Pal said...

Hi Vetrik, I just tried looking at the source for ConverterForRSS092.java:126, but there are only 121 lines in there; I am using rome-0.9. I would suggest taking a look at the source for this class in your distribution; the reason for the NPE is (usually) quite easy to find.

Jeremy T said...

Many thanks for this. Your code turned a long, difficult job into a short one.
FYI, I found a problem with the parser code (I'm using Rome v1.0) and had to use a different idiom, as follows:

public Module parse(Element element) {

  PrismModule p = new PrismModuleImpl();

  if ( ! "item".equals(element.getName())) {
    return p;
  }

  Element issn = element.getChild("issn", NAMESPACE);
  if (issn != null) {
    p.setIssn(issn.getTextTrim());
  }
  Element publicationName = element.getChild("publicationName", NAMESPACE);
  if (publicationName != null) {
    p.setPublicationName(publicationName.getTextTrim());
  }
  Element issue = element.getChild("issue", NAMESPACE);
  if (issue != null) {
    p.setIssue(issue.getTextTrim());
  }
  Element number = element.getChild("number", NAMESPACE);
  if (number != null) {
    p.setNumber(number.getTextTrim());
  }
  Element startingPage = element.getChild("startingPage", NAMESPACE);
  if (startingPage != null) {
    p.setStartingPage(startingPage.getTextTrim());
  }
  Element volume = element.getChild("volume", NAMESPACE);
  if (volume != null) {
    p.setVolume(volume.getTextTrim());
  }
  Element publicationDate = element.getChild("publicationDate", NAMESPACE);
  if (publicationDate != null) {
    p.setPublicationDate(publicationDate.getTextTrim());
  }
  Element person = element.getChild("person", NAMESPACE);
  if (person != null) {
    p.setPerson(person.getTextTrim());
  }
  Element references = element.getChild("references", NAMESPACE);
  if (references != null) {
    p.setReferences(references.getTextTrim());
  }
  return p;
}

Sujit Pal said...

Thanks Jeremy.

Anonymous said...

Nice post,

It accurately describes creating custom fields in a RSS feed, built by Rome.

Thanks,
Anthony

Sujit Pal said...

Thanks, Anthony.

The Spiker said...

Hi. When I use two custom modules in my rome.properties, the "generate" method invoked is that of the other module.

rss_2.0.item.ModuleParser.classes=com.ntsrv.stakeholder.rss.GenericInfoModuleParser, com.ntsrv.stakeholder.rss.RouteModuleParser

rss_2.0.item.ModuleGenerator.classes=com.ntsrv.stakeholder.rss.GenericInfoModuleGenerator, com.ntsrv.stakeholder.rss.RouteModuleGenerator

When I use the GenericInfoModule, the "generate" method of RouteModuleGenerator is called.
Can you help me?

Sujit Pal said...

I suspect that you may have defined the same URI for both modules?
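A simplified model (not ROME's actual code) of why a shared URI would produce this symptom: if generators are looked up by namespace URI, two generators registered under the same URI collapse into one entry, and every module with that URI is handed to whichever generator survived. The class name and URIs below are hypothetical, and the sketch assumes the last registration wins, which is how Map.put behaves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of a plugin registry keyed by namespace URI.
class UriKeyedRegistry {

    private final Map<String, String> generatorByUri = new LinkedHashMap<>();

    void register(String namespaceUri, String generatorClass) {
        // A second generator with the same URI silently replaces the first.
        generatorByUri.put(namespaceUri, generatorClass);
    }

    String generatorFor(String moduleUri) {
        return generatorByUri.get(moduleUri);
    }
}
```

Giving each module its own URI keeps the two entries apart, so each module reaches its own generator.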

Anonymous said...

With these instructions I was able to generate my own custom RSS feed in no time. Thanks really for publishing this!

Sujit Pal said...

Thanks Anonymous, glad it helped you.

Anonymous said...

Great article! The only one dealing with this topic on the web that I could find. Thanks Sujit.

Sujit Pal said...

You are welcome, glad it helped you.

anca luca said...

Great job, thanks very much, beautifully done!

I used the example to do parsing of custom tags in items in a feed with Rome. Also I needed to use the modification proposed by Jeremy T with rome 1.0, but I did it a bit differently:

Namespace myNamespace = Namespace.getNamespace(MyModule.URI);
MyModule myModule = new MyModuleImpl();
// element is the item element, get all the children with the "my" namespace and build up the module from them
@SuppressWarnings("rawtypes")
Iterator eltsIterator = element.getDescendants(new ElementFilter(myNamespace));
while (eltsIterator.hasNext()) {
  Element myElt = (Element) eltsIterator.next();
  if (myElt.getName().equals(myModule.MYELT_TAGNAME)) {
    myModule.setAbstract(myElt.getTextNormalize());
  }
  // test all other tagnames here
  (...)
}

Sujit Pal said...

Thanks anca luca!

Dan Moore said...

Thanks so much for this article. Very helpful!

Sujit Pal said...

Thanks Dan, you are welcome.

Antonis said...

Thank you for your article!

I know it's old, but it was exactly what I was looking for.

Sujit Pal said...

Hi Antonis, thanks and you are welcome.

amit goel said...

Thanks, it works like a charm! I can't thank you enough for such an easy integration tutorial.

Write a book if you haven't written one yet. You really know how to explain things.

Sujit Pal said...

Thanks for the kind words Amit. It probably just looks that way because the subject is easy but under-documented, so all I did was figure it out for my use case and write about it. I guess the post hit a niche because of ROME's under-documented property :-).

amit goel said...

hi sujit,

I just hit a problem: my custom tags get generated and I can see them in the file.

But when I fetch it using SyndFeedInput and print the output in the browser, my custom tags are missing from the output.

I use the following code to fetch the feed XML from a file and pass it to the browser in an API:

InputStream is = fetchArticleXMLfromS3(article.getArticleId());
InputSource ins = new InputSource(is);
SyndFeed itemFeed = new SyndFeedInput().build(ins);
WireFeedOutput output = new WireFeedOutput();
WireFeed wirefeed = itemFeed.createWireFeed();
response = output.outputString(wirefeed);

I am not able to understand where it is going wrong, if it is generated properly. The tags are not even printed on the console.

Sujit Pal said...

Hi Amit, I believe you just encountered the next problem I faced :-). The next one you will probably encounter is on the parsing end, so I am adding it here for your reference as well.

Amit Goel said...

Thanks Sujit,

I figured out the parsing since I could not wait for your reply, and it was the same as your next step :-)

By the way, thanks for the next-to-next step as well, which helped me understand it better.

Thanks for the tutorial, you are a life saver :-)

Also, I am assuming that it will work for Atom feeds too, just by changing feedType. Is that right?

Sujit Pal said...

Hi Amit, good to know your problem is solved. Regarding support for Atom, I haven't tried it (we only wanted to support RSS in the application), but it should just work like RSS; that is the promise of ROME after all.

Anonymous said...

Sujit,

I am using Rome 1.0. I have followed the steps mentioned in the ROME tutorial and have confirmed my approach with your tutorial as well; however, when I add a custom module to an entry while creating an RSS 1.0 feed, I get the following error.

java.lang.NullPointerException
at com.sun.syndication.feed.module.impl.ModuleUtils.getModule(ModuleUtils.java:50)
at com.sun.syndication.feed.rss.Item.getModule(Item.java:371)
at com.sun.syndication.io.impl.RSS10Generator.populateItem(RSS10Generator.java:91)
at com.sun.syndication.io.impl.RSS090Generator.addItem(RSS090Generator.java:216)
at com.sun.syndication.io.impl.RSS090Generator.addItems(RSS090Generator.java:209)
at com.sun.syndication.io.impl.RSS090Generator.populateFeed(RSS090Generator.java:90)
at com.sun.syndication.io.impl.RSS090Generator.generate(RSS090Generator.java:56)
at com.sun.syndication.io.WireFeedOutput.outputJDom(WireFeedOutput.java:271)
at com.sun.syndication.io.WireFeedOutput.output(WireFeedOutput.java:211)
at com.sun.syndication.io.WireFeedOutput.output(WireFeedOutput.java:190)
at com.sun.syndication.io.SyndFeedOutput.output(SyndFeedOutput.java:134)

I do not use a Maven project; I have put my rome.properties, with the custom module generator class name, in the root of my non-Maven Java project.

Can you please let me know why I get this error? How do I make sure that the custom module generator is being picked up by ROME?

Sujit Pal said...

Hi, the placement of rome.properties in your jar file is correct, it should be picked up if the jar is in the classpath. I looked at the code for ModuleUtils.java and it looks like the NPE is because the module is null or its URI is null. Running this code through a debugger may help to find out more.