Saturday, July 26, 2008

An RSS Feed Client in Java

In his article, "Demystifying RESTful Data Coupling", Steve Vinoski says:

Developers who favor technologies that promote interface specialization typically raise two specific objections to the uniform-interface constraint designed into the Representational State Transfer (REST) architectural style. One is that different resources should each have specific interfaces and methods that more accurately reflect their precise functionality. The other objection to the concept of a uniform interface is that it merely shifts all coupling issues and other problems to the data exchanged between client and server.

We have faced similar concerns from clients of our RSS 2.0-based REST API. The concerns are easier to address in our case because our XML format is a well-known standard, and we can point clients to several implementations of RSS feed parsers, such as Mark Pilgrim's Python Universal Feed Parser, the ROME Fetcher, or the Jakarta FeedParser, to name a few. In addition, because RSS is so popular, almost all major programming languages have built-in support or contributed modules for parsing the various flavors of RSS, so clients can usually find an off-the-shelf parser or toolkit that works well with their language of choice.

However, thinking through Steve Vinoski's comment a little more with reference to my particular context, I came up with the idea of using the ROME SyndFeed object as a Data Transfer Object (DTO). ROME is a popular project, and its data structures are well documented, both on its own website and in books such as Dave Johnson's "RSS and Atom in Action", so client programmers can consult publicly available documentation to figure out how to convert the SyndFeed into objects that their application can consume.

What makes the task easier is that ROME already has a Fetcher module, which takes care of the various nuances of parsing special headers from RSS feeds, local caching and such. The generally available 0.9 release (at the time of this writing) does not support connection and read timeouts on the underlying HTTP client, but the version in CVS (and probably the releases following 0.9) does, so I used that.
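To make the two timeouts concrete, here is a minimal sketch using only the JDK's own java.net.URLConnection rather than ROME or Commons HttpClient. The class and method names (TimeoutSketch, openWithTimeouts) are made up for illustration. The connect timeout bounds how long we wait to establish the connection, and the read timeout bounds how long we wait for data once connected.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of ROME: shows connect vs. read timeouts
// on a plain JDK connection.
final class TimeoutSketch {

  /**
   * Prepares (but does not open) a connection with both timeouts set.
   * @param url the URL to connect to.
   * @param connectTimeoutMs max time (ms) to wait for the connection to be made.
   * @param readTimeoutMs max time (ms) to wait for data once connected.
   */
  static URLConnection openWithTimeouts(URL url, int connectTimeoutMs,
      int readTimeoutMs) throws IOException {
    // openConnection() only creates the connection object; no network
    // traffic happens until connect() or getInputStream() is called.
    URLConnection conn = url.openConnection();
    conn.setConnectTimeout(connectTimeoutMs);
    conn.setReadTimeout(readTimeoutMs);
    return conn;
  }
}
```

If either limit is exceeded once the connection is actually used, the JDK throws a java.net.SocketTimeoutException, which is a subclass of IOException.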

So what we provide is a "client library" consisting of a single class:

// ApiClient.java
package com.healthline.feeds.client;

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;
import java.util.UUID;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.fetcher.FetcherException;
import com.sun.syndication.fetcher.impl.HashMapFeedInfoCache;
import com.sun.syndication.fetcher.impl.HttpClientFeedFetcher;
import com.sun.syndication.io.FeedException;

/**
 * Client for API. Based on the ROME FeedFetcher project.
 * Provides a single execute() method to point to the RSS based webservice.
 * The response is RSS 2.0 XML, which is converted into a SyndFeed object and 
 * returned to the caller to parse as needed.
 */
public class ApiClient {

  private final Log log = LogFactory.getLog(getClass());
  
  private URL serviceUrl;

  private HttpClientFeedFetcher fetcher = null;
  
  /**
   * Constructs an ApiClient instance.
   * @param serviceUrl the location of the service.
   * @param useLocalCache true if you want to cache responses locally.
   * @param connectTimeout the connection timeout (ms) for the network connection.
   * @param readTimeout the read timeout (ms) for the network connection.
   */
  public ApiClient(URL serviceUrl, boolean useLocalCache, int connectTimeout, 
      int readTimeout) {
    super();
    this.serviceUrl = serviceUrl;
    fetcher = new HttpClientFeedFetcher();
    fetcher.setUserAgent("MyApiClientFetcher-1.0");
    fetcher.setConnectTimeout(connectTimeout);
    fetcher.setReadTimeout(readTimeout);
    if (useLocalCache) {
      fetcher.setFeedInfoCache(HashMapFeedInfoCache.getInstance());
    }
  }
  
  /**
   * Executes a service request and returns a ROME SyndFeed object.
   *
   * @param methodName the methodName to execute.
   * @param params a Map of name value pairs.
   * @return a SyndFeed object.
   */
  public SyndFeed execute(String methodName, Map<String,String> params) {
    URL feedUrl = buildUrl(methodName, params);
    SyndFeed feed = null;
    try {
      feed = fetcher.retrieveFeed(feedUrl);
    } catch (FetcherException e) {
      throw new RuntimeException("Failed to fetch URL:[" + 
        feedUrl.toExternalForm() + "]. HTTP Response code:[" + 
        e.getResponseCode() + "]", e);
    } catch (FeedException e) {
      throw new RuntimeException("Failed to parse response for URL:[" + 
        feedUrl.toString() + "]", e);
    } catch (IOException e) {
      throw new RuntimeException("IO Error fetching URL:[" + 
        feedUrl.toString() + "]", e);
    }
    return feed;
  }

  /**
   * Convenience method to build up the request URL from the method name and
   * the Map of query parameters.
   * @param methodName the method name to execute.
   * @param params the Map of name value pairs of parameters.
   * @return the request URL, including any non-blank query parameters.
   */
  private URL buildUrl(String methodName, Map<String,String> params) {
    StringBuilder urlBuilder = new StringBuilder(serviceUrl.toString());
    urlBuilder.append("/").append(methodName);
    int numParams = 0;
    for (String paramName : params.keySet()) {
      String paramValue = params.get(paramName);
      if (StringUtils.isBlank(paramValue)) {
        continue;
      }
      try {
        paramValue = URLEncoder.encode(paramValue, "UTF-8");
      } catch (UnsupportedEncodingException e) {
        // will never happen, but just in case it does, we throw the error up
        throw new RuntimeException(e);
      }
      urlBuilder.append(numParams == 0 ? "?" : "&").
      append(paramName).
      append("=").
      append(paramValue);
      numParams++;
    }
    try {
      if (log.isDebugEnabled()) {
        log.debug("Requesting:[" + urlBuilder.toString() + "]");
      }
      return new URL(urlBuilder.toString());
    } catch (MalformedURLException e) {
      throw new RuntimeException("Malformed URL:[" + urlBuilder.toString() + "]", e);
    }
  }
}
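
The query-string handling in buildUrl() can be tried out in isolation. What follows is a rough, JDK-only sketch of the same idea, under a few assumptions: the class name QueryUrlSketch is made up, a trim()-based check stands in for StringUtils.isBlank(), and parameter names are URL-encoded as well as values (the class above encodes only the values).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical stand-alone version of the URL building logic.
final class QueryUrlSketch {

  /** Appends the method name and all non-blank parameters to the service URL. */
  static String buildUrl(String serviceUrl, String methodName,
      Map<String, String> params) {
    StringBuilder sb = new StringBuilder(serviceUrl).append('/').append(methodName);
    char separator = '?';  // first parameter gets '?', later ones get '&'
    for (Map.Entry<String, String> entry : params.entrySet()) {
      String value = entry.getValue();
      if (value == null || value.trim().isEmpty()) {
        continue;  // skip blank values, roughly like StringUtils.isBlank()
      }
      sb.append(separator)
        .append(encode(entry.getKey()))
        .append('=')
        .append(encode(value));
      separator = '&';
    }
    return sb.toString();
  }

  private static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is guaranteed to be available on every JVM
      throw new IllegalStateException(e);
    }
  }
}
```

Iterating over entrySet() avoids a second lookup per key, and passing in a LinkedHashMap keeps the parameters in insertion order, which makes the generated URLs easier to compare in logs.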

All the client has to do is instantiate this class with the parameters, then execute the service command. This is completely generic, by the way, not tied to our API service in any way. As an example, I tried it against the RSS feed for the National Public Radio (NPR) Top Stories page, using the NprApiClient test code shown further below.

Based on our original requirement, the objective is to convert the SyndFeed object returned by ApiClient.execute() into an appropriate user object. We call our user object SearchResult; it is a POJO, shown below:

// SearchResult.java
public class SearchResult {

  private String title;
  private String url;
  private String summary;
  // auto-generated getters and setters removed for brevity
  ...
}
// NprApiClient.java
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.healthline.feeds.client.ApiClient;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;

public class NprApiClient {

  private static final String SERVICE_URL = "http://www.npr.org/rss";
  private static final boolean USE_CACHE = true;
  private static final int DEFAULT_CONN_TIMEOUT = 5000;
  private static final int DEFAULT_READ_TIMEOUT = 1000;
  
  private ApiClient apiClient;
  
  public NprApiClient() throws Exception {
    apiClient = new ApiClient(new URL(SERVICE_URL), USE_CACHE, DEFAULT_CONN_TIMEOUT, 
      DEFAULT_READ_TIMEOUT);
  }
  
  @SuppressWarnings("unchecked")
  public List<SearchResult> getTopStories() {
    Map<String,String> args = new HashMap<String,String>();
    args.put("id", "1001");
    SyndFeed feed = apiClient.execute("rss.php", args);
    List<SyndEntry> entries = feed.getEntries();
    List<SearchResult> results = new ArrayList<SearchResult>();
    for (SyndEntry entry : entries) {
      SearchResult result = new SearchResult();
      result.setTitle(entry.getTitle());
      result.setUrl(entry.getLink());
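      // The RSS description element is optional, so guard against a missing one.
      result.setSummary(entry.getDescription() == null
        ? "" : entry.getDescription().getValue());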
      results.add(result);
    }
    return results;
  }
  
  public static void main(String[] args) {
    try {
      NprApiClient client = new NprApiClient();
      List<SearchResult> results = client.getTopStories();
      for (SearchResult result : results) {
        System.out.println(result.getTitle());
        System.out.println("URL:" + result.getUrl());
        System.out.println(result.getSummary());
        System.out.println("--");
      }
    } catch (Exception e) {
      System.err.println(e.getMessage());
      throw new RuntimeException(e);
    }
  }
}
Here are the (partial) results from the run. I have truncated the output after the first few entries, but you get the idea.

Housing Bill Clears Senate, Awaits Bush's Signature
URL:http://www.npr.org/templates/story/story.php?storyId=92964747&ft=1&f=1001
The Senate met in a rare Saturday session and gave final congressional approval to a wide-ranging 
housing bill.  The bill aims to bolster the sagging housing market and includes measures aimed at 
shoring up Fannie Mae and Freddie Mac. The president says he'll sign it when it reaches his desk, 
early next week.
--
What's The Deal With The XM-Sirius Merger?
URL:http://www.npr.org/templates/story/story.php?storyId=92960423&ft=1&f=1001
The FCC has approved the merger of XM and Sirius satellite radio after 17 months of behind-
the-scenes negotiations. While some critics have said the merger represents a monopoly, it 
appears that the two weak companies may be combining to form one weak company.
--
Military Tribunals Begin At Guantanamo
URL:http://www.npr.org/templates/story/story.php?storyId=92960420&ft=1&f=1001
The first war crimes trials since World War II started this week at Guantanamo Bay. Andrew 
McBride, a former Justice Department official, discusses the trials, as well as how Guantanamo's 
war crimes compare with those of 1945.
--
...

Although the above code is good enough for a standard RSS feed parsing client, I was not able to get at the contents of our custom tags (for the RSS 2.0-based API I spoke about earlier). This matters because we use a variety of open-source RSS custom modules (such as Amazon's OpenSearch), as well as our own home-grown custom module, to satisfy several data requirements that standard RSS 2.0 cannot accommodate. It is therefore important for our clients to be able to parse our custom module and its contents out of the SyndFeed object.

I will investigate this on my own and write about it in a future post. From what I see so far, the ROME Fetcher is not passing the custom module information through in the SyndFeed object it parses out of the XML. It is possible that I am just missing some configuration piece that would enable it. In the meantime, if you happen to know how to do this, I would really appreciate you letting me know.
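
As a stopgap that sidesteps ROME entirely (so it is not an answer to the question above), a client could pull namespaced custom elements straight out of the raw RSS XML with the JDK's own DOM parser. The sketch below makes several assumptions: the class name CustomTagSketch is made up, the feed is assumed to use the OpenSearch 1.1 namespace URI, and the XML is assumed to have already been fetched into a String.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads custom (namespaced) elements without ROME.
final class CustomTagSketch {

  // Assumption: the feed declares the OpenSearch 1.1 namespace.
  static final String OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/";

  /**
   * Returns the trimmed text of the first element with the given namespace
   * and local name, or null if the document contains no such element.
   */
  static String firstValue(String rssXml, String namespaceUri, String localName)
      throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Namespace awareness is off by default; it must be on for
    // getElementsByTagNameNS() to match anything.
    factory.setNamespaceAware(true);
    // Note: not hardened for untrusted input (DOCTYPEs are still allowed).
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(rssXml)));
    NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, localName);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }
}
```

For a feed whose channel contains an opensearch:totalResults element, firstValue(xml, CustomTagSketch.OPENSEARCH_NS, "totalResults") returns its text. The obvious drawback is that the XML gets parsed outside ROME, so this only makes sense until the custom module data can be read from the SyndFeed itself.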
