Our API provides a set of REST services over RSS 2.0. One of them is a content service, which allows you to call up a piece of content by URL or ID. So far we have been serving single pieces of content. Recently the need came up to serve server-generated mashups that would be presented to an external client, who would then restyle and rearrange the components as they see fit. This means that we have to serve multiple pieces of content that form a coherent whole: in other words, structured hierarchical data.
The RSS 2.0 Specification does not provide any support for hierarchical data. I briefly describe the approaches that occurred to me, then focus on the one I liked the most, which I also suggested in response to a question on StackOverflow. Here they are:
- Serve the entire content inside a content:encoded tag.
- Use the old style content tags.
- Write a custom ROME module to support inlined item hierarchies.
- Provide structure information as a category tag.
The first approach is one we already use for our current content service. While the assumption is that the client would use the content as-is on their site, the content is provided as XHTML, so the client could run it through an XML parser if they wanted to.
I briefly looked at the old-style content tags, but the embedded RDF tags within the content:items and content:item tags appeared quite confusing. They seem to be used only in RSS 1.0, however, so I am not sure they can even be used (via ROME) with RSS 2.0. I also lost interest when I realized that you can only model a single level with this approach.
I also briefly dallied with the idea of writing my own custom module which would allow nesting of items within items. It would have been interesting to do, but it seemed like a lot of unnecessary work. Moreover, the resulting XML would not have rendered in a standard browser, and would have needed custom parsing on the client side as well. There is a proposal to extend Atom to do something similar, so the idea is probably not that far-fetched.
The last idea (which I am going to expand on) is to return each component of the mashup as a separate item, each with its own content:encoded tag containing the XHTML content for that component. Each item has a category tag with an XPath-like value that indicates "where" this component is on the page. The nice thing about this approach is that it requires no additional modules, and the output is renderable in a web browser.
A Concrete Example
As an example, let's see how this approach would play out with a service that provided RSS feeds of pages on this blog. These pages consist of various components or sections: the dek, the Posts component, the Tag Cloud component, the main content component, etc. The RSS for this would look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <title>My feed title</title>
    <description>Test for Hierarchical Data in RSS Feed</description>
    <item>
      <title>Scripting Luke with Javascript</title>
      <description>For quite some time now, I have been looking...</description>
      <category domain="path">/</category>
    </item>
    <item>
      <title>Salmon Run</title>
      <category domain="path">/dek</category>
      <content:encoded><![CDATA[Swimming upstream...blah, blah, blah]]></content:encoded>
    </item>
    <item>
      <title>Posts</title>
      <category domain="path">/toc</category>
      <content:encoded><![CDATA[<div class="widget-content">...</div>]]></content:encoded>
    </item>
    <item>
      <title>Labels</title>
      <category domain="path">/cloud</category>
      <content:encoded><![CDATA[<div class='widget-content cloud-label-widget-content'>...</div>]]></content:encoded>
    </item>
    <item>
      <title>About me</title>
      <category domain="path">/section</category>
      <content:encoded><![CDATA[I am a programmer...blah, blah, blah]]></content:encoded>
      <media:content url="http://img.blogspot.com/path/to/my/picture.jpg" />
    </item>
    <item>
      <title>Scripting Luke with Javascript</title>
      <category domain="path">/content</category>
      <content:encoded><![CDATA[<p>For quite some time now, I have been looking...</p>]]></content:encoded>
    </item>
    <item>
      <title>Share/Save</title>
      <category domain="path">/labels</category>
      <content:encoded><![CDATA[javascript, lucene, luke, scripting, search]]></content:encoded>
    </item>
    <item><title>2010</title><category domain="path">/toc/2010</category></item>
    <item><title>November</title><category domain="path">/toc/2010/11</category></item>
    <item><title>Scripting Luke with Javascript</title><category domain="path">/toc/2010/11/03</category></item>
    <item><title>March</title><category domain="path">/toc/2010/03</category></item>
    <item><title>Java REST Client for Ehcache server</title><category domain="path">/toc/2010/03/03</category></item>
    <item><title>2009</title><category domain="path">/toc/2009</category></item>
    <item><title>December</title><category domain="path">/toc/2009/12</category></item>
    <item><title>A Lucene POS Tagging Tokenfilter</title><category domain="path">/toc/2009/12/02</category></item>
  </channel>
</rss>
The above XML was created from a bean which was manually populated with values. The XML generation code is not particularly interesting, but it is included below as part of the full code for this post. Real-life code would probably call multiple backing services to generate the bean, which would then be rendered into RSS by ROME.
Parsing this XML to extract the components is almost trivially simple. In this case, we are interested in the top-level components, i.e., those at level 1 of the tree. Our parsing code looks like this:
@Test
public void testParseSectionStructure() throws Exception {
  SyndFeedInput input = new SyndFeedInput();
  SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
  List<SyndEntry> entries = feed.getEntries();
  // keep only the entries that sit directly under the root
  CollectionUtils.filter(entries, new SectionFilterPredicate());
  print(entries, false);
}

private class SectionFilterPredicate implements Predicate<SyndEntry> {
  @Override public boolean evaluate(SyndEntry entry) {
    String path = getCategoryValue(entry, "path");
    // "/dek" splits into one element, "/" splits into none
    return StringUtils.split(path, "/").length == 1;
  }
}
and results in the following list. The print() method above prints only the title and the path for this test, but the client can obviously do whatever they want with the SyndEntry object.
[junit] Salmon Run (/dek)
[junit] Posts (/toc)
[junit] Labels (/cloud)
[junit] About me (/section)
[junit] Scripting Luke with Javascript (/content)
[junit] Share/Save (/labels)
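The level test itself does not depend on ROME or Commons Collections. Here is a minimal sketch of the same check using only the JDK, assuming the path values have already been pulled out of the category tags. PathDepthFilter and its method names are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that works on bare path strings. */
final class PathDepthFilter {

  private PathDepthFilter() {}

  /** Number of non-empty segments: "/" is 0, "/toc" is 1, "/toc/2010" is 2. */
  static int depth(String path) {
    int count = 0;
    for (String segment : path.split("/")) {
      if (!segment.isEmpty()) {
        count++;
      }
    }
    return count;
  }

  /** Keeps only the paths that sit directly under the root, in input order. */
  static List<String> topLevel(List<String> paths) {
    List<String> result = new ArrayList<>();
    for (String path : paths) {
      if (depth(path) == 1) {
        result.add(path);
      }
    }
    return result;
  }
}
```

Given the paths "/", "/dek", "/toc" and "/toc/2010", topLevel() would keep only "/dek" and "/toc".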
What about multi-level elements such as the Posts element? Well, depending on whether the client will simply dump the component on their page or change it around, we can either provide the XHTML in a content:encoded tag or break it up into individual items. We have chosen to do both here, for illustration. To extract only the navigation elements, the parsing code could do something like this:
@Test
public void testParseNavigationStructure() throws Exception {
  SyndFeedInput input = new SyndFeedInput();
  SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
  List<SyndEntry> entries = feed.getEntries();
  CollectionUtils.filter(entries, new NavigationFilterPredicate());
  Collections.sort(entries, new PathComparator());
  print(entries, true);
}

private class NavigationFilterPredicate implements Predicate<SyndEntry> {
  @Override public boolean evaluate(SyndEntry entry) {
    String path = getCategoryValue(entry, "path");
    // everything below /toc, but not /toc itself
    return path.startsWith("/toc") &&
      StringUtils.split(path, "/").length > 1;
  }
}

private class PathComparator implements Comparator<SyndEntry> {
  @Override public int compare(SyndEntry entry1, SyndEntry entry2) {
    String[] paths1 =
      StringUtils.split(getCategoryValue(entry1, "path"), "/");
    String[] paths2 =
      StringUtils.split(getCategoryValue(entry2, "path"), "/");
    // a missing component counts as MAX_VALUE, so with the descending
    // order below a parent sorts ahead of its children
    Integer y1 = paths1.length < 2 ?
      Integer.MAX_VALUE : Integer.valueOf(paths1[1]);
    Integer y2 = paths2.length < 2 ?
      Integer.MAX_VALUE : Integer.valueOf(paths2[1]);
    // equals(), not ==, because == compares Integer references
    if (y1.equals(y2)) {
      Integer m1 = paths1.length < 3 ?
        Integer.MAX_VALUE : Integer.valueOf(paths1[2]);
      Integer m2 = paths2.length < 3 ?
        Integer.MAX_VALUE : Integer.valueOf(paths2[2]);
      if (m1.equals(m2)) {
        Integer d1 = paths1.length < 4 ?
          Integer.MAX_VALUE : Integer.valueOf(paths1[3]);
        Integer d2 = paths2.length < 4 ?
          Integer.MAX_VALUE : Integer.valueOf(paths2[3]);
        if (d1.equals(d2)) {
          String title1 = entry1.getTitle();
          String title2 = entry2.getTitle();
          return title1.compareTo(title2);
        } else {
          return d2.compareTo(d1);
        }
      } else {
        return m2.compareTo(m1);
      }
    } else {
      return y2.compareTo(y1);
    }
  }
}
Which results in the following output. Note that the sorting is not really required if the server lists the items in the correct order; I add it only as an example of how simple it is. In our case, the sorting is a bit more complex because ordering information is built into the path components and we need to sort in reverse chronological order. In most cases, a simple lexical sort would be all that is required, even if the server did not return the items in the correct order (which would be a strange and non-intuitive thing for a server to do).
[junit] ..2010 (/toc/2010)
[junit] ...November (/toc/2010/11)
[junit] ....Scripting Luke with Javascript (/toc/2010/11/03)
[junit] ...March (/toc/2010/03)
[junit] ....Java REST Client for Ehcache server (/toc/2010/03/03)
[junit] ..2009 (/toc/2009)
[junit] ...December (/toc/2009/12)
[junit] ....A Lucene POS Tagging Tokenfilter (/toc/2009/12/02)
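The comparator above hard-codes three levels (year, month, day). As a sketch of how the same ordering could be expressed for paths of any depth, the comparator below works on bare path strings with only the JDK. It is an illustration rather than the code used for this post: the class name is hypothetical, and it assumes that all-digit segments are ordering keys that should sort newest first while named segments sort alphabetically.

```java
import java.util.Comparator;

/**
 * Hypothetical comparator over bare path strings. Numeric segments sort in
 * descending order, named segments alphabetically, and a parent always
 * sorts ahead of its children.
 */
final class ReverseChronoPathComparator implements Comparator<String> {

  @Override
  public int compare(String path1, String path2) {
    String[] a = segments(path1);
    String[] b = segments(path2);
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int cmp = compareSegment(a[i], b[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    // One path is a prefix of the other: the shorter one (the parent) wins.
    return Integer.compare(a.length, b.length);
  }

  private static String[] segments(String path) {
    // Drop the leading slash so that split() yields no empty first element.
    String trimmed = path.startsWith("/") ? path.substring(1) : path;
    return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
  }

  private static int compareSegment(String s1, String s2) {
    boolean n1 = isNumber(s1);
    boolean n2 = isNumber(s2);
    if (n1 && n2) {
      // Descending. Assumes the digits fit in a long.
      return Long.compare(Long.parseLong(s2), Long.parseLong(s1));
    }
    if (n1 != n2) {
      return n1 ? -1 : 1; // numeric segments go ahead of named ones
    }
    return s1.compareTo(s2);
  }

  private static boolean isNumber(String s) {
    return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
  }
}
```

Sorting a list of SyndEntry objects would then come down to extracting the path of each entry and delegating to this comparator.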
Equally simple is the ability to manipulate the structure. For example, the client may want to only show the navigation elements for the current year posts. To do that, the code would look something like this:
@Test
public void testParseNavigationStructure() throws Exception {
  SyndFeedInput input = new SyndFeedInput();
  SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
  List<SyndEntry> entries = feed.getEntries();
  // AndPredicate comes from org.apache.commons.collections15.functors
  CollectionUtils.filter(entries, new AndPredicate(
    new NavigationFilterPredicate(), new CurrentYearFilterPredicate()));
  print(entries, true);
}

private class CurrentYearFilterPredicate implements Predicate<SyndEntry> {
  private int currentYear;
  public CurrentYearFilterPredicate() {
    Calendar cal = Calendar.getInstance();
    this.currentYear = cal.get(Calendar.YEAR);
  }
  @Override public boolean evaluate(SyndEntry entry) {
    String[] pathElements =
      StringUtils.split(getCategoryValue(entry, "path"), "/");
    // the year is the second path element, e.g. /toc/2010
    if (pathElements.length >= 2) {
      return currentYear == Integer.valueOf(pathElements[1]);
    }
    return false;
  }
}
Which results in the following output.
[junit] ..2010 (/toc/2010)
[junit] ...November (/toc/2010/11)
[junit] ....Scripting Luke with Javascript (/toc/2010/11/03)
[junit] ...March (/toc/2010/03)
[junit] ....Java REST Client for Ehcache server (/toc/2010/03/03)
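Picking out one branch of the hierarchy is really a prefix test on the path. Here is a small JDK-only sketch of that idea; PathSubtree is a hypothetical name, and the code assumes that paths carry no trailing slash. Matching on the prefix plus a "/" (instead of a bare startsWith) keeps a sibling such as /toc/20100 out of the /toc/2010 branch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for selecting one branch of the path hierarchy. */
final class PathSubtree {

  private PathSubtree() {}

  /** True if path is the root of the branch or lies somewhere below it. */
  static boolean isWithin(String root, String path) {
    String base = root.endsWith("/") ? root : root + "/";
    return path.equals(root) || path.startsWith(base);
  }

  /** Returns the paths that belong to the branch, in input order. */
  static List<String> select(String root, List<String> paths) {
    List<String> result = new ArrayList<>();
    for (String path : paths) {
      if (isWithin(root, path)) {
        result.add(path);
      }
    }
    return result;
  }
}
```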
I personally think that the approach of "marking up" RSS item tags to indicate the location is quite simple and elegant. It does not require any extension or modification of the RSS specification, just a shared understanding about a single tag between server and client. But since I came up with it, I'm probably not being totally objective.
In my example, I have used ROME on both the server (XML generation) and client (XML parsing) ends to build and parse the RSS feed (as I would in real life too). Obviously, there is no guarantee that clients would use ROME (although it would simplify their lives enormously if they did). But the parsing stuff I talked about above can be done very easily with almost any XML parsing toolkit.
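To give a rough idea of what a client without ROME might do, here is a sketch that reads the item titles and paths with the DOM parser that ships with the JDK. The class and method names are hypothetical, and the code assumes that each item carries at most one category with domain="path".

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical ROME-free reader for the path convention described above. */
final class DomPathReader {

  private DomPathReader() {}

  /** Maps the path of each item to its title, in document order. */
  static Map<String, String> titlesByPath(String rssXml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(rssXml)));
      Map<String, String> result = new LinkedHashMap<>();
      NodeList items = doc.getElementsByTagName("item");
      for (int i = 0; i < items.getLength(); i++) {
        Element item = (Element) items.item(i);
        String path = pathOf(item);
        if (path != null) {
          result.put(path, textOf(item, "title"));
        }
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse feed", e);
    }
  }

  /** Value of the first category whose domain attribute is "path", or null. */
  private static String pathOf(Element item) {
    NodeList categories = item.getElementsByTagName("category");
    for (int i = 0; i < categories.getLength(); i++) {
      Element category = (Element) categories.item(i);
      if ("path".equals(category.getAttribute("domain"))) {
        return category.getTextContent().trim();
      }
    }
    return null;
  }

  /** Text of the first child element with the given tag, or "" if absent. */
  private static String textOf(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
  }
}
```

The resulting map can then be filtered or sorted on its keys in the same way as the SyndEntry lists above.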
What do you think? Is the burden this approach imposes on the client excessive? Would it be better to come up with a custom microformat that handles this sort of data instead of using RSS? Would you have tackled the problem differently? I would appreciate hearing your concerns, or any better or different ways of solving this problem.
Full Code
The full code for the above is modelled as a single JUnit test class, which is shown below. The interesting parts of the code have already been discussed above; this is here only for completeness.
package com.mycompany.feeds.hierarchicaldata;
import java.io.File;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.commons.collections15.CollectionUtils;
import org.apache.commons.collections15.Predicate;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;
import com.sun.syndication.feed.WireFeed;
import com.sun.syndication.feed.module.content.ContentModule;
import com.sun.syndication.feed.module.content.ContentModuleImpl;
import com.sun.syndication.feed.module.mediarss.MediaEntryModuleImpl;
import com.sun.syndication.feed.module.mediarss.types.MediaContent;
import com.sun.syndication.feed.module.mediarss.types.UrlReference;
import com.sun.syndication.feed.synd.SyndCategory;
import com.sun.syndication.feed.synd.SyndCategoryImpl;
import com.sun.syndication.feed.synd.SyndContent;
import com.sun.syndication.feed.synd.SyndContentImpl;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndEntryImpl;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.feed.synd.SyndFeedImpl;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.WireFeedOutput;
public class HierarchyDataTest {

  private final Log log = LogFactory.getLog(getClass());

  @Test
  public void testBuild() throws Exception {
    Page page = buildTestPage();
    SyndFeed feed = buildEmptyFeed();
    // main entry
    SyndEntry mainEntry = new SyndEntryImpl();
    mainEntry.setTitle(page.title);
    setDescription(mainEntry, page.description);
    addCategory(mainEntry, "path", "/");
    addEntry(feed, mainEntry);
    // sections
    for (Section section : page.sections) {
      SyndEntry entry = new SyndEntryImpl();
      entry.setTitle(section.title);
      setContent(entry, section.content);
      setImage(entry, section.imageUrl);
      addCategory(entry, "path", section.path);
      addEntry(feed, entry);
    }
    WireFeedOutput output = new WireFeedOutput();
    WireFeed wirefeed = feed.createWireFeed();
    FileUtils.writeStringToFile(new File("/tmp/hierarchy_test.xml"),
      output.outputString(wirefeed), "UTF-8");
  }

  @Test
  public void testParseSectionStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new SectionFilterPredicate());
    print(entries, false);
  }

  @Test
  public void testParseNavigationStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new NavigationFilterPredicate());
    Collections.sort(entries, new PathComparator());
    System.out.println("All navigation elements");
    print(entries, true);
    // show only current year navigation
    System.out.println("Current year navigation elements");
    CollectionUtils.filter(entries, new CurrentYearFilterPredicate());
    print(entries, true);
  }

// ==== bean building methods ====
private Page buildTestPage() {
Page page = new Page();
page.link = "http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html";
page.title = "Scripting Luke with Javascript";
page.description = "For quite some time now, I have been looking...";
// dek
Section dek = new Section();
dek.title = "Salmon Run";
dek.content = "Swimming upstream...blah, blah, blah";
dek.path = "/dek";
// table of contents
Section toc = new Section();
toc.title = "Posts";
toc.content = "<div class=\"widget-content\">...</div>";
toc.path = "/toc";
// cloud
Section cloud = new Section();
cloud.title = "Labels";
cloud.content = "<div class='widget-content cloud-label-widget-content'>...</div>";
cloud.path = "/cloud";
// profile
Section profile = new Section();
profile.title = "About me";
profile.content = "I am a programmer...blah, blah, blah";
profile.imageUrl = "http://img.blogspot.com/path/to/my/picture.jpg";
profile.link = "http://profiles.blogspot.com/path/to/my/complete/profile.html";
profile.path = "/section";
// main content
Section content = new Section();
content.title = "Scripting Luke with Javascript";
content.link = "http://sujitpal.blogspot.com/2010/11/scripting-luke.html";
content.path = "/content";
content.content = "<p>For quite some time now, I have been looking...</p>";
// labels
Section labels = new Section();
labels.title = "Share/Save";
labels.path = "/labels";
labels.content = "javascript, lucene, luke, scripting, search";
// navigation elements
Section nav1 = new Section();
nav1.title = "2010";
nav1.link = "http://sujitpal.blogspot.com/search?updated-min=2010-01-01T00%3A00%3A00-08%3A00&updated-max=2011-01-01T00%3A00%3A00-08%3A00&max-results=33";
nav1.path = "/toc/2010";
Section nav11 = new Section();
nav11.title = "November";
nav11.link = "http://sujitpal.blogspot.com/2010_11_01_archive.html";
nav11.path = "/toc/2010/11";
Section nav111 = new Section();
nav111.title = "Scripting Luke with Javascript";
nav111.link = "http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html";
nav111.path = "/toc/2010/11/03";
Section nav12 = new Section();
nav12.title = "March";
nav12.link = "http://sujitpal.blogspot.com/2010_03_01_archive.html";
nav12.path = "/toc/2010/03";
Section nav121 = new Section();
nav121.title = "Java REST Client for Ehcache server";
nav121.link = "http://sujitpal.blogspot.com/2010/03/java-rest-client-interface-for-ehcache.html";
nav121.path = "/toc/2010/03/03";
Section nav2 = new Section();
nav2.title = "2009";
nav2.link = "http://sujitpal.blogspot.com/search?updated-min=2009-01-01T00%3A00%3A00-08%3A00&updated-max=2010-01-01T00%3A00%3A00-08%3A00&max-results=44";
nav2.path = "/toc/2009";
Section nav21 = new Section();
nav21.title = "December";
nav21.path = "/toc/2009/12";
Section nav211 = new Section();
nav211.title = "A Lucene POS Tagging Tokenfilter";
nav211.link = "http://sujitpal.blogspot.com/2009/12/lucene-pos-tagging-tokenfilter.html";
nav211.path = "/toc/2009/12/02";
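// register the sections with the page, in page order
Collections.addAll(page.sections, dek, toc, cloud, profile, content, labels,
  nav1, nav11, nav111, nav12, nav121, nav2, nav21, nav211);
// return the page bean
return page;
}
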
  private class Page {
    public String title;
    public String link;
    public String description;
    public List<Section> sections = new ArrayList<Section>();
  }

  private class Section {
    public String title;
    public String link;
    public String content;
    public String path;
    public String imageUrl;
  }

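  // ==== feed building methods ====

  private SyndFeed buildEmptyFeed() {
    SyndFeed feed = new SyndFeedImpl();
    // createWireFeed() needs to know which format to produce
    feed.setFeedType("rss_2.0");
    feed.setTitle("My feed title");
    feed.setDescription("Test for Hierarchical Data in RSS Feed");
    return feed;
  }

  private void addEntry(SyndFeed feed, SyndEntry entry) {
    List<SyndEntry> entries = feed.getEntries();
    if (entries == null) {
      entries = new ArrayList<SyndEntry>();
    }
    entries.add(entry);
    feed.setEntries(entries);
  }

  private void setImage(SyndEntry entry, String imageUrl) {
    if (imageUrl != null) {
      try {
        MediaContent[] contents = new MediaContent[] {
          new MediaContent(new UrlReference(imageUrl))
        };
        MediaEntryModuleImpl module = new MediaEntryModuleImpl();
        module.setMediaContents(contents);
        entry.getModules().add(module);
      } catch (MalformedURLException e) {
        log.warn("Skipping malformed image URL: " + imageUrl, e);
      }
    }
  }

  private void setContent(SyndEntry entry, String content) {
    if (content != null) {
      List<String> contents = new ArrayList<String>();
      contents.add(content);
      ContentModule module = new ContentModuleImpl();
      module.setEncodeds(contents);
      entry.getModules().add(module);
    }
  }

  private void setDescription(SyndEntry entry, String description) {
    SyndContent content = new SyndContentImpl();
    content.setValue(description);
    entry.setDescription(content);
  }

  private void addCategory(SyndEntry entry, String name, String value) {
    SyndCategory category = new SyndCategoryImpl();
    // the taxonomy URI is rendered as the domain attribute of the category tag
    category.setTaxonomyUri(name);
    category.setName(value);
    entry.getCategories().add(category);
  }
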
  // ==== feed parsing methods ====

  private class SectionFilterPredicate implements Predicate<SyndEntry> {
    @Override public boolean evaluate(SyndEntry entry) {
      String path = getCategoryValue(entry, "path");
      return StringUtils.split(path, "/").length == 1;
    }
  }

  private class NavigationFilterPredicate implements Predicate<SyndEntry> {
    @Override public boolean evaluate(SyndEntry entry) {
      String path = getCategoryValue(entry, "path");
      return path.startsWith("/toc") &&
        StringUtils.split(path, "/").length > 1;
    }
  }

  private class CurrentYearFilterPredicate implements Predicate<SyndEntry> {
    private int currentYear;
    public CurrentYearFilterPredicate() {
      Calendar cal = Calendar.getInstance();
      this.currentYear = cal.get(Calendar.YEAR);
    }
    @Override public boolean evaluate(SyndEntry entry) {
      String[] pathElements =
        StringUtils.split(getCategoryValue(entry, "path"), "/");
      if (pathElements.length >= 2) {
        return currentYear == Integer.valueOf(pathElements[1]);
      }
      return false;
    }
  }

  // if we need to change the implicit ordering
  private class PathComparator implements Comparator<SyndEntry> {
    @Override public int compare(SyndEntry entry1, SyndEntry entry2) {
      String[] paths1 =
        StringUtils.split(getCategoryValue(entry1, "path"), "/");
      String[] paths2 =
        StringUtils.split(getCategoryValue(entry2, "path"), "/");
      Integer y1 = paths1.length < 2 ?
        Integer.MAX_VALUE : Integer.valueOf(paths1[1]);
      Integer y2 = paths2.length < 2 ?
        Integer.MAX_VALUE : Integer.valueOf(paths2[1]);
      // equals(), not ==, because == compares Integer references
      if (y1.equals(y2)) {
        Integer m1 = paths1.length < 3 ?
          Integer.MAX_VALUE : Integer.valueOf(paths1[2]);
        Integer m2 = paths2.length < 3 ?
          Integer.MAX_VALUE : Integer.valueOf(paths2[2]);
        if (m1.equals(m2)) {
          Integer d1 = paths1.length < 4 ?
            Integer.MAX_VALUE : Integer.valueOf(paths1[3]);
          Integer d2 = paths2.length < 4 ?
            Integer.MAX_VALUE : Integer.valueOf(paths2[3]);
          if (d1.equals(d2)) {
            String title1 = entry1.getTitle();
            String title2 = entry2.getTitle();
            return title1.compareTo(title2);
          } else {
            return d2.compareTo(d1);
          }
        } else {
          return m2.compareTo(m1);
        }
      } else {
        return y2.compareTo(y1);
      }
    }
  }

  private String getCategoryValue(SyndEntry entry, String name) {
    List<SyndCategory> categories = entry.getCategories();
    String value = null;
    for (SyndCategory category : categories) {
      // the domain attribute is exposed as the taxonomy URI
      if (name.equals(category.getTaxonomyUri())) {
        value = category.getName();
        break;
      }
    }
    return value;
  }

  private void print(List<SyndEntry> entries, boolean indent) {
    for (SyndEntry entry : entries) {
      String path = getCategoryValue(entry, "path");
      System.out.println((indent ?
        StringUtils.repeat(".", StringUtils.split(path, "/").length) : "") +
        entry.getTitle() + " (" + path + ")");
    }
  }
}