Thursday, November 25, 2010

Modeling Hierarchical Data with RSS and ROME

Overview

Our API provides a set of REST services over RSS 2.0. One of them is a content service, which allows you to call up a piece of content by URL or ID. So far, we have been serving single pieces of content - recently the need came up for serving server generated mashups which would be presented to an external client, who would then restyle and rearrange the components as they see fit. This means that we have to serve multiple pieces of content that form a coherent whole - in other words, structured hierarchical data.

The RSS 2.0 Specification does not provide any support for hierarchical data. I briefly describe the approaches that occurred to me, then focus on the one I liked the most, which I also suggested in response to a question on StackOverflow. Here they are:

  1. Serve the entire content inside a content:encoded tag.
  2. Use the old style content tags.
  3. Write a custom ROME module to support inlined item hierarchies.
  4. Provide structure information as a category tag.

The first approach is one we already use for our current content service. While the assumption is that the client would use the content as-is on their site, the content is provided as XHTML, so the client can build an XML parser to parse the content if they so desired.

I briefly looked at the old-style content tags, but the embedded RDF tags within the content:items and content:item tags appeared quite confusing. They seem to be used only in RSS 1.0 however, so not quite sure if they can even be used (via ROME) with RSS 2.0. Also, I kind of lost interest when I realized that you can only model a single level with this approach.

I also briefly dallied with the idea of writing my own custom module which would allow nesting of items within items. It would have been interesting to do, but it seemed like a lot of unnecessary work. Moreover, the resulting XML would not have rendered in a standard browser, and would have needed custom parsing on the client side as well. There is a proposal to extend Atom to do something similar, so the idea is probably not that far-fetched.

The last idea (which I am going to expand on) is to return each component of the mashup as a separate item, each with its own content:encoded tag containing the XHTML content for that component. Each item has a category tag with an XPath like value that indicates "where" this component is on the page. The nice thing with this approach is that this requires no additional modules, and the output is renderable in a web browser.

A Concrete Example

As an example, lets see how this approach would play out with a service that provided RSS feeds of pages on this blog. These pages consist of various components or sections - the dek, the Post component, the Tag Cloud component, the main content component, etc. The RSS for this would look something like this:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" 
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
     version="2.0">
  <channel>
    <title>My feed title</title>
    <link>http://www.foo.com/path/to/hypothetical/feed/service</link>
    <description>Test for Hierarchical Data in RSS Feed</description>
    <language>en_US</language>
    <dc:language>en_US</dc:language>
    <item>
      <title>Scripting Luke with Javascript</title>
      <link>http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html</link>
      <description>For quite some time now, I have been looking...</description>
      <category domain="path">/</category>
      <guid>http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html</guid>
    </item>
    <item>
      <title>Salmon Run</title>
      <category domain="path">/dek</category>
      <content:encoded><![CDATA[Swimming upstream...blah, blah, blah]]></content:encoded>
    </item>
    <item>
      <title>Posts</title>
      <category domain="path">/toc</category>
      <content:encoded><![CDATA[<div class="widget-content">...</div>]]></content:encoded>
    </item>
    <item>
      <title>Labels</title>
      <category domain="path">/cloud</category>
      <content:encoded><![CDATA[<div class='widget-content cloud-label-widget-content'>...</div>]]></content:encoded>
    </item>
    <item>
      <title>About me</title>
      <link>http://profiles.blogspot.com/path/to/my/complete/profile.html</link>
      <category domain="path">/section</category>
      <guid>http://profiles.blogspot.com/path/to/my/complete/profile.html</guid>
      <content:encoded><![CDATA[I am a programmer...blah, blah, blah]]></content:encoded>
      <media:content url="http://img.blogspot.com/path/to/my/picture.jpg" />
    </item>
    <item>
      <title>Scripting Luke with Javascript</title>
      <link>http://sujitpal.blogspot.com/2010/11/scripting-luke.html</link>
      <category domain="path">/content</category>
      <guid>http://sujitpal.blogspot.com/2010/11/scripting-luke.html</guid>
      <content:encoded><![CDATA[<p>For quite some time now, I have been looking...</p>]]></content:encoded>
    </item>
    <item>
      <title>Share/Save</title>
      <category domain="path">/labels</category>
      <content:encoded><![CDATA[javascript, lucene, luke, scripting, search]]></content:encoded>
    </item>
    <item>
      <title>2010</title>
      <link>http://sujitpal.blogspot.com/search?updated-min=2010-01-01T00%3A00%3A00-08%3A00&amp;updated-max=2011-01-01T00%3A00%3A00-08%3A00&amp;max-results=33</link>
      <category domain="path">/toc/2010</category>
      <guid>http://sujitpal.blogspot.com/search?updated-min=2010-01-01T00%3A00%3A00-08%3A00&amp;updated-max=2011-01-01T00%3A00%3A00-08%3A00&amp;max-results=33</guid>
    </item>
    <item>
      <title>November</title>
      <link>http://sujitpal.blogspot.com/2010_11_01_archive.html</link>
      <category domain="path">/toc/2010/11</category>
      <guid>http://sujitpal.blogspot.com/2010_11_01_archive.html</guid>
    </item>
    <item>
      <title>Scripting Luke with Javascript</title>
      <link>http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html</link>
      <category domain="path">/toc/2010/11/03</category>
      <guid>http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html</guid>
    </item>
    <item>
      <title>March</title>
      <link>http://sujitpal.blogspot.com/2010_03_01_archive.html</link>
      <category domain="path">/toc/2010/03</category>
      <guid>http://sujitpal.blogspot.com/2010_03_01_archive.html</guid>
    </item>
    <item>
      <title>Java REST Client for Ehcache server</title>
      <link>http://sujitpal.blogspot.com/2010/03/java-rest-client-interface-for-ehcache.html</link>
      <category domain="path">/toc/2010/03/03</category>
      <guid>http://sujitpal.blogspot.com/2010/03/java-rest-client-interface-for-ehcache.html</guid>
    </item>
    <item>
      <title>2009</title>
      <link>http://sujitpal.blogspot.com/search?updated-min=2009-01-01T00%3A00%3A00-08%3A00&amp;updated-max=2010-01-01T00%3A00%3A00-08%3A00&amp;max-results=44</link>
      <category domain="path">/toc/2009</category>
      <guid>http://sujitpal.blogspot.com/search?updated-min=2009-01-01T00%3A00%3A00-08%3A00&amp;updated-max=2010-01-01T00%3A00%3A00-08%3A00&amp;max-results=44</guid>
    </item>
    <item>
      <title>December</title>
      <category domain="path">/toc/2009/12</category>
    </item>
    <item>
      <title>A Lucene POS Tagging Tokenfilter</title>
      <link>http://sujitpal.blogspot.com/2009/12/lucene-pos-tagging-tokenfilter.html</link>
      <category domain="path">/toc/2009/12/02</category>
      <guid>http://sujitpal.blogspot.com/2009/12/lucene-pos-tagging-tokenfilter.html</guid>
    </item>
  </channel>
</rss>

The above XML was created from a bean which was manually populated with values. The XML generation code is not particularly interesting, but it is included below as part of the full code for this post. Real-life code would probably call multiple backing services to generate the bean, which would then be rendered into RSS by ROME.

Parsing this XML to extract the components is almost trivially simple. In this case, we are interested in the top level components, ie, those at level 1 of the tree. Our parsing code looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
  @Test
  public void testParseSectionStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new SectionFilterPredicate());
    print(entries, false);
  }

  private class SectionFilterPredicate implements Predicate<SyndEntry> {
    @Override public boolean evaluate(SyndEntry entry) {
      String path = getCategoryValue(entry, "path");
      return StringUtils.split(path, "/").length == 1;
    }
  }

and results in the following list. The print() method above only prints the title and the path for this test, but obviously the client can do whatever he wants with the SyndEntry object.

1
2
3
4
5
6
    [junit] Salmon Run (/dek)
    [junit] Posts (/toc)
    [junit] Labels (/cloud)
    [junit] About me (/section)
    [junit] Scripting Luke with Javascript (/content)
    [junit] Share/Save (/labels)

What about multi-level elements such as the Posts element? Well, depending on whether the client will simply dump the component on his page or change it around, we can either provide the XHTML in a content:encoded tag or break it up into individual items. We have chosen to do both here, for illustration. To extract only the navigation elements, the parsing code could do something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
  @Test
  public void testParseNavigationStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new NavigationFilterPredicate());
    Collections.sort(entries, new PathComparator());
    print(entries, true);
  }

  private class NavigationFilterPredicate implements Predicate<SyndEntry> {
    @Override public boolean evaluate(SyndEntry entry) {
      String path = getCategoryValue(entry, "path");
      return path.startsWith("/toc") && 
        StringUtils.split(path, "/").length > 1;
    }
  }

  private class PathComparator implements Comparator<SyndEntry> {
    @Override public int compare(SyndEntry entry1, SyndEntry entry2) {
      String[] paths1 = 
        StringUtils.split(getCategoryValue(entry1, "path"), "/");
      String[] paths2 = 
        StringUtils.split(getCategoryValue(entry2, "path"), "/");
      Integer y1 = paths1.length < 2 ? 
        Integer.MAX_VALUE : Integer.valueOf(paths1[1]);
      Integer y2 = paths2.length < 2 ? 
        Integer.MAX_VALUE : Integer.valueOf(paths2[1]);
      if (y1 == y2) {
        Integer m1 = paths1.length < 3 ? 
          Integer.MAX_VALUE : Integer.valueOf(paths1[2]);
        Integer m2 = paths2.length < 3 ? 
          Integer.MAX_VALUE : Integer.valueOf(paths2[2]);
        if (m1 == m2) {
          Integer d1 = paths1.length < 4 ? 
            Integer.MAX_VALUE : Integer.valueOf(paths1[3]);
          Integer d2 = paths2.length < 4 ? 
            Integer.MAX_VALUE : Integer.valueOf(paths2[3]);
          if (d1 == d2) {
            String title1 = entry1.getTitle();
            String title2 = entry2.getTitle();
            return title1.compareTo(title2);
          } else {
            return d2.compareTo(d1);
          }
        } else {
          return m2.compareTo(m1);
        }
      } else {
        return y2.compareTo(y1);
      }
    }
  }

Which results in the following output. Note that the sorting is really not required if the server lists them out in the correct order, I only add it in as an example of how simple it is. In our case, the sorting was a bit more complex since we have ordering information built into the path components and we need to sort in reverse chronological order - in most cases, a simple lexical sort would be all that would be required, even if the server did not return them in the correct order (which would be a really strange and non-intuitive thing for the server to not do).

1
2
3
4
5
6
7
8
    [junit] ..2010 (/toc/2010)
    [junit] ...November (/toc/2010/11)
    [junit] ....Scripting Luke with Javascript (/toc/2010/11/03)
    [junit] ...March (/toc/2010/03)
    [junit] ....Java REST Client for Ehcache server (/toc/2010/03/03)
    [junit] ..2009 (/toc/2009)
    [junit] ...December (/toc/2009/12)
    [junit] ....A Lucene POS Tagging Tokenfilter (/toc/2009/12/02)

Equally simple is the ability to manipulate the structure. For example, the client may want to only show the navigation elements for the current year posts. To do that, the code would look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
  @Test
  public void testParseNavigationStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new AndPredicate(
      new NavigationFilterPredicate(), new CurrentYearFilterPredicate()));
    print(entries, true);
  }

  private class CurrentYearFilterPredicate implements Predicate<SyndEntry> {

    private int currentYear;
    
    public CurrentYearFilterPredicate() {
      Calendar cal = Calendar.getInstance();
      this.currentYear = cal.get(Calendar.YEAR);
    }
    
    @Override public boolean evaluate(SyndEntry entry) {
      String[] pathElements = 
        StringUtils.split(getCategoryValue(entry, "path"), "/");
      if (pathElements.length >= 2) {
        return currentYear == Integer.valueOf(pathElements[1]);
      }
      return false;
    }
  }

Which results in the following output.

1
2
3
4
5
    [junit] ..2010 (/toc/2010)
    [junit] ...November (/toc/2010/11)
    [junit] ....Scripting Luke with Javascript (/toc/2010/11/03)
    [junit] ...March (/toc/2010/03)
    [junit] ....Java REST Client for Ehcache server (/toc/2010/03/03)

Conclusion

I personally think that the approach of "marking up" RSS item tags to indicate the location is quite simple and elegant. It does not require any extension or modification of the RSS specification, just a shared understanding about a single tag between server and client. But since I came up with it, I'm probably not being totally objective.

In my example, I have used ROME on both the server (XML generation) and client (XML parsing) ends to build and parse the RSS feed (as I would in real life too). Obviously, there is no guarantee that clients would use ROME (although it would simplify their lives enormously if they did). But the parsing stuff I talked about above can be done very easily with almost any XML parsing toolkit.

What do you think? Do you think the burden imposed on this client with this approach is excessive? Would it be better to come up with a custom microformat that handles this sort of data instead of using RSS? Would you have tackled the problem differently? Would appreciate hearing your concerns or if you have better/different ways of solving this problem.

Full Code

The full code for the above is modelled as a single JUnit test class, which is shown below. The interesting parts of the code has already been discussed above, this is here only for completeness.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
package com.mycompany.feeds.hierarchicaldata;

import java.io.File;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.commons.collections15.CollectionUtils;
import org.apache.commons.collections15.Predicate;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;

import com.sun.syndication.feed.WireFeed;
import com.sun.syndication.feed.module.content.ContentModule;
import com.sun.syndication.feed.module.content.ContentModuleImpl;
import com.sun.syndication.feed.module.mediarss.MediaEntryModuleImpl;
import com.sun.syndication.feed.module.mediarss.types.MediaContent;
import com.sun.syndication.feed.module.mediarss.types.UrlReference;
import com.sun.syndication.feed.synd.SyndCategory;
import com.sun.syndication.feed.synd.SyndCategoryImpl;
import com.sun.syndication.feed.synd.SyndContent;
import com.sun.syndication.feed.synd.SyndContentImpl;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndEntryImpl;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.feed.synd.SyndFeedImpl;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.WireFeedOutput;

public class HierarchyDataTest {

  private final Log log = LogFactory.getLog(getClass());
  
  @Test
  public void testBuild() throws Exception {
    Page page = buildTestPage();
    SyndFeed feed = buildEmptyFeed();
    // main entry
    SyndEntry mainEntry = new SyndEntryImpl();
    mainEntry.setTitle(page.title);
    mainEntry.setLink(page.link);
    setDescription(mainEntry, page.description);
    addCategory(mainEntry, "path", "/");
    addEntry(feed, mainEntry);
    // sections
    for (Section section : page.sections) {
      SyndEntry entry = new SyndEntryImpl();
      entry.setTitle(section.title);
      entry.setLink(section.link);
      setContent(entry, section.content);
      setImage(entry, section.imageUrl);
      addCategory(entry, "path", section.path);
      addEntry(feed, entry);
    }
    WireFeedOutput output = new WireFeedOutput();
    WireFeed wirefeed = feed.createWireFeed();
    FileUtils.writeStringToFile(new File("/tmp/hierarchy_test.xml"), 
      output.outputString(wirefeed), "UTF-8");
  }
  
  @Test
  public void testParseSectionStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new SectionFilterPredicate());
    print(entries, false);
  }

  @Test
  public void testParseNavigationStructure() throws Exception {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new File("/tmp/hierarchy_test.xml"));
    List<SyndEntry> entries = feed.getEntries();
    CollectionUtils.filter(entries, new NavigationFilterPredicate());
    Collections.sort(entries, new PathComparator());
    System.out.println("All navigation elements");
    print(entries, true);
    // show only current year navigation
    System.out.println("Current year navigation elements");
    CollectionUtils.filter(entries, new CurrentYearFilterPredicate());
    print(entries, true);
  }

  // ==== bean building methods ====
  
  private Page buildTestPage() {
    Page page = new Page();
    page.link = "http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html";
    page.title = "Scripting Luke with Javascript";
    page.description = "For quite some time now, I have been looking...";
    // dek
    Section dek = new Section();
    dek.title = "Salmon Run";
    dek.content = "Swimming upstream...blah, blah, blah";
    dek.path = "/dek";
    page.sections.add(dek);
    // table of contents
    Section toc = new Section();
    toc.title = "Posts";
    toc.content = "<div class=\"widget-content\">...</div>";
    toc.path = "/toc";
    page.sections.add(toc);
    // cloud
    Section cloud = new Section();
    cloud.title = "Labels";
    cloud.content = "<div class='widget-content cloud-label-widget-content'>...</div>";
    cloud.path = "/cloud";
    page.sections.add(cloud);
    // profile
    Section profile = new Section();
    profile.title = "About me";
    profile.content = "I am a programmer...blah, blah, blah";
    profile.imageUrl = "http://img.blogspot.com/path/to/my/picture.jpg";
    profile.link = "http://profiles.blogspot.com/path/to/my/complete/profile.html";
    profile.path = "/section";
    page.sections.add(profile);
    // main content
    Section content = new Section();
    content.title = "Scripting Luke with Javascript";
    content.link = "http://sujitpal.blogspot.com/2010/11/scripting-luke.html";
    content.path = "/content";
    content.content = "<p>For quite some time now, I have been looking...</p>";
    page.sections.add(content);
    // labels
    Section labels = new Section();
    labels.title = "Share/Save";
    labels.path = "/labels";
    labels.content = "javascript, lucene, luke, scripting, search";
    page.sections.add(labels);
    // navigation elements
    Section nav1 = new Section();
    nav1.title = "2010";
    nav1.link = "http://sujitpal.blogspot.com/search?updated-min=2010-01-01T00%3A00%3A00-08%3A00&updated-max=2011-01-01T00%3A00%3A00-08%3A00&max-results=33";
    nav1.path = "/toc/2010";
    page.sections.add(nav1);
    Section nav11 = new Section();
    nav11.title = "November";
    nav11.link = "http://sujitpal.blogspot.com/2010_11_01_archive.html";
    nav11.path = "/toc/2010/11";
    page.sections.add(nav11);
    Section nav111 = new Section();
    nav111.title = "Scripting Luke with Javascript";
    nav111.link = "http://sujitpal.blogspot.com/2010/11/scripting-luke-with-javascript.html";
    nav111.path = "/toc/2010/11/03";
    page.sections.add(nav111);
    Section nav12 = new Section();
    nav12.title = "March";
    nav12.link = "http://sujitpal.blogspot.com/2010_03_01_archive.html";
    nav12.path = "/toc/2010/03";
    page.sections.add(nav12);
    Section nav121 = new Section();
    nav121.title = "Java REST Client for Ehcache server";
    nav121.link = "http://sujitpal.blogspot.com/2010/03/java-rest-client-interface-for-ehcache.html";
    nav121.path = "/toc/2010/03/03";
    page.sections.add(nav121);
    Section nav2 = new Section();
    nav2.title = "2009";
    nav2.link = "http://sujitpal.blogspot.com/search?updated-min=2009-01-01T00%3A00%3A00-08%3A00&updated-max=2010-01-01T00%3A00%3A00-08%3A00&max-results=44";
    nav2.path = "/toc/2009";
    page.sections.add(nav2);
    Section nav21 = new Section();
    nav21.title = "December";
    nav21.path = "/toc/2009/12";
    page.sections.add(nav21);
    Section nav211 = new Section();
    nav211.title = "A Lucene POS Tagging Tokenfilter";
    nav211.link = "http://sujitpal.blogspot.com/2009/12/lucene-pos-tagging-tokenfilter.html";
    nav211.path = "/toc/2009/12/02";
    page.sections.add(nav211);
    // return the page bean
    return page;
  }

  private class Page {
    public String title;
    public String link;
    public String description;
    public List<Section> sections = new ArrayList<Section>();
  }
  
  private class Section {
    public String title;
    public String link;
    public String content;
    public String path;
    public String imageUrl;
  }
  
  // ==== feed building methods ====
  
  private SyndFeed buildEmptyFeed() {
    SyndFeed feed = new SyndFeedImpl();
    feed.setFeedType("rss_2.0");
    feed.setTitle("My feed title");
    feed.setDescription("Test for Hierarchical Data in RSS Feed");
    feed.setLink("http://www.foo.com/path/to/hypothetical/feed/service");
    feed.setLanguage("en_US");
    return feed;
  }

  private void addEntry(SyndFeed feed, SyndEntry entry) {
    List<SyndEntry> entries = feed.getEntries();
    if (entries == null) {
      entries = new ArrayList<SyndEntry>();
    }
    entries.add(entry);
  }
  
  private void setImage(SyndEntry entry, String imageUrl) {
    if (imageUrl != null) {
      try {
        MediaContent[] contents = new MediaContent[] {
          new MediaContent(new UrlReference(imageUrl))
        };
        MediaEntryModuleImpl module = new MediaEntryModuleImpl();
        module.setMediaContents(contents);
        entry.getModules().add(module);
      } catch (MalformedURLException e) {
        log.warn(e);
      }
    }
  }
  
  private void setContent(SyndEntry entry, String content) {
    if (content != null) {
      List<String> contents = new ArrayList<String>();
      contents.add(content);
      ContentModule module = new ContentModuleImpl();
      module.setEncodeds(contents);
      entry.getModules().add(module);
    }
  }
  
  private void setDescription(SyndEntry entry, String description) {
    SyndContent content = new SyndContentImpl();
    content.setValue(description);
    entry.setDescription(content);
  }
  
  private void addCategory(SyndEntry entry, String name, String value) {
    SyndCategory category = new SyndCategoryImpl();
    category.setName(value);
    category.setTaxonomyUri(name);
    entry.getCategories().add(category);
  }
  
  // ==== feed parsing methods ====
  
  private class SectionFilterPredicate implements Predicate<SyndEntry> {
    @Override public boolean evaluate(SyndEntry entry) {
      String path = getCategoryValue(entry, "path");
      return StringUtils.split(path, "/").length == 1;
    }
  }

  private class NavigationFilterPredicate implements Predicate<SyndEntry> {
    @Override public boolean evaluate(SyndEntry entry) {
      String path = getCategoryValue(entry, "path");
      return path.startsWith("/toc") && 
        StringUtils.split(path, "/").length > 1;
    }
  }

  private class CurrentYearFilterPredicate implements Predicate<SyndEntry> {

    private int currentYear;
    
    public CurrentYearFilterPredicate() {
      Calendar cal = Calendar.getInstance();
      this.currentYear = cal.get(Calendar.YEAR);
    }
    
    @Override public boolean evaluate(SyndEntry entry) {
      String[] pathElements = 
        StringUtils.split(getCategoryValue(entry, "path"), "/");
      if (pathElements.length >= 2) {
        return currentYear == Integer.valueOf(pathElements[1]);
      }
      return false;
    }
  }

  // if we need to change the implicit ordering
  private class PathComparator implements Comparator<SyndEntry> {

    @Override public int compare(SyndEntry entry1, SyndEntry entry2) {
      String[] paths1 = 
        StringUtils.split(getCategoryValue(entry1, "path"), "/");
      String[] paths2 = 
        StringUtils.split(getCategoryValue(entry2, "path"), "/");
      Integer y1 = paths1.length < 2 ? 
        Integer.MAX_VALUE : Integer.valueOf(paths1[1]);
      Integer y2 = paths2.length < 2 ? 
        Integer.MAX_VALUE : Integer.valueOf(paths2[1]);
      if (y1 == y2) {
        Integer m1 = paths1.length < 3 ? 
          Integer.MAX_VALUE : Integer.valueOf(paths1[2]);
        Integer m2 = paths2.length < 3 ? 
          Integer.MAX_VALUE : Integer.valueOf(paths2[2]);
        if (m1 == m2) {
          Integer d1 = paths1.length < 4 ? 
            Integer.MAX_VALUE : Integer.valueOf(paths1[3]);
          Integer d2 = paths2.length < 4 ? 
            Integer.MAX_VALUE : Integer.valueOf(paths2[3]);
          if (d1 == d2) {
            String title1 = entry1.getTitle();
            String title2 = entry2.getTitle();
            return title1.compareTo(title2);
          } else {
            return d2.compareTo(d1);
          }
        } else {
          return m2.compareTo(m1);
        }
      } else {
        return y2.compareTo(y1);
      }
    }
  }

  private String getCategoryValue(SyndEntry entry, String name) {
    List<SyndCategory> categories = entry.getCategories();
    String value = null;
    for (SyndCategory category : categories) {
      if (name.equals(category.getTaxonomyUri())) {
        value = category.getName();
        break;
      }
    }
    return value;
  }

  private void print(List<SyndEntry> entries, boolean indent) {
    for (SyndEntry entry : entries) {
      String path = getCategoryValue(entry, "path");
      System.out.println((indent ? 
        StringUtils.repeat(".", StringUtils.split(path, "/").length) : "") + 
        entry.getTitle() + " (" + path + ")");
    }
  }
}

2 comments:

  1. Hi There i have a Query , Is there any possibility that we can create Tokens of Numeric values and Store it into one field as Numeric Index

    ReplyDelete
  2. I guess so, although you would have to be able to convert the number back to its position in the input tree, so you would either need to build the intelligence into your numeric index, or maintain some sort of lookup.

    ReplyDelete

Comments are moderated to prevent spam.