Saturday, January 06, 2007

Spring enabled Lucene Searcher

As I mentioned in my previous blog entry, I have been dabbling with Lucene lately. Since we already use Spring MVC in our web application, and the component I was developing would be part of that application, I naturally looked for ways to integrate my Lucene searcher into the Spring application context. Looking around the web, I came across SpringModules, a project dedicated to integrating Spring with popular Java software that Spring itself does not already support. SpringModules is still at release 0.7 at the time of this writing, and one of the modules included in this release is the Lucene integration module. This entry describes my experience building a Lucene searcher with this module.

A typical snippet of Lucene Searcher code to look up values in a "content" field of an index would look something like this:

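import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class NonSpringLuceneSearcher {

  private String pathToIndexSearcher;
  private Searcher searcher;

  public NonSpringLuceneSearcher(String pathToIndexSearcher) {
    this.pathToIndexSearcher = pathToIndexSearcher;
  }

  public List<MySearchResult> search(String queryString) throws Exception {
    if (searcher == null) {
      // lazily instantiate searcher; an IOException (for example, a bad
      // index path) propagates to the caller of search()
      searcher = new IndexSearcher(pathToIndexSearcher);
    }
    // static parse() is the Lucene 1.x API; with Lucene 2.0 and later, use
    // new QueryParser("content", analyzer).parse(queryString) instead
    Query query = QueryParser.parse(queryString, "content", new StandardAnalyzer());
    Hits hits = searcher.search(query);
    List<MySearchResult> results = new ArrayList<MySearchResult>();
    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      MySearchResult result = new MySearchResult();
      result.setField(doc.get("field1"));
      // ...populate the MySearchResult bean
      results.add(result);
    }
    return results;
  }

  // must be called by the owner of this object, typically at application shutdown
  public void close() throws Exception {
    if (searcher != null) {
      searcher.close();
    }
  }
}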

The first problem with this type of code is that the caller has to remember to close the searcher when done. In a web application, this would typically happen at application shutdown. The second problem is the lazy instantiation of the IndexSearcher: if something goes wrong while instantiating it, the caller only finds out when search() is first called. Even if we instantiated the IndexSearcher in a static block, the exception would be thrown at about the same time, because the class is not initialized until it is first used. Ideally, we should know that there is something wrong with our IndexSearcher when we start our application.
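To make the second point concrete, here is a minimal sketch of the fail-fast alternative, using only the JDK rather than Lucene. The class name EagerIndexHandle and its directory check are hypothetical stand-ins for opening an IndexSearcher. The only point is that validating the resource in the constructor surfaces a bad path when the object is built, not on the first search.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for a searcher wrapper; not part of Lucene or SpringModules.
class EagerIndexHandle implements AutoCloseable {

  private final Path indexDir;
  private boolean open;

  // Validate the index location at construction time, so a bad path
  // fails when the application is wired up rather than on the first search.
  EagerIndexHandle(String pathToIndex) throws IOException {
    Path dir = Paths.get(pathToIndex);
    if (!Files.isDirectory(dir)) {
      throw new IOException("Index directory not found: " + pathToIndex);
    }
    this.indexDir = dir;
    this.open = true;
  }

  boolean isOpen() {
    return open;
  }

  Path getIndexDir() {
    return indexDir;
  }

  // Releasing the resource is still the owner's job in this sketch.
  @Override
  public void close() {
    open = false;
  }
}
```

If such an object is created while the application starts up, a misconfigured path stops the startup instead of failing the first user request.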

The third problem is the boilerplate code. Lucene is much better than other Java APIs such as JDBC in this respect, but we can still get rid of some of the code by using the classes in the SpringModules Lucene module. The same class written with this module is shown below:

public class SpringLuceneSearcher extends LuceneSearchSupport {

  public List<MySearchResult> search(String queryString) throws Exception {
    Query query = QueryParser.parse(queryString, "content", getAnalyzer());
    List<MySearchResult> results = getTemplate().search(query, new HitExtractor() {
      public Object mapHit(int id, Document doc, float score) {
        MySearchResult result = new MySearchResult();
        result.setField(doc.get("field1"));
        // ...populate the MySearchResult bean
        return result;
      }
    });
    return results;
  }
}
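To see what getTemplate().search(...) takes off our hands, here is a rough JDK-only sketch of the template-and-callback idea. MiniSearchTemplate, HitMapper, and the substring matching are all hypothetical stand-ins, not the SpringModules classes, and the sketch leaves out searcher management entirely. The template owns the loop over hits, and the caller supplies only the per-hit mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback, playing the role that HitExtractor plays above.
interface HitMapper<T> {
  T mapHit(int id, String doc, float score);
}

// Hypothetical template: owns the iteration, delegates the per-hit mapping.
class MiniSearchTemplate {

  private final List<String> docs;

  MiniSearchTemplate(List<String> docs) {
    this.docs = docs;
  }

  // "Search" here is a naive substring match with a constant score;
  // a real searcher would run a parsed Query against an index.
  <T> List<T> search(String term, HitMapper<T> mapper) {
    List<T> results = new ArrayList<T>();
    for (int id = 0; id < docs.size(); id++) {
      String doc = docs.get(id);
      if (doc.contains(term)) {
        results.add(mapper.mapHit(id, doc, 1.0f));
      }
    }
    return results;
  }
}
```

The caller passes an anonymous HitMapper, just as the listing above passes an anonymous HitExtractor, and never writes the result loop itself.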

The IndexSearcher itself is configured in the ApplicationContext using XML and injected into the bean. This is shown below. The Spring container instantiates the beans at application startup, so you would know at startup if there is a problem with building the IndexSearcher, for example because of a bad directory mapping. Because our new searcher extends LuceneSearchSupport, the searcher will be closed at application shutdown, so we don't have to worry about it. Finally, using Spring lets us put all of our configuration information into the ApplicationContext, and since this is XML rather than Java code, it is easier to maintain.

  <bean id="fsDirectory" class="org.springmodules.lucene.index.support.FSDirectoryFactoryBean">
    <property name="location" value="file:/path/to/index/" />
  </bean>

  <bean id="searcherFactory" class="org.springmodules.lucene.search.factory.SimpleSearcherFactory">
    <property name="directory" ref="fsDirectory" />
  </bean>

  <bean id="springLuceneSearcher" class="com.mycompany.SpringLuceneSearcher">
    <property name="searcherFactory" ref="searcherFactory" />
    <property name="analyzer">
      <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
    </property>
  </bean>

Obviously, this covers a very simple use case. However, the SpringModules Lucene module covers more advanced usage as well. Instead of the SimpleSearcherFactory, you could use the ParallelSearcherFactory or the ParallelMultiSearcherFactory to get back other types of Searcher. The LuceneTemplate (which is returned from the call to LuceneSearchSupport.getTemplate()) also offers a variety of search() methods that correspond to the various search() methods available in the IndexSearcher.

Personally, I was quite impressed with the Lucene SpringModule. I found it to be easy to use, and it resulted in more robust, cleaner code. I plan to use it more in the future. If you know of any reasons why I should not, please let me know.

2 comments (moderated to prevent spam):

Lukas said...

You should definitely give Compass a try (www.opensymphony.com/compass/). Its integration with Spring is easy, and it also lets you index domain objects (which are typically stored in a database) just by using annotations. I will write a post about it on my blog in the future if I have a chance.

Sujit Pal said...

Thank you, Compass looks interesting. I just finished implementing something similar, though smaller in scale and not as complete. From my cursory understanding of its features, I am not sure whether customizing Compass would have been the better approach, but I should at least look at it as a source of ideas.