In his article, Advanced Java Content Repository API, Sunil Patil says that two of the most popular advanced features of a JCR compliant content repository (one of which is Jackrabbit) are Versioning and Observation. Since I was already looking at Jackrabbit, I decided to check out these APIs a bit to see if I could find some use for them.
I can see the Versioning API being useful for organizations that generate their own content and need to track changes made to documents. This is particularly true in industries with strong compliance requirements, such as finance and healthcare. Although we do generate some internal content, it typically doesn't need to be maintained and revised; it just expires after a period of time, so we don't really have a need for version history. So I read about versioning in Sunil Patil's article, but made no effort to try it on my own use case.
The Observation API looked interesting. It allows you to register Listeners on various predefined events such as a Node being removed or added, and Properties being added, removed or changed. I got interested in it because I thought that perhaps we could use these events to trigger legacy code that did not depend on the repository. As before, I decided to use the JCR module from the Spring Modules Project to make integration with Spring easier.
As an experiment, I decided to use the Observation API to trap a content update event, which would then trigger a Lucene index update. The content update consists of removing the existing content node, creating a new one, and re-inserting the properties. The code for ContentUpdater.java is shown below:
import java.io.File;
import java.io.IOException;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

import org.apache.log4j.Logger;
import org.springframework.beans.factory.annotation.Required;
import org.springmodules.jcr.JcrCallback;
import org.springmodules.jcr.JcrTemplate;

public class ContentUpdater {

  private static final Logger LOGGER = Logger.getLogger(ContentUpdater.class);

  private String contentSource;
  private JcrTemplate jcrTemplate;
  private IParser parser;

  @Required
  public void setContentSource(String contentSource) {
    this.contentSource = contentSource;
  }

  @Required
  public void setJcrTemplate(JcrTemplate jcrTemplate) {
    this.jcrTemplate = jcrTemplate;
  }

  @Required
  public void setParser(IParser parser) {
    this.parser = parser;
  }

  public void update(final File file) {
    jcrTemplate.execute(new JcrCallback() {
      public Object doInJcr(Session session) throws IOException, RepositoryException {
        try {
          DataHolder dataHolder = parser.parse(file);
          String contentId = dataHolder.getProperty("contentId");
          Node contentSourceNode = getContentNode(session, contentSource, null);
          Node contentNode = getContentNode(session, contentSource, contentId);
          if (contentNode != null) {
            contentNode.remove();
          }
          contentNode = contentSourceNode.addNode("content");
          for (String propertyKey : dataHolder.getPropertyKeys()) {
            String value = dataHolder.getProperty(propertyKey);
            contentNode.setProperty(propertyKey, value);
          }
          session.save();
        } catch (Exception e) {
          throw new IOException("Parse error", e);
        }
        return null;
      }
    });
  }

  public Node getContentNode(final Session session, final String contentSource,
      final String contentId) throws Exception {
    if (contentId == null) {
      return session.getRootNode().getNode(contentSource);
    }
    QueryManager queryManager = session.getWorkspace().getQueryManager();
    Query query = queryManager.createQuery("//" + contentSource +
      "/content[@contentId='" + contentId + "']", Query.XPATH);
    QueryResult result = query.execute();
    NodeIterator ni = result.getNodes();
    if (ni.hasNext()) {
      Node contentNode = ni.nextNode();
      return contentNode;
    } else {
      return null;
    }
  }
}
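One thing to watch in getContentNode() is that the contentId is concatenated directly into the XPath string, so an id containing a single quote would produce a malformed query. Here is a minimal sketch of a quoting helper. The XPathLiterals class and its method names are hypothetical (they are not in my original code), and it assumes that Jackrabbit's XPath parser accepts a doubled single quote as the escape inside a single-quoted literal, so treat it as something to verify against your version rather than a definitive fix.

```java
// Hypothetical helper, not part of the original ContentUpdater.
final class XPathLiterals {

  private XPathLiterals() {}

  /** Wraps a value in single quotes, doubling any embedded single quote. */
  static String quote(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  /** Builds the same query string as getContentNode(), with the id quoted. */
  static String contentQuery(String contentSource, String contentId) {
    return "//" + contentSource + "/content[@contentId=" + quote(contentId) + "]";
  }

  public static void main(String[] args) {
    // prints //myRandomContentSource/content[@contentId='o''neil-42']
    System.out.println(contentQuery("myRandomContentSource", "o'neil-42"));
  }
}
```

With something like this in place, the createQuery() call could take the string from contentQuery(contentSource, contentId) instead of building it inline.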
When session.save() is called, Jackrabbit fires a set of events that are picked up by any interested EventListener objects. We define one such EventListener, which listens for one specific event type generated by the update in ContentUpdater.java and handles it. The code for ContentUpdatedEventListener.java is shown below:
import java.io.IOException;
import java.util.List;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.observation.Event;
import javax.jcr.observation.EventIterator;
import javax.jcr.observation.EventListener;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

import org.apache.log4j.Logger;
import org.springframework.beans.factory.annotation.Required;
import org.springmodules.jcr.JcrCallback;
import org.springmodules.jcr.JcrTemplate;

/**
 * Event listener that gets called whenever a source File node changes.
 */
public class ContentUpdatedEventListener implements EventListener {

  private static final Logger LOGGER = Logger.getLogger(ContentUpdatedEventListener.class);

  private JcrTemplate jcrTemplate;
  private List<IEventHandler> eventHandlers;

  @Required
  public void setJcrTemplate(JcrTemplate jcrTemplate) {
    this.jcrTemplate = jcrTemplate;
  }

  @Required
  public void setEventHandlers(List<IEventHandler> eventHandlers) {
    this.eventHandlers = eventHandlers;
  }

  public void onEvent(final EventIterator eventIterator) {
    jcrTemplate.execute(new JcrCallback() {
      public Object doInJcr(Session session) throws IOException, RepositoryException {
        while (eventIterator.hasNext()) {
          Event event = eventIterator.nextEvent();
          if (event.getType() == Event.NODE_ADDED) {
            QueryManager queryManager = session.getWorkspace().getQueryManager();
            Query query = queryManager.createQuery("/" + event.getPath(), Query.XPATH);
            QueryResult result = query.execute();
            NodeIterator nodes = result.getNodes();
            if (nodes.hasNext()) {
              Node contentNode = nodes.nextNode();
              PropertyIterator properties = contentNode.getProperties();
              DataHolder dataHolder = new DataHolder();
              while (properties.hasNext()) {
                Property property = properties.nextProperty();
                dataHolder.setProperty(property.getName(), property.getValue().getString());
              }
              LOGGER.debug("Did I get here?");
              for (IEventHandler eventHandler : eventHandlers) {
                try {
                  eventHandler.handle(dataHolder);
                } catch (Exception e) {
                  LOGGER.info("Failed to handle event:" + event.getPath() +
                    " of type:" + event.getType() +
                    " by " + eventHandler.getClass().getName(), e);
                }
              }
            }
          }
        }
        return null;
      }
    });
  }
}
To make the design more modular and cleaner, the EventListener can be injected with a List of IEventHandler objects, whose handle() method gets called in a for loop, so multiple actions can happen when an event is trapped by the Listener. The IEventHandler.java code is shown below:
public interface IEventHandler {
  public void handle(DataHolder holder) throws Exception;
}
A dummy implementation that does nothing but log that it is updating a Lucene index is shown below, for illustration:
import org.apache.log4j.Logger;
import org.springframework.beans.factory.annotation.Required;

/**
 * A dummy class to demonstrate event handling.
 */
public class LuceneIndexUpdateEventHandler implements IEventHandler {

  private static final Logger LOGGER = Logger.getLogger(LuceneIndexUpdateEventHandler.class);

  private String indexPath;

  @Required
  public void setIndexPath(String indexPath) {
    this.indexPath = indexPath;
  }

  public void handle(DataHolder holder) throws Exception {
    LOGGER.info("Updated Lucene index at:" + indexPath);
  }
}
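The handler-list idea can be tried out without a repository at all. The following is a minimal, self-contained sketch of the same dispatch loop: each handler is called in turn, and a failure in one does not stop the others. The HandlerChainSketch class and its nested Handler interface are stand-ins I made up for illustration, with a plain Map taking the place of DataHolder; it also uses lambdas and List.of, so it assumes a much newer JDK than the rest of the code in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HandlerChainSketch {

  /** Stand-in for IEventHandler; a Map replaces DataHolder here. */
  interface Handler {
    void handle(Map<String, String> data) throws Exception;
  }

  /** Calls every handler in order and returns how many of them failed. */
  static int dispatch(List<Handler> handlers, Map<String, String> data) {
    int failures = 0;
    for (Handler handler : handlers) {
      try {
        handler.handle(data);
      } catch (Exception e) {
        // Count the failure and move on, so later handlers still run.
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    List<Handler> handlers = List.of(
        data -> calls.add("index " + data.get("contentId")),
        data -> { throw new IllegalStateException("cache offline"); },
        data -> calls.add("notify " + data.get("contentId")));
    int failures = dispatch(handlers, Map.of("contentId", "42"));
    // prints [index 42, notify 42] failures=1
    System.out.println(calls + " failures=" + failures);
  }
}
```

Catching the exception inside the loop is the design choice that matters here: the listener logs the failure and keeps going, so a broken handler cannot starve the ones configured after it.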
Finally, we tie it all together with Spring configuration. Refer to my last post for the complete applicationContext.xml file; here I show only the diffs, to highlight the changes and explain them:
<beans ...>
  ...
  <bean id="jcrSessionFactory" class="org.springmodules.jcr.JcrSessionFactory">
    ...
    <property name="eventListeners">
      <list>
        <ref bean="contentUpdatedEventListenerDefinition"/>
      </list>
    </property>
  </bean>

  <!-- The updater -->
  <bean id="myRandomContentUpdater" class="com.mycompany.myapp.ContentUpdater">
    <property name="contentSource" value="myRandomContentSource"/>
    <property name="jcrTemplate" ref="jcrTemplate"/>
    <property name="parser" ref="someRandomDocumentParser"/>
  </bean>

  <!-- Linked to the EventListener via this bean -->
  <bean id="contentUpdatedEventListenerDefinition" class="org.springmodules.jcr.EventListenerDefinition">
    <property name="absPath" value="/"/>
    <property name="eventTypes" value="1"/><!-- Event.NODE_ADDED -->
    <property name="listener" ref="contentUpdatedEventListener"/>
  </bean>

  <!-- The EventListener -->
  <bean id="contentUpdatedEventListener" class="com.mycompany.myapp.ContentUpdatedEventListener">
    <property name="jcrTemplate" ref="jcrTemplate"/>
    <property name="eventHandlers">
      <list>
        <ref bean="luceneIndexUpdateEventHandler"/>
      </list>
    </property>
  </bean>

  <!-- The EventHandler -->
  <bean id="luceneIndexUpdateEventHandler" class="com.mycompany.myapp.LuceneIndexUpdateEventHandler">
    <property name="indexPath" value="/tmp/lucene"/>
  </bean>
</beans>
The first change is to register one or more EventListenerDefinition beans with the JcrSessionFactory. This is shown in the first block above. The second block is simply the configuration for the ContentUpdater. The third block is the EventListenerDefinition, which says that the EventListener it defines listens for events under the root path and filters on event type 1 (Event.NODE_ADDED); it also holds the actual reference to the EventListener bean. The fourth block is the definition and configuration of the ContentUpdatedEventListener EventListener implementation, which also takes in a List of IEventHandler objects. In our case the list contains only the reference to the dummy LuceneIndexUpdateEventHandler class. The final block is the bean definition for the IEventHandler.
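The eventTypes value deserves a short note. The event type constants in javax.jcr.observation.Event are powers of two, so several types can be combined into one mask with a bitwise OR. The sketch below copies the JCR 1.0 constant values into a hypothetical EventTypeMask class so that it compiles without the JCR jar; the accepts() helper is my own illustration of how such a mask is tested, not a method from the API, and I am assuming that EventListenerDefinition hands the configured integer to the repository unchanged.

```java
final class EventTypeMask {

  // Values copied from javax.jcr.observation.Event (JCR 1.0).
  static final int NODE_ADDED = 1;
  static final int NODE_REMOVED = 2;
  static final int PROPERTY_ADDED = 4;
  static final int PROPERTY_REMOVED = 8;
  static final int PROPERTY_CHANGED = 16;

  private EventTypeMask() {}

  /** Returns true if the given event type is included in the mask. */
  static boolean accepts(int mask, int eventType) {
    return (mask & eventType) != 0;
  }

  public static void main(String[] args) {
    int mask = NODE_ADDED | PROPERTY_CHANGED;
    System.out.println(mask);                        // prints 17
    System.out.println(accepts(mask, NODE_ADDED));   // prints true
    System.out.println(accepts(mask, NODE_REMOVED)); // prints false
  }
}
```

Under those assumptions, listening for both added nodes and changed properties would mean setting eventTypes to 17 rather than 1.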
To run this code, I have a very simple JUnit harness that calls the ContentUpdater.update() method with a File reference. The node corresponding to the File is updated and an event is sent, and we get to see a log message like the following in our logs. Notice that this log line is usually emitted after JUnit's own output, which indicates that the listener is called asynchronously.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.246 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
13 Sep 2007 09:33:29,509 INFO com.healthline.jrtest.LuceneIndexUpdateEventHandler
com.healthline.jrtest.LuceneIndexUpdateEventHandler.handle(LuceneIndexUpdateEventHandler.java:25)
(ObservationManager, ): Updated Lucene index at:/tmp/lucene
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8 seconds
[INFO] Finished at: Thu Sep 13 09:33:29 PDT 2007
[INFO] Final Memory: 11M/86M
[INFO] ------------------------------------------------------------------------
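Because the handler runs on a different thread from the test, a test that wants to assert on the handler's work has to wait for it. One common way to do that is a CountDownLatch, sketched below. This is not what my JUnit harness does; the AsyncHandlerWait class is hypothetical, and the single-thread executor merely stands in for whatever thread the repository uses to deliver events.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncHandlerWait {

  private AsyncHandlerWait() {}

  /**
   * Runs the handler on another thread, then waits up to timeoutMillis
   * for it to finish. Returns true if it finished within the timeout.
   */
  static boolean runAndAwait(Runnable handler, long timeoutMillis) {
    CountDownLatch done = new CountDownLatch(1);
    // Stand-in for the repository's event delivery thread.
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    try {
      dispatcher.execute(() -> {
        try {
          handler.run();
        } finally {
          done.countDown();
        }
      });
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Restore the interrupt flag and report that we did not see completion.
      Thread.currentThread().interrupt();
      return false;
    } finally {
      dispatcher.shutdown();
    }
  }

  public static void main(String[] args) {
    boolean finished = runAndAwait(
        () -> System.out.println("Updated Lucene index at:/tmp/lucene"), 2000);
    System.out.println("handler finished in time: " + finished);
  }
}
```

In a real test, the latch would be counted down from inside an IEventHandler registered for the test, and the test method would await it after calling update().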
The Observation API reminds me of a middleware application that I maintained for a while at a previous job, which was a bridge between our various home-grown content management systems and our actual publishing system. Events were sent as HTTP requests, and were converted into actual publishing requests by the application and sent to the publishing system. Jackrabbit's Observation API would be a perfect fit in this situation, and it would be so much more elegant.
As I was exploring the Versioning and Observation APIs, I had an epiphany. I realized that the reason I have this whole love-hate thing with Jackrabbit (love the features, can't find enough reason to implement them) is that it is targeted at a business model different from mine. Jackrabbit (and, I am guessing, any CMS in general) is targeted at businesses that tend to manage their content in individual pieces, such as news stories at a news company or product spec sheets at a manufacturing company. Unlike them, we manage our content in bulk, regenerating all content from a content provider in batch mode. That may change in the future, and perhaps it would then be time to reconsider.