At long last, we come to (for me, anyway) the interesting parts of Alfresco - the ability to do stuff beyond just entering data and watching it show up in the web client. Not that the data entry work wasn't useful - I learnt a lot about the Alfresco API in the process, which is likely to help me do customizations as well. But enough about that - let's start on the Alfresco "PATH module".
Pages on most web sites today have "friendly" URLs, which are derived from the title of the page. This makes the URL easier to remember, and also has SEO value. Drupal has the PATH module, which automatically generates a friendly URL based on tokens representing properties of the node in question. Alfresco, being a Document Management System rather than a Web Content Management System, does not have an equivalent (at least AFAIK, going by a cursory look at AlfrescoForge). In any case, based on my understanding of Alfresco's Behaviours, it seemed like an interesting application, so I decided to implement it.
Implementation Notes
What I needed to do was to intercept the node as it was being updated, and if the friendly URL (furl) value is not set in the node, tokenize the title (removing stopwords, special characters, punctuation, etc) into a furl, check to see if the furl is already assigned to some other node, and if so, attempt to disambiguate it by adding a running sequence number. For example, if my title is "The house Jack built", it would be tokenized to the furl "house-jack-built". Now if there is already another node with the same furl, then the current node would be assigned the furl "house-jack-built-1".
Tokenizing was an easy decision. Simply pass the title through the Lucene StandardAnalyzer (Alfresco provides a variant called AlfrescoStandardAnalyser that I used) and join the tokens with "-".
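To make the tokenization step concrete, here is a minimal plain-Java sketch of the idea. It does not use Lucene or Alfresco at all: the FurlTokenizer class, its toAlias() method and the tiny stopword list are all hypothetical stand-ins for what the analyzer does, so treat it as an illustration under those assumptions rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for analyzer-based tokenization; not Alfresco or Lucene code. */
final class FurlTokenizer {

  // Assumed, deliberately tiny stopword list; a real analyzer ships a longer one.
  private static final Set<String> STOPWORDS = new HashSet<String>(Arrays.asList(
      "a", "an", "and", "is", "of", "the", "this", "to"));

  /** Returns the hyphen-joined alias, or null if nothing is left after filtering. */
  static String toAlias(String title) {
    List<String> tokens = new ArrayList<String>();
    // Split on anything that is not a letter or digit, which drops punctuation.
    for (String word : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!word.isEmpty() && !STOPWORDS.contains(word)) {
        tokens.add(word);
      }
    }
    return tokens.isEmpty() ? null : String.join("-", tokens);
  }
}
```

With this sketch, FurlTokenizer.toAlias("The house Jack built") gives "house-jack-built", and a title made up only of stopwords gives null, which is the case the real code has to warn about.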
To look up the furls already assigned to existing nodes, I could go a number of ways. The simplest way would have been to use Alfresco's SearchService and NodeService to collect all the furls from all the nodes into a Set and then use that to determine if the new furl has collisions, and if so, disambiguate. This would have to be done each time our Behaviour caught an update event. This was not a very performant option, so I decided against it.
The second approach is a variant of the first. Instead of doing this accumulation each time, we do it once on startup, then maintain the Set for the lifetime of the web application. This would be fine, unless at some point we decide to load balance across multiple Alfresco instances. At that point we would have to worry about replicating the Set across multiple JVMs (probably using distributed cache). This is doable, but there is a simpler way.
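For comparison, the in-memory variant could look roughly like the sketch below. The AliasRegistry class and its method names are hypothetical; the point is only to show the disambiguation loop against a Set, and why the state is tied to a single JVM.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory registry; illustrates the Set-based approach, single JVM only. */
final class AliasRegistry {

  private final Set<String> taken = new HashSet<String>();

  /** Reserves and returns the alias, appending -1, -2, ... until it is unused. */
  synchronized String reserve(String type, String subType, String alias) {
    String candidate = alias;
    // Set.add returns false when the key is already present, i.e. on a collision.
    for (int i = 1; !taken.add(key(type, subType, candidate)); i++) {
      candidate = alias + "-" + i;
    }
    return candidate;
  }

  /** Frees an alias so that it can be handed out again. */
  synchronized void release(String type, String subType, String alias) {
    taken.remove(key(type, subType, alias));
  }

  private static String key(String type, String subType, String alias) {
    return type + "/" + subType + "/" + alias;
  }
}
```

Reserving "house-jack-built" twice for the same type and subtype yields "house-jack-built" and then "house-jack-built-1". Because the Set lives on the heap of one web application, a second Alfresco instance would not see these reservations, which is the replication problem mentioned above.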
The third approach (which I ended up choosing) was to have an extra database table that maintains the collection for me. Lookup in this case is just that - a SQL SELECT. The downside is that I now have to introduce a new database table and figure out how to use the built-in Hibernate SessionFactory to work with this table. This turned out to be simpler than I had expected.
FURL Structure
The diagram below shows the structure of a FURL in our hypothetical system. The first part of the FURL is the content type, which roughly corresponds to the Spring controller that will be used to render the page. The second part is a subtype, which in our case is the blogger's username, but could be something else depending on the content type (you can extract the appropriate property depending on the content type in the behavior code below). The third part is the alias, or the tokenized version of the title. If you look at the PathAliasDao code below, you will realize that it allows the same alias for different bloggers or for different content types (thus maximizing the SEO benefit of good titles throughout your site). The last part is a suffix, which I kept because the URLs in the XML feeds contained it.
http://www.mycompany.com/myapp/post/happy/house-jack-built.html
                               -+-- --+-- --------+------- -+--
                                |     |           |         |
content type -------------------+     |           |         |
sub type -----------------------------+           |         |
alias --------------------------------------------+         |
suffix -----------------------------------------------------+
We also remove the corresponding FURL entry if the document is deleted from the CMS. This frees up potentially good aliases to be reused.
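The FURL structure described above can be sketched as a small parser. The FurlParts class below is hypothetical and only handles the path portion after the application context (for example "post/happy/house-jack-built.html"); it assumes exactly three slash-separated segments and a dot before the suffix.

```java
/** Hypothetical value holder for the four parts of a FURL path. */
final class FurlParts {

  final String type;
  final String subType;
  final String alias;
  final String suffix;

  private FurlParts(String type, String subType, String alias, String suffix) {
    this.type = type;
    this.subType = subType;
    this.alias = alias;
    this.suffix = suffix;
  }

  /** Parses a path such as "post/happy/house-jack-built.html". */
  static FurlParts parse(String furl) {
    String[] segments = furl.split("/");
    if (segments.length != 3) {
      throw new IllegalArgumentException("Expected type/subtype/alias.suffix: " + furl);
    }
    // The suffix is everything after the last dot of the final segment.
    int dot = segments[2].lastIndexOf('.');
    if (dot < 0) {
      throw new IllegalArgumentException("Missing suffix: " + furl);
    }
    return new FurlParts(segments[0], segments[1],
        segments[2].substring(0, dot), segments[2].substring(dot + 1));
  }
}
```

Parsing "post/happy/house-jack-built.html" gives type "post", subtype "happy", alias "house-jack-built" and suffix "html". The uniqueness check only needs the first three of these.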
Logging Setup
For development, I needed to enable logging for my new Behaviour. (The parts of Alfresco I am working with for my customization seem to have been developed by people using British spellings, which is reflected in the class names. It's not a huge problem; I mention it only because people used to en_US spellings may trip up on these occasionally, although you will probably get used to it soon.) To do this, I added a line to WEB-INF/classes/log4j.properties in the exploded Alfresco webapp, like so:
# Source: ~/Library/Tomcat/alfresco/webapps/WEB-INF/classes/log4j.properties
...
# MyCompany
log4j.logger.com.mycompany=DEBUG
I don't know of a good way to do this via Alfresco's extension mechanism. One way I thought of was to copy the log4j.properties to my extension project, and rig up ant deploy to overwrite the webapp version. Not perfect, because you cannot "back out" your changes anymore short of exploding the original WAR file again, but still keeps artifacts changed by you under your source code repository's control. I haven't done this yet though, but I think I will have to at some point.
Hooking up with Hibernate
Hibernate is an ORM, ie, it maps relational database tables to Java objects and back. So we first need to define our table, then our object, and then the mapping XML file. Here is what my table looks like:
mysql> desc MY_PATH_ALIAS;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| ID      | varchar(64)  | NO   |     | NULL    |       |
| TYPE    | varchar(32)  | NO   |     | NULL    |       |
| SUBTYPE | varchar(32)  | NO   |     | NULL    |       |
| ALIAS   | varchar(128) | NO   |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
The Java class is a POJO which has these fields and the associated getters and setters. I have removed the getters and setters for brevity; please use your IDE to generate them.
// Source: src/java/com/mycompany/alfresco/extension/path/PathAlias.java
package com.mycompany.alfresco.extension.path;

public class PathAlias {

  private String id;
  private String type;
  private String subType;
  private String alias;

  // ... getters and setters omitted
}
Finally, we have the mapping file which relates the MY_PATH_ALIAS table to an instance of a PathAlias object.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Source: src/java/com/mycompany/alfresco/extension/path/PathAlias.hbm.xml -->
<!DOCTYPE hibernate-mapping PUBLIC
    '-//Hibernate/Hibernate Mapping DTD 3.0//EN'
    'http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd'>
<hibernate-mapping package="com.mycompany.alfresco.extension.path">
  <class name="PathAlias" table="MY_PATH_ALIAS">
    <id name="id" column="ID" type="string">
      <generator class="assigned"/>
    </id>
    <property name="type" column="TYPE" type="string"/>
    <property name="subType" column="SUBTYPE" type="string"/>
    <property name="alias" column="ALIAS" type="string"/>
  </class>
</hibernate-mapping>
We now need to add the mapping to Hibernate's session factory, so it knows about this table when we try to do database operations with the PathAlias object. I initially thought about using Hibernate @Entity annotations, since Alfresco uses Hibernate3 and annotations are supported. However, the session factory used by Alfresco and the JBPM extension is built by Spring's LocalSessionFactoryBean, which needs the hbm.xml mapping files. So, to make Hibernate recognize this mapping, I added it to the WEB-INF/classes/alfresco/hibernate-context.xml file as shown below.
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<!-- Source: ~/Library/Tomcat/webapps/alfresco/WEB-INF/classes/alfresco/hibernate-context.xml -->
<beans>
  ...
  <bean id="sessionFactoryBase" abstract="true">
    ...
    <property name="mappingResources">
      <list>
        ...
        <!-- MyCompany PATH "module" -->
        <value>com/mycompany/alfresco/extension/path/PathAlias.hbm.xml</value>
      </list>
    </property>
  ...
</beans>
I should probably have created my own mycompany-hibernate-context.xml file in the config directory and overridden only the sessionFactory bean. But that has its own set of issues with regard to portability, since I would then be overriding a bean that references the hbm.xml files for the base Alfresco and JBPM persistable objects. In any case, I haven't thought this through completely. If you have suggestions about best practices in this sort of situation, please let me know.
Data Access: PathAliasDao
The PathAliasDao connects the Alfresco Behaviour to the database. It provides methods to save and remove a PathAlias object, as well as a method to get back an unambiguous friendly URL for a given node. Here is the code. The save() and remove() methods are self explanatory, but I will describe the getFurl() method in a little more detail below.
// Source: src/java/com/mycompany/alfresco/extension/path/PathAliasDao.java
package com.mycompany.alfresco.extension.path;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser;
import org.apache.commons.lang.StringUtils;
import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

public class PathAliasDao extends HibernateDaoSupport {

  private final Logger logger = Logger.getLogger(getClass());

  private AlfrescoStandardAnalyser analyzer;

  public void setAnalyzer(AlfrescoStandardAnalyser analyzer) {
    this.analyzer = analyzer;
  }

  public void save(PathAlias pathAlias) {
    getHibernateTemplate().save(pathAlias);
  }

  public void remove(String nodeRef) {
    // only the ID is needed to identify the row to delete
    PathAlias pathAlias = new PathAlias();
    pathAlias.setId(nodeRef);
    getHibernateTemplate().delete(pathAlias);
  }

  /**
   * Returns type/owner/alias.html, or null if no alias could be
   * built from the title.
   */
  public String getFurl(String nodeRefId, String type,
      String owner, String title) {
    String alias = getUniqueAlias(nodeRefId, type, owner, title);
    if (alias == null) {
      // avoid returning a furl with an empty alias segment
      return null;
    }
    return StringUtils.join(new String[] {type, owner, alias}, "/") + ".html";
  }

  /**
   * Tokenizes the title into an alias, disambiguates it against the
   * MY_PATH_ALIAS table, and saves the new entry as a side effect.
   */
  @SuppressWarnings("unchecked")
  private String getUniqueAlias(String nodeRefId, String type,
      String subType, String title) {
    List<String> titleTokens = new ArrayList<String>();
    TokenStream tokenStream = null;
    try {
      // the field name is not significant here
      tokenStream = analyzer.tokenStream("XX", new StringReader(title));
      Token token = null;
      while ((token = tokenStream.next()) != null) {
        titleTokens.add(token.termText());
      }
    } catch (IOException e) {
      throw new RuntimeException("Error tokenizing title: " + title, e);
    } finally {
      if (tokenStream != null) {
        try { tokenStream.close(); } catch (IOException e) {}
      }
    }
    if (titleTokens.size() == 0) {
      logger.warn(
        "furl not created, please fix title so it's not all stopwords");
      return null;
    }
    String originalAlias = StringUtils.join(titleTokens.iterator(), "-");
    // try alias, alias-1, alias-2, ... until one is free
    for (int i = 0; ; i++) {
      String uniqueAlias = (i == 0) ? originalAlias : (originalAlias + "-" + i);
      List<PathAlias> list =
        getHibernateTemplate().find(
          "from PathAlias where type = ? and subType = ? and alias = ?",
          new Object[] {type, subType, uniqueAlias});
      if (list.isEmpty()) {
        PathAlias pathAlias = new PathAlias();
        pathAlias.setId(nodeRefId);
        pathAlias.setSubType(subType);
        pathAlias.setType(type);
        pathAlias.setAlias(uniqueAlias);
        save(pathAlias);
        return uniqueAlias;
      }
    }
  }
}
Importing Existing Aliases
The posts we imported already have the friendly URLs set in the source XMLs (these are Atom feeds from Blogger), so we don't want to reassign them. So I wrote an importer which finds the Posts, reads each one's "furl" property and populates the MY_PATH_ALIAS table accordingly. It is similar to the other importers I have written before, except that it uses the PathAliasDao. Here is the code:
// Source: src/java/com/mycompany/alfresco/extension/loaders/PathAliasImporter.java
package com.mycompany.alfresco.extension.loaders;

import java.io.Serializable;
import java.util.List;
import java.util.Map;

import javax.transaction.UserTransaction;

import org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;
import org.alfresco.service.cmr.security.AuthenticationService;
import org.alfresco.service.namespace.QName;
import org.alfresco.service.transaction.TransactionService;
import org.alfresco.util.ApplicationContextHelper;
import org.apache.commons.lang.StringUtils;
import org.hibernate.SessionFactory;
import org.junit.Test;
import org.springframework.context.ApplicationContext;

import com.mycompany.alfresco.extension.model.MyContentModel;
import com.mycompany.alfresco.extension.path.PathAlias;
import com.mycompany.alfresco.extension.path.PathAliasDao;

/**
 * Imports existing furl data from posts into the MY_PATH_ALIAS table.
 */
public class PathAliasImporter {

  private ServiceRegistry serviceRegistry;
  private PathAliasDao pathAliasDao;

  public void init() throws Exception {
    ApplicationContext ctx = ApplicationContextHelper.getApplicationContext();
    this.serviceRegistry =
      (ServiceRegistry) ctx.getBean(ServiceRegistry.SERVICE_REGISTRY);
    this.pathAliasDao = new PathAliasDao();
    pathAliasDao.setAnalyzer(new AlfrescoStandardAnalyser());
    pathAliasDao.setSessionFactory(
      (SessionFactory) ctx.getBean("sessionFactory"));
  }

  public void importAll() throws Exception {
    AuthenticationService authService =
      serviceRegistry.getAuthenticationService();
    authService.authenticate("admin", "admin".toCharArray());
    String ticket = authService.getCurrentTicket();
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      SearchService searchService = serviceRegistry.getSearchService();
      NodeService nodeService = serviceRegistry.getNodeService();
      ResultSet resultSet = null;
      try {
        // find all nodes of type Post
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
          SearchService.LANGUAGE_LUCENE,
          "TYPE:\"" + MyContentModel.TYPE_POST.toString() + "\"");
        List<ChildAssociationRef> carefs = resultSet.getChildAssocRefs();
        for (ChildAssociationRef caref : carefs) {
          NodeRef postRef = caref.getChildRef();
          Map<QName,Serializable> props = nodeService.getProperties(postRef);
          String id = postRef.getId();
          String furl = (String) props.get(MyContentModel.PROP_FURL);
          // furl looks like type/subtype/alias.suffix
          String[] parts = StringUtils.split(furl, "/");
          if (parts == null || parts.length < 3) {
            // nothing to import for posts without a well-formed furl
            System.out.println("Skipping post " + id + ", furl: " + furl);
            continue;
          }
          String subType = parts[1];
          String alias =
            StringUtils.substring(parts[2], 0, parts[2].lastIndexOf('.'));
          System.out.println("Saving furl: " + furl);
          PathAlias pathAlias = new PathAlias();
          pathAlias.setId(id);
          pathAlias.setSubType(subType);
          pathAlias.setType(MyContentModel.TYPE_POST_STR);
          pathAlias.setAlias(alias);
          pathAliasDao.save(pathAlias);
        }
      } finally {
        if (resultSet != null) {
          resultSet.close();
        }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
  }

  @Test
  public void testImportAll() throws Exception {
    PathAliasImporter importer = new PathAliasImporter();
    importer.init();
    importer.importAll();
  }
}
The Behaviour class: PathAliasSetter
Finally, we come to our Behaviour class. As you can see, it is bound to node update and delete. There is a bit of boilerplate in the init() method where it declares the two Behaviours and binds them to the my:PublishableDoc content type (so all publishable documents and their subtypes will have this behaviour). The code for onUpdateNode and onDeleteNode is fairly self-explanatory. There are checks to make sure that a node which already has a furl assigned does not have it reassigned (for example, if the title changes), and that the furl assignment happens only at the stage where we have enough data to drive it.
// Source: src/java/com/mycompany/alfresco/extension/path/PathAliasSetter.java
package com.mycompany.alfresco.extension.path;

import java.io.Serializable;
import java.util.Collection;
import java.util.Map;

import org.alfresco.model.ContentModel;
import org.alfresco.repo.node.NodeServicePolicies.OnDeleteNodePolicy;
import org.alfresco.repo.node.NodeServicePolicies.OnUpdateNodePolicy;
import org.alfresco.repo.policy.Behaviour;
import org.alfresco.repo.policy.JavaBehaviour;
import org.alfresco.repo.policy.PolicyComponent;
import org.alfresco.repo.policy.Behaviour.NotificationFrequency;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.NamespaceService;
import org.alfresco.service.namespace.QName;
import org.apache.commons.lang.StringUtils;
import org.apache.log4j.Logger;

import com.mycompany.alfresco.extension.model.MyContentModel;

/**
 * Behavior to update the furl for publishable docs. The furl is
 * only updated if it is null. It is fired on node update and delete.
 */
public class PathAliasSetter implements OnUpdateNodePolicy, OnDeleteNodePolicy {

  private final Logger logger = Logger.getLogger(getClass());

  private PolicyComponent policyComponent;
  private NodeService nodeService;
  private PathAliasDao pathAliasDao;

  private Behaviour onUpdateNode;
  private Behaviour onDeleteNode;

  public void setPolicyComponent(PolicyComponent policyComponent) {
    this.policyComponent = policyComponent;
  }

  public void setNodeService(NodeService nodeService) {
    this.nodeService = nodeService;
  }

  public void setPathAliasDao(PathAliasDao pathAliasDao) {
    this.pathAliasDao = pathAliasDao;
  }

  public void init() {
    logger.info("Initializing PathAliasSetter behavior...");
    this.onUpdateNode = new JavaBehaviour(
      this, "onUpdateNode", NotificationFrequency.EVERY_EVENT);
    this.onDeleteNode = new JavaBehaviour(
      this, "onDeleteNode", NotificationFrequency.EVERY_EVENT);
    // bind behavior to node policy
    this.policyComponent.bindClassBehaviour(
      QName.createQName(NamespaceService.ALFRESCO_URI, "onUpdateNode"),
      MyContentModel.TYPE_PUBLISHABLE_DOC, this.onUpdateNode);
    this.policyComponent.bindClassBehaviour(
      QName.createQName(NamespaceService.ALFRESCO_URI, "onDeleteNode"),
      MyContentModel.TYPE_PUBLISHABLE_DOC, this.onDeleteNode);
  }

  @SuppressWarnings("unchecked")
  @Override
  public void onUpdateNode(NodeRef nodeRef) {
    Map<QName,Serializable> props = nodeService.getProperties(nodeRef);
    String type = nodeService.getType(nodeRef).getLocalName();
    // make sure that we are in the last step of the WCM content wizard
    Collection<String> pubState =
      (Collection<String>) props.get(MyContentModel.PROP_PUBSTATE);
    if (pubState != null && pubState.size() == 1) {
      String furl = (String) props.get(MyContentModel.PROP_FURL);
      // only assign a furl if the node does not have one already
      if (StringUtils.isEmpty(furl)) {
        String title = (String) props.get(ContentModel.PROP_TITLE);
        String subType = (String) props.get(ContentModel.PROP_CREATOR);
        String path =
          pathAliasDao.getFurl(nodeRef.getId(), type, subType, title);
        if (path != null) {
          logger.debug("path=" + path);
          nodeService.setProperty(nodeRef, MyContentModel.PROP_FURL, path);
        }
      }
    }
  }

  @Override
  public void onDeleteNode(ChildAssociationRef childAssocRef,
      boolean archived) {
    // free up the alias so it can be reused
    NodeRef nodeRef = childAssocRef.getChildRef();
    pathAliasDao.remove(nodeRef.getId());
  }
}
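The two checks in onUpdateNode can be isolated into a small predicate, which makes the rule easy to unit test without a running repository. This is only a sketch: the FurlAssignmentRule class and the shouldAssign() method are hypothetical names, and the "exactly one publish state" condition simply mirrors the check in the listing above.

```java
import java.util.Collection;

/** Hypothetical helper that isolates the guard conditions used in onUpdateNode. */
final class FurlAssignmentRule {

  /**
   * A furl is assigned only when the publish state holds exactly one value
   * (the last wizard step) and the node does not have a furl yet.
   */
  static boolean shouldAssign(Collection<String> pubState, String existingFurl) {
    boolean ready = pubState != null && pubState.size() == 1;
    boolean unassigned = existingFurl == null || existingFurl.isEmpty();
    return ready && unassigned;
  }
}
```

A node that already carries a furl returns false here, so a later change to its title leaves the published URL alone.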
Alfresco is based on Spring, so the PathAliasSetter is a Spring bean, and needs to be defined in an extension context. It is passed a reference to the PathAliasDao, which in turn is given an AlfrescoStandardAnalyser and the Hibernate session factory, as shown below:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Source: config/alfresco/extension/mycompany-behaviour-context.xml -->
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN'
    'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
  <bean id="pathAliasSetter"
      class="com.mycompany.alfresco.extension.path.PathAliasSetter"
      init-method="init"
      depends-on="mycompany.dictionaryBootstrap">
    <property name="policyComponent" ref="policyComponent"/>
    <property name="nodeService" ref="nodeService"/>
    <property name="pathAliasDao" ref="pathAliasDao"/>
  </bean>
  <bean id="pathAliasDao"
      class="com.mycompany.alfresco.extension.path.PathAliasDao">
    <property name="analyzer" ref="alfrescoStandardAnalyser"/>
    <property name="sessionFactory" ref="sessionFactory"/>
  </bean>
  <bean id="alfrescoStandardAnalyser"
      class="org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser"/>
</beans>
The init-method tells Spring to run the boilerplate in the PathAliasSetter's init() method. The depends-on registers a dependency on our custom content model, without which Alfresco will complain that my:publishableDoc is not a valid type. This is because the custom content types are not real Java classes that can be injected; they are entries in a dictionary that has to be populated before you can run any code against them. I had never had to use the depends-on attribute before, but now I see its use.
Testing the Behaviour
To test this behaviour, I deployed the code using "ant deploy" to the exploded Alfresco webapp, and restarted Alfresco. I then logged in as 'happy' and added a couple of Posts using the Add Content link. In both cases my title was "This is a test title", which resulted in the following entries in the log file. As you can see, the first post was assigned a FURL of post/happy/test-title.html and the second post/happy/test-title-1.html. The database table was also populated correctly.
I also tested updating an existing Post, and as expected the FURL was not regenerated.
...
13:42:43,429 DEBUG [com.mycompany] preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@67e8a1f6, name=log4j:logger=com.mycompany
13:42:46,627 INFO [extension.path.PathAliasSetter] Initializing PathAliasSetter behavior...
...
Jun 16, 2010 1:44:21 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 113564 ms
...
13:55:22,710 User:happy DEBUG [extension.path.PathAliasSetter] path=post/happy/test-title.html
...
13:55:59,697 User:happy DEBUG [extension.path.PathAliasSetter] path=post/happy/test-title-1.html
...
Conclusions
While the code above mimics Drupal's PATH module, the approaches differ. With Drupal, one picks tokens out of the node edit form and strings them together to form the pattern from which the FURL is generated. So it's easier for an administrator to customize the behavior. But considering that the URL structure for a given content type would almost never change for a website, the ability to define it without any programming probably isn't that important.
The Alfresco approach requires programming to achieve this, but it's not very hard to do if you are a Java programmer and are familiar with Spring (and Hibernate in this case). Personally, I prefer this approach - as a programmer, I prefer working at the code level rather than the (undoubtedly more popular) module level.
Thanks so much for this very informative post. Do you have plans to contribute it as a project to the Alfresco forge?
Kind regards,
Nancy Garrity
Alfresco Community Manager
Thanks Nancy. I didn't actually have any plans of doing this, since it's basically a customization rather than a reusable module, but I figured that if you thought it was useful, then it may be for others too - so I have added it to AlfrescoForge as a code snippet.
One minor nit, since you are the community manager there: is there a place in AlfrescoForge where you can point to external content such as mine, rather than have me go in and contribute? I ask this only because the contribution process takes time and (duplication of) effort, which may dissuade many contributors from doing this. What do you think?