Salmon Run: ontology

Saturday, June 13, 2009

Ontology Rules with Prolog

Over the years, I've had an on-again, off-again interest in Rules Engines. However, as Martin Fowler points out, it is often more pragmatic to build a custom engine. A custom engine can be as simple as a properties file modeled after an awk script (i.e., {pattern => action} pairs). More complex rules, i.e., multiple pattern matches in a certain sequence leading to a single complex action, can also be modeled with a Java variant of the awk strategy, i.e., {Predicate => Closure}. Where Rules Engines shine, however, is when you need to do rule chaining, or when the structure of the rules themselves (rather than their values) changes very rapidly.
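
To make the {Predicate => Closure} idea concrete, here is a minimal sketch of such a custom engine. It is only an illustration under my own assumptions: MiniRuleEngine and the sample rules are made-up names, and it uses the JDK's java.util.function.Predicate and Function types rather than any particular Predicate/Closure library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical mini rule engine: rules are {pattern => action} pairs,
// evaluated awk-style in the order they were added.
final class MiniRuleEngine<T, R> {

  private final Map<Predicate<T>, Function<T, R>> rules =
    new LinkedHashMap<>();

  MiniRuleEngine<T, R> addRule(Predicate<T> pattern, Function<T, R> action) {
    rules.put(pattern, action);
    return this;
  }

  // Applies the action of every rule whose pattern matches the input.
  List<R> fire(T input) {
    List<R> results = new ArrayList<>();
    for (Map.Entry<Predicate<T>, Function<T, R>> rule : rules.entrySet()) {
      if (rule.getKey().test(input)) {
        results.add(rule.getValue().apply(input));
      }
    }
    return results;
  }

  public static void main(String[] args) {
    MiniRuleEngine<String, String> engine =
      new MiniRuleEngine<String, String>()
        .addRule(line -> line.startsWith("ERROR"), line -> "alert: " + line)
        .addRule(line -> line.contains("timeout"), line -> "retry: " + line);
    System.out.println(engine.fire("ERROR: timeout talking to db"));
  }
}
```

Note that nothing here chains rules: an action cannot trigger another rule, which is exactly the point where a real Rules Engine starts to pay off.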

Motivation

I actually set out to learn Jena Rules using the Semantic Web Programming book as a guide. Midway through that exercise, it occurred to me that Prolog would be a cleaner, almost drop-in replacement for the rather verbose Turtle syntax. Apparently the Semantic Web community thinks otherwise, since Turtle stands for Terse RDF Triple Language. I haven't actually used Prolog before this, although I've read code snippets in articles once or twice (but not recently), so the realization was almost like an epiphany.

Which Prolog?

I initially downloaded GNU-Prolog because it was available from the yum repository, but then I decided to go with SWI-Prolog, because there is a NetBeans plugin available for it, and because it offers a Java-Prolog Interface (JPL), which I haven't tried yet. Because SWI-Prolog did not have an RPM for my AMD-64 laptop, I had to build it from source, but I did not have any problems doing that.

Learning Prolog

There are quite a few Prolog tutorials available on the Web, but most focus on trying to use it as a general-purpose programming language. Since I intended to use Prolog only for its logic programming facilities, I found the Learn Prolog Now! and Adventure in Prolog online books more suitable. The first one is based on SWI-Prolog and the second on Amzi! Prolog, but examples from both worked fine for me.

The Fact Base

A Prolog program consists of facts, rules and queries. In order to keep my fact base similar to the ontology model, I decided to model my facts as triples (isTriple/3 in Prolog, since it takes three arguments), as shown below. Each of the subject, predicate and object can either be an Atom or a Compound Term (you have to make this decision at modeling time). I've just used Atoms in my example.

isTriple(subject, predicate, object).
% if we want predicates to have a property such as weight, we
% can model it as a compound term as shown below:
isTriple(subject, predicate(name, weight), object).

I used a simple Java program to generate my initial fact base of 500+ triples from the sample wine.rdf file. It uses Jena to parse the file and write out the facts into a flat file. Unlike my previous usage, where I tried to map inverse relationships using an Enum, this time I only consider the relationships that exist in the wine.rdf file itself, and use rules to build the inverse relations. Since my subject and object names start with an upper-case letter, which Prolog would read as a variable rather than an atom, I prepended an 'a' to make them conform to Prolog's syntax rules. You can run this from a class with a main() method or write a unit test. I used a unit test, but I am not showing it since it's so trivial.

// Source: src/main/java/net/sf/jtmt/inferencing/prolog/Owl2PrologFactGenerator.java
package net.sf.jtmt.inferencing.prolog;

import java.io.FileWriter;
import java.io.PrintWriter;

import org.apache.commons.lang.StringUtils;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Node_URI;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;

/**
 * Reads an OWL file representing an ontology, and outputs a Prolog
 * fact base.
 */
public class Owl2PrologFactGenerator {

  private String inputOwlFilename;
  private String outputPrologFilename;
  
  public void setInputOwlFilename(String inputOwlFilename) {
    this.inputOwlFilename = inputOwlFilename;
  }

  public void setOutputPrologFilename(String outputPrologFilename) {
    this.outputPrologFilename = outputPrologFilename;
  }

  public void generate() throws Exception {
    PrintWriter prologWriter = 
      new PrintWriter(new FileWriter(outputPrologFilename), true);
    Model model = ModelFactory.createDefaultModel();
    model.read(inputOwlFilename);
    StmtIterator sit = model.listStatements();
    while (sit.hasNext()) {
      Statement st = sit.next();
      Triple triple = st.asTriple();
      String prologFact = getPrologFact(triple);
      if (StringUtils.isNotEmpty(prologFact)) {
        prologWriter.println(prologFact);
      }
    }
    model.close();
    prologWriter.flush();
    prologWriter.close();
  }

  private String getPrologFact(Triple triple) {
    StringBuilder buf = new StringBuilder();
    Node subject = triple.getSubject();
    Node object = triple.getObject();
    if ((subject instanceof Node_URI) &&
        (object instanceof Node_URI)) {
      buf.append("isTriple(a").
      append(triple.getSubject().getLocalName()).
      append(",").
      append(triple.getPredicate().getLocalName()).
      append(",a").
      append(triple.getObject().getLocalName()).
      append(").");
    }
    return buf.toString();
  }
}
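
Prepending an 'a' works for the wine ontology, but an alternative (not what the generator above does) would be to quote any name that is not already a valid bare atom. The sketch below assumes standard Prolog quoting rules; PrologAtoms, toAtom and toFact are hypothetical helper names, not part of the jtmt code.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns arbitrary local names into Prolog atoms by
// quoting them when needed, instead of prepending a lower-case letter.
final class PrologAtoms {

  // A bare atom starts with a lower-case letter, followed by letters,
  // digits or underscores.
  private static final Pattern BARE_ATOM =
    Pattern.compile("[a-z][A-Za-z0-9_]*");

  static String toAtom(String name) {
    if (BARE_ATOM.matcher(name).matches()) {
      return name;
    }
    // Escape backslashes first, then single quotes, and wrap in quotes.
    String escaped = name.replace("\\", "\\\\").replace("'", "\\'");
    return "'" + escaped + "'";
  }

  static String toFact(String subject, String predicate, String object) {
    return "isTriple(" + toAtom(subject) + "," + toAtom(predicate)
      + "," + toAtom(object) + ").";
  }

  public static void main(String[] args) {
    System.out.println(
      toFact("EdnaValleyRegion", "locatedIn", "CaliforniaRegion"));
  }
}
```

The trade-off is readability: quoted atoms keep the original names intact, but the facts and any queries against them become noisier to type in the listener.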

My output file contains the fact base in Prolog syntax. Here is a partial listing, to show you how it looks. The full source file (including the rules and the testing function, described below) is available here if you want it.

% Source: src/main/prolog/net/sf/jtmt/inferencing/prolog/wine_facts.pro
% ...
isTriple(aCorbansPrivateBinSauvignonBlanc,hasBody,aFull).
isTriple(aCorbansPrivateBinSauvignonBlanc,hasFlavor,aStrong).
isTriple(aCorbansPrivateBinSauvignonBlanc,hasSugar,aDry).
isTriple(aCorbansPrivateBinSauvignonBlanc,hasMaker,aCorbans).
isTriple(aCorbansPrivateBinSauvignonBlanc,locatedIn,aNewZealandRegion).
isTriple(aCorbansPrivateBinSauvignonBlanc,type,aSauvignonBlanc).
isTriple(aSevreEtMaineMuscadet,hasMaker,aSevreEtMaine).
isTriple(aSevreEtMaineMuscadet,type,aMuscadet).
isTriple(aWineFlavor,subClassOf,aWineTaste).
isTriple(aWineFlavor,type,aClass).
isTriple(aEdnaValleyRegion,locatedIn,aCaliforniaRegion).
isTriple(aEdnaValleyRegion,type,aRegion).
% ...

Adding Rules

The first step is adding the inverse relationships using Prolog rules. This is quite simple, as shown below:

% Source: src/main/prolog/net/sf/jtmt/inferencing/prolog/wine_facts.pro
% ...
% --------------------------------------------------------------
%         rules to augment the generated facts.
% --------------------------------------------------------------

% rules to generate inverse relationships where applicable
isTriple(Subject, isVintageYearOf, Object) :-
    isTriple(Object, hasVintageYear, Subject).
isTriple(Subject, regionContains, Object) :-
    isTriple(Object, locatedIn, Subject).
isTriple(Subject, mainIngredient, Object) :-
    isTriple(Object, mainIngredient, Subject).
isTriple(Subject, isFlavorOf, Object) :-
    isTriple(Object, hasFlavor, Subject).
isTriple(Subject, isColorOf, Object) :-
    isTriple(Object, hasColor, Subject).
isTriple(Subject, isSugarContentOf, Object) :-
    isTriple(Object, hasSugar, Subject).
isTriple(Subject, isBodyOf, Object) :-
    isTriple(Object, hasBody, Subject).
isTriple(Subject, madeBy, Object) :-
    isTriple(Object, hasMaker, Subject).
isTriple(Subject, hasInstance, Object) :-
    isTriple(Object, type, Subject).
isTriple(Subject, superClassOf, Object) :-
    isTriple(Object, subClassOf, Subject).

Nothing fancy here, as you can see - we just create new isTriple rules by switching the subject and object around, and replacing the predicate with its inverse. These are simple examples of generating relationships algebraically from existing ones; we have slightly more complex examples later. Trying out some of these rules in the SWI-Prolog listener (the equivalent of the interactive shell in Python, or the REPL in Lisp) shows that they work. Note that the final false indicates that there are no more matches for the query.

sujit@sirocco:~$ pl
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 5.6.64)
Copyright (c) 1990-2008 University of Amsterdam.
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

?- consult('wine_facts.pro').
% wine_facts.pro compiled 0.02 sec, 109,832 bytes
true.

?- isTriple(aCongressSpringsSemillon, hasSugar, aDry).
true ;
false.

?- isTriple(aDry, isSugarContentOf, aCongressSpringsSemillon).
true ;
false.

I then decided to add relations which don't already exist, using slightly more complex rules (involving recursion) to generate relationships from existing ones. Here is the snippet for these rules from my wine_facts.pro file.

% Source: src/main/prolog/net/sf/jtmt/inferencing/prolog/wine_facts.pro
% ...
% rule to find all wines produced by a given region (region can be at any
% level, ie. country (USRegion), state (CaliforniaRegion), or location within
% state (SantaCruzMountainsRegion). Only wines should be listed. We do this
% by ensuring that a Wine has a valid maker.
isTriple(Region, produces, Wine) :- isTriple(Region, regionContains, Wine),
                                    isTriple(Wine, hasMaker, _).
isTriple(Region, produces, Wine) :- isTriple(Region, regionContains, X),
                                    isTriple(X, produces, Wine),
                                    isTriple(Wine, hasMaker, _).
                                    
% rule to find out the region for which the wine is produced. Only the
% regions should be listed. We do this by ensuring that a Region has type
% aRegion.
isTriple(Wine, producedBy, Region) :- isTriple(Region, regionContains, Wine),
                                      isTriple(Region, type, aRegion).
isTriple(Wine, producedBy, Region) :- isTriple(Region, regionContains, X),
                                      isTriple(X, produces, Wine),
                                      isTriple(X, type, aRegion).
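
For readers more at home in Java than Prolog, the recursive produces rule corresponds roughly to the sketch below. It is only an approximation under my own assumptions: RegionProduces and its methods are made-up names, the containment graph is assumed to be acyclic, and results come back in depth-first order rather than in the order Prolog finds them. The sample data is the small slice of the wine facts that appears in the query results later in this post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Java rendering of the "produces" rules.
final class RegionProduces {

  // parent -> children, i.e. the regionContains inverse of locatedIn
  private final Map<String, List<String>> contains = new HashMap<>();
  // things that have a maker, i.e. isTriple(Wine, hasMaker, _)
  private final Set<String> hasMaker = new HashSet<>();

  void locatedIn(String child, String parent) {
    contains.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
  }

  void hasMaker(String wine) {
    hasMaker.add(wine);
  }

  List<String> produces(String region) {
    List<String> wines = new ArrayList<>();
    for (String child : contains.getOrDefault(region, List.of())) {
      if (hasMaker.contains(child)) {
        wines.add(child);              // first clause: direct containment
      }
      wines.addAll(produces(child));   // second clause: recurse into child
    }
    return wines;
  }

  static RegionProduces sample() {
    RegionProduces kb = new RegionProduces();
    kb.locatedIn("aCaliforniaRegion", "aUSRegion");
    kb.locatedIn("aTexasRegion", "aUSRegion");
    kb.locatedIn("aSantaBarbaraRegion", "aCaliforniaRegion");
    kb.locatedIn("aCentralTexasRegion", "aTexasRegion");
    kb.locatedIn("aLaneTannerPinotNoir", "aSantaBarbaraRegion");
    kb.locatedIn("aStGenevieveTexasWhite", "aCentralTexasRegion");
    kb.hasMaker("aLaneTannerPinotNoir");
    kb.hasMaker("aStGenevieveTexasWhite");
    return kb;
  }

  public static void main(String[] args) {
    System.out.println(sample().produces("aUSRegion"));
  }
}
```

The Prolog version is shorter mostly because the engine supplies the iteration and backtracking that the Java loop has to spell out.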

As before, we can test these rules from the SWI-Prolog shell. However, I also built a little Prolog predicate that allows you to do Query-By-Example.

% Source: src/main/prolog/net/sf/jtmt/inferencing/prolog/wine_facts.pro
% --------------------------------------------------------------
%     simple query-by-example testing tool
% --------------------------------------------------------------
test(Subject,Predicate,Object) :- isTriple(Subject, Predicate, Object),
    tab(2), write('('), write(Subject),
    write(','), write(Predicate),
    write(','), write(Object),
    write(')'), nl, fail.
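
To show what Query-By-Example amounts to, here is a rough Java equivalent in which null plays the role of an unbound Prolog variable. It is a sketch only: TripleQuery is a made-up class, and unlike the Prolog predicate it matches stored facts only and knows nothing about rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical query-by-example over a flat list of triples.
final class TripleQuery {

  private final List<String[]> facts = new ArrayList<>();

  void add(String subject, String predicate, String object) {
    facts.add(new String[] {subject, predicate, object});
  }

  // A null in the pattern matches anything, like an unbound variable.
  List<String> match(String subject, String predicate, String object) {
    String[] pattern = {subject, predicate, object};
    List<String> hits = new ArrayList<>();
    for (String[] fact : facts) {
      boolean matches = true;
      for (int i = 0; i < pattern.length; i++) {
        if (pattern[i] != null && !pattern[i].equals(fact[i])) {
          matches = false;
          break;
        }
      }
      if (matches) {
        hits.add("(" + String.join(",", fact) + ")");
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    TripleQuery db = new TripleQuery();
    db.add("aCorbansPrivateBinSauvignonBlanc", "hasBody", "aFull");
    db.add("aCorbansPrivateBinSauvignonBlanc", "hasSugar", "aDry");
    db.add("aEdnaValleyRegion", "locatedIn", "aCaliforniaRegion");
    for (String hit : db.match("aCorbansPrivateBinSauvignonBlanc", null, null)) {
      System.out.println("  " + hit);
    }
  }
}
```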

Running my test cases (commented out in the source file, since I could not get them to work in batch mode) in the SWI-Prolog listener returns the following (expected) results.

?- consult('wine_facts.pro').
% wine_facts.pro compiled 0.02 sec, 109,832 bytes
true.

?- test(aUSRegion, produces, X).
  (aUSRegion,produces,aMountEdenVineyardEstatePinotNoir)
  (aUSRegion,produces,aMountEdenVineyardEdnaValleyChardonnay)
  (aUSRegion,produces,aFormanChardonnay)
  (aUSRegion,produces,aWhitehallLaneCabernetFranc)
  (aUSRegion,produces,aFormanCabernetSauvignon)
  (aUSRegion,produces,aElyseZinfandel)
  (aUSRegion,produces,aSeanThackreySiriusPetiteSyrah)
  (aUSRegion,produces,aPageMillWineryCabernetSauvignon)
  (aUSRegion,produces,aBancroftChardonnay)
  (aUSRegion,produces,aSaucelitoCanyonZinfandel)
  (aUSRegion,produces,aSaucelitoCanyonZinfandel1998)
  (aUSRegion,produces,aMariettaPetiteSyrah)
  (aUSRegion,produces,aMariettaZinfandel)
  (aUSRegion,produces,aGaryFarrellMerlot)
  (aUSRegion,produces,aPeterMccoyChardonnay)
  (aUSRegion,produces,aMariettaOldVinesRed)
  (aUSRegion,produces,aCotturiZinfandel)
  (aUSRegion,produces,aMariettaCabernetSauvignon)
  (aUSRegion,produces,aVentanaCheninBlanc)
  (aUSRegion,produces,aLaneTannerPinotNoir)
  (aUSRegion,produces,aFoxenCheninBlanc)
  (aUSRegion,produces,aSantaCruzMountainVineyardCabernetSauvignon)
  (aUSRegion,produces,aStGenevieveTexasWhite)
false.

?- test(X, produces, aLaneTannerPinotNoir).
  (aSantaBarbaraRegion,produces,aLaneTannerPinotNoir)
  (aCaliforniaRegion,produces,aLaneTannerPinotNoir)
  (aUSRegion,produces,aLaneTannerPinotNoir)
false.

?- test(aTexasRegion, produces, X).
  (aTexasRegion,produces,aStGenevieveTexasWhite)
false.

?- test(X, produces, aStGenevieveTexasWhite).
  (aCentralTexasRegion,produces,aStGenevieveTexasWhite)
  (aTexasRegion,produces,aStGenevieveTexasWhite)
  (aUSRegion,produces,aStGenevieveTexasWhite)
false.

?- test(X, producedBy, aUSRegion).
  (aCaliforniaRegion,producedBy,aUSRegion)
  (aTexasRegion,producedBy,aUSRegion)
  (aMountEdenVineyardEstatePinotNoir,producedBy,aUSRegion)
  (aMountEdenVineyardEdnaValleyChardonnay,producedBy,aUSRegion)
  (aFormanChardonnay,producedBy,aUSRegion)
  (aWhitehallLaneCabernetFranc,producedBy,aUSRegion)
  (aFormanCabernetSauvignon,producedBy,aUSRegion)
  (aElyseZinfandel,producedBy,aUSRegion)
  (aSeanThackreySiriusPetiteSyrah,producedBy,aUSRegion)
  (aPageMillWineryCabernetSauvignon,producedBy,aUSRegion)
  (aBancroftChardonnay,producedBy,aUSRegion)
  (aSaucelitoCanyonZinfandel,producedBy,aUSRegion)
  (aSaucelitoCanyonZinfandel1998,producedBy,aUSRegion)
  (aMariettaPetiteSyrah,producedBy,aUSRegion)
  (aMariettaZinfandel,producedBy,aUSRegion)
  (aGaryFarrellMerlot,producedBy,aUSRegion)
  (aPeterMccoyChardonnay,producedBy,aUSRegion)
  (aMariettaOldVinesRed,producedBy,aUSRegion)
  (aCotturiZinfandel,producedBy,aUSRegion)
  (aMariettaCabernetSauvignon,producedBy,aUSRegion)
  (aVentanaCheninBlanc,producedBy,aUSRegion)
  (aLaneTannerPinotNoir,producedBy,aUSRegion)
  (aFoxenCheninBlanc,producedBy,aUSRegion)
  (aSantaCruzMountainVineyardCabernetSauvignon,producedBy,aUSRegion)
  (aStGenevieveTexasWhite,producedBy,aUSRegion)
false.

?- test(X, producedBy, aTexasRegion).
  (aCentralTexasRegion,producedBy,aTexasRegion)
  (aStGenevieveTexasWhite,producedBy,aTexasRegion)
false.

?- test(aLaneTannerPinotNoir, producedBy, X).
  (aLaneTannerPinotNoir,producedBy,aSantaBarbaraRegion)
  (aLaneTannerPinotNoir,producedBy,aCaliforniaRegion)
  (aLaneTannerPinotNoir,producedBy,aUSRegion)
false.

?- test(aStGenevieveTexasWhite, producedBy, X).
  (aStGenevieveTexasWhite,producedBy,aCentralTexasRegion)
  (aStGenevieveTexasWhite,producedBy,aTexasRegion)
  (aStGenevieveTexasWhite,producedBy,aUSRegion)
false.

Conclusion

I found this article (PDF) describing an attempt to model OWL rules using Prolog, so perhaps this idea is not as novel as it seemed to me at first. Prolog uses backward inferencing, which means that the rule based facts are recomputed on demand, rather than at the point of being asserted into the factbase. For a query-heavy system, which most rule based systems tend to be, this can have an impact on performance. But I think an application built around Prolog's rule engine can get around this by identifying a fact based on its origin, and generating and caching rule based facts at the point of assertion. If a rule is dropped or modified, the facts based on that rule could be recomputed and cached automatically.
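
The caching idea could look something like the sketch below, which handles only the inverse rules and tags each fact with its origin so that derived facts can be dropped along with their rule. This is my own illustration of the approach, not an existing library: CachingFactBase and all its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fact base that materializes inverse facts at assertion
// time instead of recomputing them on every query.
final class CachingFactBase {

  private static final String ASSERTED = "asserted";

  // fact -> origin (ASSERTED, or a tag naming the rule that derived it)
  private final Map<List<String>, String> facts = new LinkedHashMap<>();
  // predicate -> inverse predicate, e.g. hasSugar -> isSugarContentOf
  private final Map<String, String> inverseRules = new LinkedHashMap<>();

  void assertFact(String s, String p, String o) {
    facts.put(Arrays.asList(s, p, o), ASSERTED);
    derive(s, p, o);
  }

  void addInverseRule(String predicate, String inverse) {
    inverseRules.put(predicate, inverse);
    // apply the new rule to facts asserted before it existed
    for (List<String> fact : new ArrayList<>(facts.keySet())) {
      if (ASSERTED.equals(facts.get(fact))) {
        derive(fact.get(0), fact.get(1), fact.get(2));
      }
    }
  }

  void dropInverseRule(String predicate) {
    inverseRules.remove(predicate);
    // remove every cached fact that originated from this rule
    facts.values().removeIf(origin -> origin.equals("inverse:" + predicate));
  }

  boolean contains(String s, String p, String o) {
    return facts.containsKey(Arrays.asList(s, p, o));
  }

  private void derive(String s, String p, String o) {
    String inverse = inverseRules.get(p);
    if (inverse != null) {
      // never overwrite an explicitly asserted fact with a derived one
      facts.putIfAbsent(Arrays.asList(o, inverse, s), "inverse:" + p);
    }
  }

  public static void main(String[] args) {
    CachingFactBase kb = new CachingFactBase();
    kb.addInverseRule("hasSugar", "isSugarContentOf");
    kb.assertFact("aCongressSpringsSemillon", "hasSugar", "aDry");
    System.out.println(
      kb.contains("aDry", "isSugarContentOf", "aCongressSpringsSemillon"));
  }
}
```

Recursive rules such as produces would need more bookkeeping than this, since a derived fact can itself feed other rules.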

In terms of simplicity of syntax alone, I think a Prolog based rule definition system would be a welcome addition to the Semantic Web Programmer's toolkit. The pattern-based query by example I have described is also likely to be much simpler and easier to use than the more imperative SPARQL query language used to query OWL ontologies.

However, I do plan to learn how to build rules using the tools and languages in the Jena framework, just because it is what I am more likely to use in a typical Semantic Web development environment.

Wednesday, June 03, 2009

A Custom Traverser for Neo4J

In my previous post, I used Jena to parse some sample OWL files, store the triples in Neo4J's graph database, and then query the database with the Neo4J API. Querying involves locating a given node in the graph, then navigating along one or more known relationship types. Neo4J has an elegant Traverser (see Javadocs) mechanism that allows you to specify traversal properties as function objects in the Node.traverse() call, and I was able to use this for my NeoOntologyNavigator.getNeighborsRelatedBy() method.

I also wanted to build something along the lines of a graph browser. This involves locating a node, and listing out all its immediate neighbors sorted (descending) by relationship weights. This could be used to power a web-based ontology browser, where each of the neighbor nodes would be hyperlinked, so clicking on one of these would show you the neighbors of that node.

I wasn't able to figure out how to use the Traverser API to do this the first time around, so I went with the manual approach (see NeoOntologyNavigator.getAllNeighbors() in the previous post). However, as Peter Neubauer initially hinted at, it is possible to use the Traverser to do what I want, albeit in a slightly convoluted way, which is described below.

One caveat - Peter later responded to my reply to his comment, saying that direct usage of the Traverser mechanism to do what I want is indeed not possible in the current version (1.0-b8), but such a mechanism is planned in a future release of Neo4J. So you probably don't want to read too much into this post; the code below is perhaps at best a workaround for the current Neo4J version.

The "custom" part of my Traverser is really a custom ReturnableEvaluator. Each node that is traversed by Node.traverse() is passed to the ReturnableEvaluator to determine if the node should be included in the List of traversed Nodes returned by the Traverser's Iterator. So our custom ReturnableEvaluator checks to see if this node is "valid" for inclusion in the browse results (i.e., no nodes navigable by weightless relationships and no self loops), and if valid, accumulates the Node into an internal data structure and returns true. Once the traversal is complete, the internal data structure is queried to yield a Map of List of Nodes, keyed by relationship name. Here is the code for the custom ReturnableEvaluator.

// Source: src/main/java/net/sf/jtmt/ontology/graph/BrowserReturnableEvaluator.java
package net.sf.jtmt.ontology.graph;

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.ReturnableEvaluator;
import org.neo4j.api.core.TraversalPosition;

/**
 * Returnable Evaluator implementation that stores traversed nodes
 * in a data structure which is available to the client.
 */
public class BrowserReturnableEvaluator implements ReturnableEvaluator {

  private Node startNode;
  private TreeMap<String,ArrayList<WeightedNode>> neighbors;
  
  private class WeightedNode implements Comparable<WeightedNode> {
    public Node node;
    public Float weight;
    
    public WeightedNode(Node node, Float weight) {
      this.node = node;
      this.weight = weight;
    }
    
    public int compareTo(WeightedNode that) {
      return (that.weight.compareTo(this.weight));
    }
  };

  public BrowserReturnableEvaluator(Node startNode) {
    this.startNode = startNode;
    this.neighbors = 
      new TreeMap<String,ArrayList<WeightedNode>>();
  }
  
  public boolean isReturnableNode(TraversalPosition pos) {
    // if related to self, don't include in traversal results
    Node currentNode = pos.currentNode();
    if (startNode.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME).equals(
      currentNode.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME))) {
      return false;
    }
    // if relationship weight is 0.0F, don't include in traversal results
    Relationship lastRel = pos.lastRelationshipTraversed();
    Float relWeight = (Float) lastRel.getProperty(
      NeoOntologyNavigator.FIELD_RELATIONSHIP_WEIGHT);
    if (relWeight <= 0.0F) {
      return false;
    }
    String relName = (String) lastRel.getProperty(
      NeoOntologyNavigator.FIELD_RELATIONSHIP_NAME);
    // accumulate into our neighbor data structure
    ArrayList<WeightedNode> nodes;
    if (neighbors.containsKey(relName)) {
      nodes = neighbors.get(relName);
    } else {
      nodes = new ArrayList<WeightedNode>();
    }
    nodes.add(new WeightedNode(currentNode, relWeight));
    neighbors.put(relName, nodes);
    // include in traversal results
    return true;
  }

  public Map<String,List<Node>> getNeighbors() {
    Map<String,List<Node>> neighborsMap = 
      new LinkedHashMap<String,List<Node>>();
    for (String relName : neighbors.keySet()) {
      List<WeightedNode> weightedNodes = neighbors.get(relName);
      Collections.sort(weightedNodes);
      List<Node> relatedNodes = new ArrayList<Node>();
      for (WeightedNode weightedNode : weightedNodes) {
        relatedNodes.add(weightedNode.node);
      }
      neighborsMap.put(relName, relatedNodes);
    }
    return neighborsMap;
  }
}

The new version of NeoOntologyNavigator.getAllNeighbors() is shown below. It uses the new custom ReturnableEvaluator to do the traversal. Since we want all relationship types to be traversed, we pass them all to the Node.traverse() method call.

// Source: src/main/java/net/sf/jtmt/ontology/graph/NeoOntologyNavigator.java
...
public class NeoOntologyNavigator {
  ...
  /**
   * Return a Map of relationship names to a List of nodes connected
   * by that relationship. The keys are sorted by name, and the list
   * of node values are sorted by the incoming relation weights.
   * @param node the root Node.
   * @return a Map of String to Node List of neighbors.
   */
  public Map<String,List<Node>> getAllNeighbors(Node node)
      throws Exception {
    BrowserReturnableEvaluator browserReturnableEvaluator = 
      new BrowserReturnableEvaluator(node);
    Transaction tx = neoService.beginTx();
    try {
      // set up the data structure for all outgoing relationships
      OntologyRelationshipType[] relTypes = 
        OntologyRelationshipType.values();
      Object[] typeAndDirection = new Object[relTypes.length * 2];
      for (int i = 0; i < typeAndDirection.length; i++) {
        if (i % 2 == 0) {
          // relationship type slot
          typeAndDirection[i] = relTypes[i / 2];
        } else {
          // direction slot
          typeAndDirection[i] = Direction.OUTGOING;
        }
      }
      Traverser traverser = node.traverse(Order.BREADTH_FIRST, 
        StopEvaluator.DEPTH_ONE, 
        browserReturnableEvaluator, 
        typeAndDirection);
      for (Iterator<Node> it = traverser.iterator(); it.hasNext();) {
        // just eat up the nodes returned by the traverser, we are
        // really interested in the data structure.
        it.next();
      }
      // get at the accumulated data structure and return it
      tx.success();
      return browserReturnableEvaluator.getNeighbors();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
  ...
}

This yields output identical to that of the getAllNeighbors() test case described in the previous post:

query> show neighbors for SchlossVolradTrochenbierenausleseRiesling
--- HAS_BODY ---
Full
--- HAS_FLAVOR ---
Moderate
--- HAS_MAKER ---
SchlossVolrad
--- HAS_SUGAR ---
Sweet
--- IS_INSTANCE_OF ---
SweetRiesling
--- LOCATED_IN ---
GermanyRegion

The data in the graph db does not exercise the custom traversal code fully, so I decided to run it against a test graph of one node with weighted relationships to a bunch of other nodes. The test case is inspired by Burger King's attempt to pair soft drinks with their burgers, which I also noticed on a recent trip there with the kids.

Here is the test case that builds up the database and traverses it with the custom ReturnableEvaluator.

// Source: src/test/java/net/sf/jtmt/ontology/graph/BrowserReturnableEvaluatorTest.java
package net.sf.jtmt.ontology.graph;

import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.RelationshipType;
import org.neo4j.api.core.StopEvaluator;
import org.neo4j.api.core.Transaction;
import org.neo4j.api.core.Traverser;
import org.neo4j.api.core.Traverser.Order;

/**
 * Test to demonstrate sorting by relationship weights.
 */
public class BrowserReturnableEvaluatorTest {

  private static final Object[][] QUADS = new Object[][] {
    new Object[] {"coke", RelTypes.GOES_WITH, 10.0F, "whopper"},
    new Object[] {"coke", RelTypes.GOES_WITH, 10.0F, "doubleWhopper"},
    new Object[] {"coke", RelTypes.GOES_WITH, 5.0F, "tripleWhopper"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 10.0F, "water"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 9.0F, "sugar"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 2.0F, "carbonDioxide"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 5.0F, "secretRecipe"}
  };
  
  private enum RelTypes implements RelationshipType {
    GOES_WITH,
    HAS_INGREDIENTS
  };
  
  private static NeoService neoService;
  private static Node coke;
  
  @BeforeClass
  public static void setupBeforeClass() throws Exception {
    // load up the test data
    neoService = new EmbeddedNeo("/tmp/neotest");
    Transaction tx = neoService.beginTx();
    try {
      // drink nodes
      coke = neoService.createNode();
      coke.setProperty("name", "coke");
      for (Object[] quad : QUADS) {
        Node objectNode = neoService.createNode();
        objectNode.setProperty("name", (String) quad[3]);
        Relationship rel = 
          coke.createRelationshipTo(objectNode, (RelationshipType) quad[1]);
        rel.setProperty("name", ((RelationshipType) quad[1]).name());
        rel.setProperty("weight", (Float) quad[2]);
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
  
  @AfterClass
  public static void teardownAfterClass() throws Exception {
    if (neoService != null) {
      neoService.shutdown();
    }
  }
  
  @Test
  public void testCustomEvaluator() throws Exception {
    Transaction tx = neoService.beginTx();
    try {
      BrowserReturnableEvaluator customReturnEvaluator = 
        new BrowserReturnableEvaluator(coke);
      Traverser traverser = coke.traverse(
        Order.BREADTH_FIRST, 
        StopEvaluator.DEPTH_ONE, 
        customReturnEvaluator, 
        RelTypes.GOES_WITH, Direction.OUTGOING, 
        RelTypes.HAS_INGREDIENTS, Direction.OUTGOING);
      for (Iterator<Node> it = traverser.iterator(); it.hasNext();) {
        it.next();
      }
      Map<String,List<Node>> neighbors =
        customReturnEvaluator.getNeighbors();
      for (String relName : neighbors.keySet()) {
        System.out.println("-- " + relName + " --");
        List<Node> relatedNodes = neighbors.get(relName);
        for (Node relatedNode : relatedNodes) {
          System.out.println(relatedNode.getProperty("name"));
        }
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
}

Here is the output. I was checking specifically (1) whether all related nodes are shown, (2) whether the output is sorted by relationship name, and (3) whether the related nodes are ordered correctly by relationship weight. As you can see, all three hold.

-- GOES_WITH --
whopper
doubleWhopper
tripleWhopper
-- HAS_INGREDIENTS --
water
sugar
secretRecipe
carbonDioxide
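
The ordering above can be reproduced without Neo4J, which makes it easier to see what the accumulate-then-sort step is doing. The sketch below is a stand-alone approximation, not the evaluator itself: WeightedNeighbors and Edge are made-up names, and plain strings stand in for Nodes. It groups by relationship name in a TreeMap and sorts each group by descending weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone version of the grouping and sorting logic.
final class WeightedNeighbors {

  private static final class Edge {
    final String target;
    final float weight;

    Edge(String target, float weight) {
      this.target = target;
      this.weight = weight;
    }
  }

  // TreeMap keeps the relationship names sorted alphabetically
  private final Map<String, List<Edge>> byRelation = new TreeMap<>();

  void add(String relName, float weight, String target) {
    if (weight <= 0.0F) {
      return; // skip weightless relationships, as the evaluator does
    }
    byRelation.computeIfAbsent(relName, k -> new ArrayList<>())
      .add(new Edge(target, weight));
  }

  Map<String, List<String>> neighbors() {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<Edge>> entry : byRelation.entrySet()) {
      List<Edge> edges = new ArrayList<>(entry.getValue());
      // descending by weight; the sort is stable, so ties keep
      // their insertion order
      edges.sort(Comparator.comparingDouble((Edge e) -> e.weight).reversed());
      List<String> targets = new ArrayList<>();
      for (Edge edge : edges) {
        targets.add(edge.target);
      }
      result.put(entry.getKey(), targets);
    }
    return result;
  }

  public static void main(String[] args) {
    WeightedNeighbors coke = new WeightedNeighbors();
    coke.add("GOES_WITH", 10.0F, "whopper");
    coke.add("GOES_WITH", 10.0F, "doubleWhopper");
    coke.add("GOES_WITH", 5.0F, "tripleWhopper");
    coke.add("HAS_INGREDIENTS", 10.0F, "water");
    coke.add("HAS_INGREDIENTS", 9.0F, "sugar");
    coke.add("HAS_INGREDIENTS", 2.0F, "carbonDioxide");
    coke.add("HAS_INGREDIENTS", 5.0F, "secretRecipe");
    System.out.println(coke.neighbors());
  }
}
```

One thing this makes visible: whopper and doubleWhopper have the same weight, so their relative order depends only on the order in which they were added. In the Neo4J version that presumably means the order in which the Traverser visits them, which the test data does not pin down.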

It took me a while to figure out how to use the Traverser mechanism to solve the problem described, so hopefully I've saved you some time if you have a similar problem. With the upcoming enhancements to the Traverser API as described by Peter in the comments on the previous post, this approach may not be needed in the future. Also, the approach is probably not ideal, since the idea behind a Traverser is to traverse rather than accumulate. But it may not be a problem if your graph is not too dense, and you want to solve a similar problem with the current version of Neo4J.

Of course, I am by no means an expert on Neo4J, so if you have ideas on achieving the same result in a simpler way, I would love to hear from you.

Saturday, May 30, 2009

Using Neo4J to load and query OWL ontologies

I've written previously about modeling, storing and navigating through ontologies (you can see them here, here, here and here). These were all based on ideas on how I could improve upon ontology systems I had previously encountered at work. As I have no formal background in Semantic Web Programming, most of these implementations were based on tools that I was already familiar with or wanted to get familiar with.

I recently bought a book on Semantic Web Programming (see my review on Amazon here), and I must say it opened up a whole new world for me. Among other things, the book has a very good coverage of Jena, a Semantic Web Framework for Java, something I had been meaning to take a look at for a while.

Somewhat unrelated, I also came across Neo4J, a graph database, and it seemed to be a good fit as a data store for an ontology. Prior to this, the ontologies I have seen were stored in a relational database, which was then converted into an in-memory graph, then serialized out to disk using Java serialization for use by applications. This means that the serialized version is a point-in-time snapshot, not a true copy of the ontology. Depending on how frequently the ontology is updated, this may not be a big deal. But if the ontology is stored in a graph database to begin with, then the backend could continue to update the database, and the application would always see the current ontology. Makes things much cleaner in my opinion.

So I decided to take the OWL file for a sample Wine and Food ontology, and parse it using Jena, then load it into the Neo graph database, and run a few queries against it, to familiarize myself with the Jena and Neo APIs. This post is a result of that effort.

Load Phase

The code for the data loader is shown below. It uses Jena to parse the wine.rdf and food.rdf files and write them out into a Neo graph database. The Jena parser parses the files into a Collection of Statement objects, and exposes an Iterator to get at them. Each statement is a (subject, predicate, object) Triple, which corresponds to the start node, relationship and end node in a graph database.

In keeping with the best practices described in the Neo4J Guide (PDF), I also added a pseudo-node representing the start node (also known as reference node) of the graph, and a pseudo-node for each OWL file. The reference node points to the OWL file pseudo-nodes, and each of the file nodes points to the nodes from the statements extracted from that file.

To query the database given a node name, I used Neo's LuceneIndexService to create a lookup table, which points to the Node. In addition, I wanted to assign weights to each relationship, so I added in a property.

// Source: src/main/java/net/sf/jtmt/ontology/graph/loaders/Owl2NeoLoader.java
package net.sf.jtmt.ontology.graph.loaders;

import net.sf.jtmt.ontology.graph.OntologyRelationshipType;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.NotFoundException;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.Transaction;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Node_URI;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;

/**
 * Parses an OWL RDF file and populates a graph database directly.
 */
public class Owl2NeoLoader {

  private static final String FIELD_ENTITY_NAME = "name";
  private static final String FIELD_ENTITY_TYPE = "type";
  private static final String FIELD_RELATIONSHIP_NAME = "name";
  private static final String FIELD_RELATIONSHIP_WEIGHT = "weight";
  
  private final Log log = LogFactory.getLog(getClass());
  
  private String filePath;
  private String dbPath;
  private String ontologyName;
  private String refNodeName;
  
  public void setFilePath(String filePath) {
    this.filePath = filePath;
  }
  
  public void setDbPath(String dbPath) {
    this.dbPath = dbPath;
  }
  
  public void setOntologyName(String ontologyName) {
    this.ontologyName = ontologyName;
  }
  
  public void setRefNodeName(String refNodeName) {
    this.refNodeName = refNodeName;
  }
  
  public void load() throws Exception {
    NeoService neoService = null;
    IndexService indexService = null;
    try {
      // set up an embedded instance of neo database
      neoService = new EmbeddedNeo(dbPath);
      // set up index service for looking up node by name
      indexService = new LuceneIndexService(neoService);
      // set up top-level pseudo nodes for navigation
      org.neo4j.api.core.Node refNode = getReferenceNode(neoService);
      org.neo4j.api.core.Node fileNode = getFileNode(neoService, refNode);
      // parse the owl rdf file
      Model model = ModelFactory.createDefaultModel();
      model.read("file://" + filePath);
      // iterate through all triples in the file, and set up corresponding
      // nodes in the neo database.
      StmtIterator it = model.listStatements();
      while (it.hasNext()) {
        Statement st = it.next();
        Triple triple = st.asTriple();
        insertIntoDb(neoService, indexService, fileNode, triple);
      }
    } finally {
      if (indexService != null) {
        indexService.shutdown();
      }
      if (neoService != null) {
        neoService.shutdown();
      }
    }
  }

  /**
   * Get the reference node if already available, otherwise create it.
   * @param neoService the reference to the Neo service.
   * @return a Neo4j Node object reference to the reference node.
   * @throws Exception if thrown.
   */
  private org.neo4j.api.core.Node getReferenceNode(NeoService neoService) 
      throws Exception { 
    org.neo4j.api.core.Node refNode = null;
    Transaction tx = neoService.beginTx(); 
    try {
      refNode = neoService.getReferenceNode();
      if (! refNode.hasProperty(FIELD_ENTITY_NAME)) {
        refNode.setProperty(FIELD_ENTITY_NAME, refNodeName);
        refNode.setProperty(FIELD_ENTITY_TYPE, "Thing");
      }
      tx.success();
    } catch (NotFoundException e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return refNode;
  }

  /**
   * Creates a single node for the file. This method is called once
   * per file, and the node should not exist in the Neo4j database.
   * So there is no need to check for existence of the node. Once
   * the node is created, it is connected to the reference node.
   * @param neoService the reference to the Neo service.
   * @param refNode the reference to the reference node.
   * @return the "file" node representing the entry-point into the
   * entities described by the current OWL file.
   * @throws Exception if thrown.
   */
  private org.neo4j.api.core.Node getFileNode(NeoService neoService,
      org.neo4j.api.core.Node refNode) throws Exception {
    org.neo4j.api.core.Node fileNode = null;
    Transaction tx = neoService.beginTx();
    try {
      fileNode = neoService.createNode();
      fileNode.setProperty(FIELD_ENTITY_NAME, ontologyName);
      fileNode.setProperty(FIELD_ENTITY_TYPE, "Class");
      Relationship rel = refNode.createRelationshipTo(
        fileNode, OntologyRelationshipType.CATEGORIZED_AS);
      logTriple(refNode, 
        OntologyRelationshipType.CATEGORIZED_AS, fileNode);
      rel.setProperty(
        FIELD_RELATIONSHIP_NAME, 
        OntologyRelationshipType.CATEGORIZED_AS.name());
      rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 0.0F);
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return fileNode;
  }

  /**
   * Inserts selected entities and relationships from Triples extracted
   * from the OWL document by the Jena parser. Only entities which have
   * a non-blank node for the subject and object are used. Further, only
   * relationship types listed in the OntologyRelationshipType enum are 
   * considered. In addition, if the enum specifies that certain 
   * relationship types have an inverse, the inverse relation is also
   * created here.
   * @param neoService a reference to the Neo service.
   * @param indexService a reference to the Index service (for looking
   * up Nodes by name).
   * @param fileNode a reference to the Node that is an entry point into
   * this ontology. This node will connect to both the subject and object 
   * nodes of the selected triples via a CONTAINS relationship. 
   * @param triple a reference to the Triple extracted by the Jena parser.
   * @throws Exception if thrown.
   */
  private void insertIntoDb(NeoService neoService, 
      IndexService indexService,
      org.neo4j.api.core.Node fileNode, 
      Triple triple) throws Exception {
    Node subject = triple.getSubject();
    Node predicate = triple.getPredicate();
    Node object = triple.getObject();
    if ((subject instanceof Node_URI) &&
        (object instanceof Node_URI)) {
      // get or create the subject and object nodes
      org.neo4j.api.core.Node subjectNode = 
        getEntityNode(neoService, indexService, subject);
      org.neo4j.api.core.Node objectNode =
        getEntityNode(neoService, indexService, object);
      if (subjectNode == null || objectNode == null) {
        return;
      }
      Transaction tx = neoService.beginTx();
      try {
        // hook up both nodes to the fileNode
        if (! isConnected(neoService, fileNode, 
            OntologyRelationshipType.CONTAINS, 
            Direction.OUTGOING, subjectNode)) {
          logTriple(fileNode, 
            OntologyRelationshipType.CONTAINS, subjectNode);
          Relationship rel = fileNode.createRelationshipTo(
            subjectNode, OntologyRelationshipType.CONTAINS);
          rel.setProperty(FIELD_RELATIONSHIP_NAME, 
            OntologyRelationshipType.CONTAINS.name());
          rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 0.0F);
        }
        if (! isConnected(neoService, fileNode, 
            OntologyRelationshipType.CONTAINS, 
            Direction.OUTGOING, objectNode)) {
          logTriple(fileNode, 
            OntologyRelationshipType.CONTAINS, objectNode);
          Relationship rel = fileNode.createRelationshipTo(
            objectNode, OntologyRelationshipType.CONTAINS);
          rel.setProperty(
            FIELD_RELATIONSHIP_NAME, 
            OntologyRelationshipType.CONTAINS.name());
          rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 0.0F);
        }
        // hook up subject and object via predicate
        OntologyRelationshipType type = 
          OntologyRelationshipType.fromName(predicate.getLocalName());
        if (type != null) {
          logTriple(subjectNode, type, objectNode);
          Relationship rel = subjectNode.createRelationshipTo(
              objectNode, type);
          rel.setProperty(FIELD_RELATIONSHIP_NAME, type.name());
          rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 1.0F);
        }
        // create reverse relationship
        OntologyRelationshipType inverseType = 
          OntologyRelationshipType.inverseOf(predicate.getLocalName());
        if (inverseType != null) {
          logTriple(objectNode, inverseType, subjectNode);
          Relationship inverseRel = objectNode.createRelationshipTo(
            subjectNode, inverseType);
          inverseRel.setProperty(
            FIELD_RELATIONSHIP_NAME, inverseType.name());
          inverseRel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 1.0F);
        }
        tx.success();
      } catch (Exception e) {
        tx.failure();
        throw e;
      } finally {
        tx.finish();
      }
    } else {
      return;
    }
  }

  /**
   * Loops through the relationships and returns true if the source
   * and target nodes are connected using the specified relationship
   * type and direction.
   * @param neoService a reference to the NeoService.
   * @param sourceNode the source Node object.
   * @param relationshipType the type of relationship.
   * @param direction the direction of the relationship.
   * @param targetNode the target Node object.
   * @return true or false.
   * @throws Exception if thrown.
   */
  private boolean isConnected(NeoService neoService, 
      org.neo4j.api.core.Node sourceNode,
      OntologyRelationshipType relationshipType, Direction direction,
      org.neo4j.api.core.Node targetNode) throws Exception {
    boolean isConnected = false;
    Transaction tx = neoService.beginTx();
    try {
      for (Relationship rel : sourceNode.getRelationships(
          relationshipType, direction)) {
        org.neo4j.api.core.Node endNode = rel.getEndNode();
        if (endNode.getProperty(FIELD_ENTITY_NAME).equals(
            targetNode.getProperty(FIELD_ENTITY_NAME))) {
          isConnected = true;
          break;
        }
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return isConnected;
  }

  private org.neo4j.api.core.Node getEntityNode(NeoService neoService,
      IndexService indexService, Node entity) throws Exception {
    String uri = ((Node_URI) entity).getURI();
    if (uri.indexOf('#') == -1) {
      return null;
    }
    String[] parts = StringUtils.split(uri, "#");
    String type = parts[0].substring(0, parts[0].lastIndexOf('/'));
    Transaction tx = neoService.beginTx();
    try {
      org.neo4j.api.core.Node entityNode = 
        indexService.getSingleNode(FIELD_ENTITY_NAME, parts[1]);
      if (entityNode == null) {
        entityNode = neoService.createNode();
        entityNode.setProperty(FIELD_ENTITY_NAME, parts[1]);
        entityNode.setProperty(FIELD_ENTITY_TYPE, type);
        indexService.index(entityNode, FIELD_ENTITY_NAME, parts[1]);
      }
      tx.success();
      return entityNode;
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
  
  /**
   * Convenience method to log the triple when it is inserted into the
   * database.
   * @param sourceNode the subject of the triple.
   * @param ontologyRelationshipType the predicate of the triple.
   * @param targetNode the object of the triple.
   */
  private void logTriple(org.neo4j.api.core.Node sourceNode, 
      OntologyRelationshipType ontologyRelationshipType, 
      org.neo4j.api.core.Node targetNode) {
    log.info("(" + sourceNode.getProperty(FIELD_ENTITY_NAME) +
      "," + ontologyRelationshipType.name() + 
      "," + targetNode.getProperty(FIELD_ENTITY_NAME) + ")");
  }
}

The relationship types are listed in the OntologyRelationshipType enum below. The types were found manually by first parsing the Statement objects and finding unique relationships. So it is likely that this enum will need to be expanded if other OWL files need to be parsed.

In addition, I also added in inverse relationships which are not available in the OWL file. Here is the code for OntologyRelationshipType.java.

// Source: src/main/java/net/sf/jtmt/ontology/graph/OntologyRelationshipType.java
package net.sf.jtmt.ontology.graph;

import org.neo4j.api.core.RelationshipType;

/**
 * Relationships exposed by the taxonomy.
 */
public enum OntologyRelationshipType implements RelationshipType {
  CATEGORIZED_AS(null, null),  // pseudo-rel
  CONTAINS(null, null),        // pseudo-rel
  ADJACENT_REGION("adjacentRegion", "adjacentRegion"),
  HAS_VINTAGE_YEAR("hasVintageYear", "isVintageYearOf"),
  LOCATED_IN("locatedIn", "regionContains"),
  MADE_FROM_GRAPE("madeFromGrape", "mainIngredient"),
  HAS_FLAVOR("hasFlavor", "isFlavorOf"),
  HAS_COLOR("hasColor", "isColorOf"),
  HAS_SUGAR("hasSugar", "isSugarContentOf"),
  HAS_BODY("hasBody", "isBodyOf"),
  HAS_MAKER("hasMaker", "madeBy"),
  IS_INSTANCE_OF("type", "hasInstance"),
  SUBCLASS_OF("subClassOf", "superClassOf"),
  DISJOINT_WITH("disjointWith", "disjointWith"),
  DIFFERENT_FROM("differentFrom", "differentFrom"),
  DOMAIN("domain", null),
  IS_VINTAGE_YEAR_OF("isVintageYearOf", "hasVintageYear"),
  REGION_CONTAINS("regionContains", "locatedIn"),
  MAIN_INGREDIENT("mainIngredient", "madeFromGrape"),
  IS_FLAVOR_OF("isFlavorOf", "hasFlavor"),
  IS_COLOR_OF("isColorOf", "hasColor"),
  IS_SUGAR_CONTENT_OF("isSugarContentOf", "hasSugar"),
  IS_BODY_OF("isBodyOf", "hasBody"),
  MADE_BY("madeBy", "hasMaker"),
  HAS_INSTANCE("hasInstance", "type"),
  SUPERCLASS_OF("superClassOf", "subClassOf");

  private String name;
  private String inverseName;
  
  OntologyRelationshipType(String name, String inverseName) {
    this.name = name;
    this.inverseName = inverseName;
  }
   
  public static OntologyRelationshipType fromName(String name) {
    for (OntologyRelationshipType type : values()) {
      if (name.equals(type.name)) {
        return type;
      }
    }
    return null;
  }
  
  public static OntologyRelationshipType inverseOf(String name) {
    OntologyRelationshipType rel = fromName(name);
    if (rel != null && rel.inverseName != null) {
      return fromName(rel.inverseName);
    } else {
      return null;
    }
  }
}
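
To show how the two lookups behave, here is a cut-down sketch with only three constants. RelSketch and RelSketchDemo are hypothetical names, and "someUnlistedPredicate" is a made-up predicate used to show what happens for names the enum does not know.

```java
// Cut-down, hypothetical copy of the enum idea with three constants only.
enum RelSketch {
  LOCATED_IN("locatedIn", "regionContains"),
  REGION_CONTAINS("regionContains", "locatedIn"),
  DOMAIN("domain", null);  // no inverse defined

  private final String name;
  private final String inverseName;

  RelSketch(String name, String inverseName) {
    this.name = name;
    this.inverseName = inverseName;
  }

  // Finds the constant for an OWL predicate name, or null if unlisted.
  static RelSketch fromName(String name) {
    for (RelSketch type : values()) {
      if (type.name.equals(name)) {
        return type;
      }
    }
    return null;
  }

  // Finds the constant for the inverse of an OWL predicate name, or
  // null if the predicate is unlisted or has no inverse.
  static RelSketch inverseOf(String name) {
    RelSketch rel = fromName(name);
    return (rel == null || rel.inverseName == null)
      ? null : fromName(rel.inverseName);
  }
}

class RelSketchDemo {
  public static void main(String[] args) {
    System.out.println(RelSketch.fromName("locatedIn"));              // LOCATED_IN
    System.out.println(RelSketch.inverseOf("locatedIn"));             // REGION_CONTAINS
    System.out.println(RelSketch.inverseOf("domain"));                // null
    System.out.println(RelSketch.fromName("someUnlistedPredicate"));  // null
  }
}
```

A null from fromName() is what makes the loader silently skip predicates that are not in the enum, which is why the enum has to grow when new OWL files bring new predicates.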

The loader operates on a single OWL file at a time. To run it, I use the following JUnit test class.

// Source: src/test/java/net/sf/jtmt/ontology/graph/Owl2NeoLoaderTest.java
package net.sf.jtmt.ontology.graph;

import net.sf.jtmt.ontology.graph.loaders.Owl2NeoLoader;

import org.apache.commons.io.FilenameUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;

/**
 * Test case for Owl2NeoLoader.
 */
public class Owl2NeoLoaderTest {

  private static final String ROOT_NAME = "ConsumableThing";
  
  private final Log log = LogFactory.getLog(getClass());
  
  private static final String[][] SUB_ONTOLOGIES = new String[][] {
    new String[] {"wine.rdf", "Wine"},
    new String[] {"food.rdf", "EdibleThing"}
  };
  
  @Test
  public void testLoading() throws Exception {
    for (String[] subOntology : SUB_ONTOLOGIES) {
      log.info("Now processing " + subOntology[0]);
      Owl2NeoLoader loader = new Owl2NeoLoader();
      loader.setRefNodeName(ROOT_NAME);
      loader.setFilePath(FilenameUtils.concat(
        "/home/sujit/src/jtmt/src/main/resources", subOntology[0]));
      loader.setDbPath("/tmp/neodb");
      loader.setOntologyName(subOntology[1]);
      loader.load();
    }
  }
}

The loader also prints out the triples as it writes them. A partial log (minus the date/time/source data) is shown below.

...
(CorbansPrivateBinSauvignonBlanc,HAS_SUGAR,Dry)
(Dry,IS_SUGAR_CONTENT_OF,CorbansPrivateBinSauvignonBlanc)
(Wine,CONTAINS,Corbans)
(CorbansPrivateBinSauvignonBlanc,HAS_MAKER,Corbans)
(Corbans,MADE_BY,CorbansPrivateBinSauvignonBlanc)
(Wine,CONTAINS,NewZealandRegion)
...

Query Phase

To test the load, I reran the queries I had used previously with JGraphT against a Prevayler-backed in-memory graph. I decided to build a query class that encapsulates the Neo4J query code. Here is the code for the query component.

// Source: src/main/java/net/sf/jtmt/ontology/graph/NeoOntologyNavigator.java
package net.sf.jtmt.ontology.graph;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.commons.collections15.MultiMap;
import org.apache.commons.collections15.multimap.MultiHashMap;
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.ReturnableEvaluator;
import org.neo4j.api.core.StopEvaluator;
import org.neo4j.api.core.Transaction;
import org.neo4j.api.core.Traverser;
import org.neo4j.api.core.Traverser.Order;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

/**
 * Provides methods to locate nodes and find neighbors in the Neo
 * graph database.
 */
public class NeoOntologyNavigator {

  public static final String FIELD_ENTITY_NAME = "name";
  public static final String FIELD_RELATIONSHIP_NAME = "name";
  public static final String FIELD_RELATIONSHIP_WEIGHT = "weight";

  private class WeightedNode {
    public Node node;
    public Float weight;
    public WeightedNode(Node node, Float weight) {
      this.node = node;
      this.weight = weight;
    }
  }
  
  private String neoDbPath;
  
  private NeoService neoService;
  private IndexService indexService;
  
  /**
   * Ctor for NeoOntologyNavigator
   * @param dbPath the path to the neo database.
   */
  public NeoOntologyNavigator(String dbPath) {
    super();
    this.neoDbPath = dbPath;
  }
  
  /**
   * The init() method should be called by client after instantiation.
   */
  public void init() {
    this.neoService = new EmbeddedNeo(neoDbPath);
    this.indexService = new LuceneIndexService(neoService);
  }
  
  /**
   * The destroy() method should be called by client on shutdown.
   */
  public void destroy() {
    indexService.shutdown();
    neoService.shutdown();
  }
  
  /**
   * Gets the reference to the named Node. Returns null if the node
   * is not found in the database.
   * @param nodeName the name of the node to lookup.
   * @return the reference to the Node, or null if not found.
   * @throws Exception if thrown.
   */
  public Node getByName(String nodeName) throws Exception {
    Transaction tx = neoService.beginTx();
    try {
      Node node = indexService.getSingleNode(FIELD_ENTITY_NAME, nodeName);
      tx.success();
      return node;
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }

  /**
   * Return a Map of relationship names to a List of nodes connected
   * by that relationship. The keys are sorted by name, and the list
   * of node values are sorted by the incoming relation weights.
   * @param node the root Node.
   * @return a Map of String to Node List of neighbors.
   */
  public Map<String,List<Node>> getAllNeighbors(Node node)
      throws Exception {
    MultiMap<String,WeightedNode> neighbors = 
      new MultiHashMap<String,WeightedNode>();
    Transaction tx = neoService.beginTx();
    try {
      String nodeName = (String) node.getProperty(FIELD_ENTITY_NAME);
      for (Relationship relationship : node.getRelationships()) {
        String relName = 
          (String) relationship.getProperty(FIELD_RELATIONSHIP_NAME);
        Float relWeight = 
          (Float) relationship.getProperty(FIELD_RELATIONSHIP_WEIGHT);
        if (relWeight == 0.0F) {
          continue;
        }
        Node neighborNode = relationship.getEndNode();
        // skip self-loops and incoming relationships: in both cases the
        // end node is this node
        String neighborNodeName = 
          (String) neighborNode.getProperty(FIELD_ENTITY_NAME);
        if (nodeName.equals(neighborNodeName)) {
          continue;
        }
        neighbors.put(relName, new WeightedNode(neighborNode, relWeight));
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    // sort each collection of weighted nodes
    for (String relName : neighbors.keySet()) {
      List<WeightedNode> nodes = 
        (List<WeightedNode>) neighbors.get(relName);
      Collections.sort(nodes, new Comparator<WeightedNode>() {
        public int compare(WeightedNode w1, WeightedNode w2) {
          return w2.weight.compareTo(w1.weight);
        }
      });
    }
    // finally sort the keys and upcast WeightedNodes to Nodes
    SortedMap<String,List<Node>> neighborMap = 
      new TreeMap<String,List<Node>>();
    for (String relName : neighbors.keySet()) {
      Collection<WeightedNode> weightedNodes = neighbors.get(relName);
      List<Node> nodes = new ArrayList<Node>();
      for (WeightedNode weightedNode : weightedNodes) {
        nodes.add(weightedNode.node);
      }
      neighborMap.put(relName, nodes);
    }
    return neighborMap;
  }
  
  /**
   * Returns a List of neighbor nodes that is reachable from the specified
   * Node. No ordering is done (since the Traverser framework does not seem
   * to allow this type of traversal, and we want to use the Traverser here).
   * @param node reference to the base node.
   * @param type the relationship type.
   * @return a List of neighbor nodes.
   */
  public List<Node> getNeighborsRelatedBy(Node node,
      OntologyRelationshipType type) throws Exception {
    List<Node> neighbors = new ArrayList<Node>();
    Transaction tx = neoService.beginTx();
    try {
      Traverser traverser = node.traverse(
        Order.BREADTH_FIRST, 
        StopEvaluator.DEPTH_ONE, 
        ReturnableEvaluator.ALL_BUT_START_NODE, 
        type, 
        Direction.OUTGOING);
      for (Iterator<Node> it = traverser.iterator(); it.hasNext();) {
        Node neighbor = it.next();
        neighbors.add(neighbor);
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return neighbors;
  }
}

The query client is the JUnit class shown below. Notice that it works at the level of abstraction of an application: apart from the Node type, there is no Neo4J code in the "client code".

// Source: src/test/java/net/sf/jtmt/ontology/graph/NeoOntologyNavigatorTest.java
package net.sf.jtmt.ontology.graph;

import java.util.List;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.neo4j.api.core.Node;

/**
 * Test case for NeoDb Navigator.
 */
public class NeoOntologyNavigatorTest {
  
  private final Log log = LogFactory.getLog(getClass());
  private static final String NEODB_PATH = "/tmp/neodb";
  private static NeoOntologyNavigator navigator;
  
  @BeforeClass
  public static void setupBeforeClass() throws Exception {
    navigator = new NeoOntologyNavigator(NEODB_PATH);
    navigator.init();
  }
  
  @AfterClass
  public static void teardownAfterClass() throws Exception {
    navigator.destroy();
  }
  
  @Test
  public void testWhereIsLoireRegion() throws Exception {
    log.info("query> where is LoireRegion?");
    Node loireRegionNode = navigator.getByName("LoireRegion");
    if (loireRegionNode != null) {
      List<Node> locations = navigator.getNeighborsRelatedBy(
        loireRegionNode, OntologyRelationshipType.LOCATED_IN);
      for (Node location : locations) {
        log.info(
          location.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }
  
  @Test
  public void testWhatRegionsAreInUsRegion() throws Exception {
    log.info("query> what regions are in USRegion?");
    Node usRegion = navigator.getByName("USRegion");
    if (usRegion != null) {
      List<Node> locations = navigator.getNeighborsRelatedBy(
        usRegion, OntologyRelationshipType.REGION_CONTAINS);
      for (Node location : locations) {
        log.info(
          location.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }
  
  @Test
  public void testWhatAreSweetWines() throws Exception {
    log.info("query> what are Sweet wines?");
    Node sweetNode = navigator.getByName("Sweet");
    if (sweetNode != null) {
      List<Node> sweetWines = navigator.getNeighborsRelatedBy(
        sweetNode, OntologyRelationshipType.IS_SUGAR_CONTENT_OF);
      for (Node sweetWine : sweetWines) {
        log.info(
          sweetWine.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }

  @Test
  public void testShowNeighborsForARieslingWine() throws Exception {
    log.info("query> show neighbors for SchlossVolradTrochenbierenausleseRiesling");
    Node rieslingNode = 
      navigator.getByName("SchlossVolradTrochenbierenausleseRiesling");
    Map<String,List<Node>> neighbors = 
      navigator.getAllNeighbors(rieslingNode);
    for (String relType : neighbors.keySet()) {
      log.info("--- " + relType + " ---");
      List<Node> relatedNodes = neighbors.get(relType);
      for (Node relatedNode : relatedNodes) {
        log.info(
          relatedNode.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }
}

The output of the queries is shown below. As you can see, the first three are similar to the MySQL/Prevayler/JGraphT version described in my earlier posts. The last one is a dump of a named node, which may be useful if we want to build a browsing tool.

query> where is LoireRegion?
FrenchRegion

query> what regions are in USRegion?
TexasRegion
CaliforniaRegion

query> what are Sweet wines?
WhitehallLanePrimavera
SchlossVolradTrochenbierenausleseRiesling
SchlossRothermelTrochenbierenausleseRiesling

query> show neighbors for SchlossVolradTrochenbierenausleseRiesling
--- HAS_BODY ---
Full
--- HAS_FLAVOR ---
Moderate
--- HAS_MAKER ---
SchlossVolrad
--- HAS_SUGAR ---
Sweet
--- IS_INSTANCE_OF ---
SweetRiesling
--- LOCATED_IN ---
GermanyRegion

I have barely scratched the surface of the Jena API with this, but I think I have exercised quite a bit of the Neo4J API, and I was quite impressed with the latter. One thing I would have liked is support for relationship weights in the Traverser mechanism, so that when a node has multiple relationships I could get the neighbors back sorted by weight.
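
Without that support, one workaround is to collect the depth-one neighbors and sort them by weight in application code, much as getAllNeighbors() does. Here is a JDK-only sketch of the sorting step. WeightedNeighborSketch and its Neighbor type are hypothetical, and the weights are made up (the loader only ever stores 0.0 and 1.0).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of ordering neighbors by relationship weight.
class WeightedNeighborSketch {

  static final class Neighbor {
    final String name;
    final float weight;

    Neighbor(String name, float weight) {
      this.name = name;
      this.weight = weight;
    }
  }

  // Returns neighbor names ordered by descending weight. The sort is
  // stable, so equal weights keep their original relative order.
  static List<String> namesByWeight(List<Neighbor> neighbors) {
    List<Neighbor> sorted = new ArrayList<Neighbor>(neighbors);
    Collections.sort(sorted, new Comparator<Neighbor>() {
      public int compare(Neighbor n1, Neighbor n2) {
        return Float.compare(n2.weight, n1.weight);
      }
    });
    List<String> names = new ArrayList<String>();
    for (Neighbor neighbor : sorted) {
      names.add(neighbor.name);
    }
    return names;
  }

  public static void main(String[] args) {
    List<Neighbor> neighbors = new ArrayList<Neighbor>();
    neighbors.add(new Neighbor("TexasRegion", 0.4F));       // made-up weight
    neighbors.add(new Neighbor("CaliforniaRegion", 0.9F));  // made-up weight
    System.out.println(namesByWeight(neighbors));
  }
}
```

This only covers a single hop; ordering a multi-hop traversal by weight would need the traversal itself to be weight-aware.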

My dataset is too small for me to form any opinion about performance and stability, but now that I am familiar with the API, I plan to use Neo4J to hold a (much) larger dataset to see how it compares against our current architecture of RDBMS and serialized graph.
