Saturday, May 30, 2009

Using Neo4J to load and query OWL ontologies

I've written previously about modeling, storing and navigating ontologies (you can see those posts here, here, here and here). They were all based on ideas for improving the ontology systems I had encountered at work. Since I have no formal background in Semantic Web programming, most of these implementations used tools I was already familiar with, or wanted to become familiar with.

I recently bought a book on Semantic Web Programming (see my review on Amazon here), and I must say it opened up a whole new world for me. Among other things, the book has very good coverage of Jena, a Semantic Web framework for Java, which I had been meaning to look at for a while.

Somewhat unrelated, I also came across Neo4J, a graph database, which seemed like a good fit as a data store for an ontology. The ontologies I had seen until then were stored in a relational database, converted into an in-memory graph, and then serialized to disk with Java serialization for use by applications. This means that the serialized version is a point-in-time snapshot, not a live copy of the ontology. Depending on how frequently the ontology is updated, this may not be a big deal. But if the ontology is stored in a graph database to begin with, the backend can keep updating the database and applications always see the current ontology. That makes things much cleaner, in my opinion.

So I decided to take the OWL files (wine.rdf and food.rdf) for the sample Wine and Food ontology, parse them using Jena, load them into a Neo graph database, and run a few queries against it, to familiarize myself with the Jena and Neo APIs. This post is the result of that effort.

Load Phase

The code for the data loader is shown below. It uses Jena to parse the wine.rdf and food.rdf files and writes their contents into a Neo graph database. Jena parses each file into a Model, whose listStatements() method returns an iterator over Statement objects. Each statement is a (subject, predicate, object) triple, which corresponds to a (start node, relationship, end node) in the graph database. Only statements whose subject and object are both URIs are loaded.
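To make the triple-to-graph mapping concrete, here is a small dependency-free sketch. The `Triple`, `Edge` and `TripleGraphSketch` classes are hypothetical stand-ins, not Jena or Neo4J types. The check that skips literal objects mirrors the `Node_URI` test in `insertIntoDb()` below, simplified so that only the object can be a literal.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, dependency-free model of the (subject, predicate, object) mapping. */
class TripleGraphSketch {

  /** Stand-in for a parsed statement; objectIsUri == false means a literal. */
  static final class Triple {
    final String subject;
    final String predicate;
    final String object;
    final boolean objectIsUri;

    Triple(String subject, String predicate, String object, boolean objectIsUri) {
      this.subject = subject;
      this.predicate = predicate;
      this.object = object;
      this.objectIsUri = objectIsUri;
    }
  }

  /** A directed, typed edge: start node -[type]-> end node. */
  static final class Edge {
    final String start;
    final String type;
    final String end;

    Edge(String start, String type, String end) {
      this.start = start;
      this.type = type;
      this.end = end;
    }

    @Override
    public String toString() {
      return "(" + start + "," + type + "," + end + ")";
    }
  }

  final Set<String> nodes = new LinkedHashSet<>();
  final List<Edge> edges = new ArrayList<>();

  /** Subject becomes the start node, predicate the relationship, object the end node. */
  void insert(Triple t) {
    if (!t.objectIsUri) {
      return; // literals (labels, comments, numbers) do not become nodes
    }
    nodes.add(t.subject); // the Set gives get-or-create semantics for node names
    nodes.add(t.object);
    edges.add(new Edge(t.subject, t.predicate, t.object));
  }
}
```

Inserting a (CorbansPrivateBinSauvignonBlanc, hasSugar, Dry) triple yields two nodes and one edge, while a triple with a literal object is ignored.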

In keeping with the best practices described in the Neo4J Guide (PDF), I also added a pseudo-node representing the start node of the graph (also known as the reference node), and a pseudo-node for each OWL file. The reference node points to the OWL file pseudo-nodes, and each file node points to the nodes from the statements extracted from that file.
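The layering described above (reference node, then one pseudo-node per OWL file, then the entities from that file) can be sketched in plain Java. The class and method names are hypothetical; the two collections play the roles of the CATEGORIZED_AS and CONTAINS pseudo-relationships, and a `Set` stands in for the loader's `isConnected()` check, so an entity mentioned in many statements is linked to its file node only once.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the reference node / file node / entity layering. */
class PseudoNodeSketch {

  final String refNodeName;
  // CATEGORIZED_AS: reference node -> file (ontology) pseudo-nodes
  final Set<String> fileNodes = new LinkedHashSet<>();
  // CONTAINS: file pseudo-node -> entities extracted from that file
  final Map<String, Set<String>> containsEdges = new LinkedHashMap<>();

  PseudoNodeSketch(String refNodeName) {
    this.refNodeName = refNodeName;
  }

  /** Creates a file pseudo-node and hangs it off the reference node. */
  void addFile(String ontologyName) {
    fileNodes.add(ontologyName);
    containsEdges.putIfAbsent(ontologyName, new LinkedHashSet<>());
  }

  /** Links an entity to its file node; linking the same entity twice is a no-op. */
  void addEntity(String ontologyName, String entity) {
    Set<String> entities = containsEdges.get(ontologyName);
    if (entities == null) {
      throw new IllegalArgumentException("unknown file node: " + ontologyName);
    }
    entities.add(entity);
  }

  /** Every entity reachable from the reference node in two hops. */
  Set<String> entitiesUnderRef() {
    Set<String> all = new LinkedHashSet<>();
    for (String file : fileNodes) {
      all.addAll(containsEdges.get(file));
    }
    return all;
  }
}
```

A `Set` is a constant-time membership test, whereas `isConnected()` scans the file node's outgoing CONTAINS relationships, which would get slower as a file node accumulates entities.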

To look up nodes by name, I used Neo's LuceneIndexService to build an index from each entity name to its Node. I also wanted to assign a weight to each relationship, so I added a weight property: 0.0 for the CATEGORIZED_AS and CONTAINS pseudo-relationships, and 1.0 for relationships derived from the OWL statements.
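Here is a dependency-free sketch of the two ideas in the paragraph above: a name index with get-or-create semantics (what `getEntityNode()` does with `LuceneIndexService`), and the relationship weights. The `HashMap` stands in for the Lucene index, and `nameFromUri()` is a simplified, hypothetical version of the loader's URI handling, assuming the entity name is everything after the first `#` and that URIs without one are skipped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the name index and relationship weights. */
class NameIndexSketch {

  static final class Entity {
    final long id;
    final String name;

    Entity(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  private final Map<String, Entity> index = new HashMap<>();
  private long nextId = 0;

  /** Returns the existing entity for this name, or creates and indexes a new one. */
  Entity getOrCreate(String name) {
    return index.computeIfAbsent(name, n -> new Entity(nextId++, n));
  }

  /** Entity name from a URI such as http://example.org/wine#Dry, or null. */
  static String nameFromUri(String uri) {
    int hash = uri.indexOf('#');
    if (hash == -1 || hash == uri.length() - 1) {
      return null; // no '#' (skipped by the loader) or, in this sketch, an empty fragment
    }
    return uri.substring(hash + 1);
  }

  /** Pseudo-relationships get weight 0.0, OWL-derived ones get 1.0. */
  static float weightFor(String relationshipType) {
    return ("CATEGORIZED_AS".equals(relationshipType)
        || "CONTAINS".equals(relationshipType)) ? 0.0F : 1.0F;
  }
}
```

Because the name is the only key, two entities with the same local name in different namespaces would collapse into one node; that seems fine for the wine and food files but is worth keeping in mind for other ontologies. The 0.0 weight is also what lets getAllNeighbors() in the query code skip the pseudo-relationships.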

// Source: src/main/java/net/sf/jtmt/ontology/graph/loaders/Owl2NeoLoader.java
package net.sf.jtmt.ontology.graph.loaders;

import net.sf.jtmt.ontology.graph.OntologyRelationshipType;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.NotFoundException;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.Transaction;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Node_URI;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;

/**
 * Parses an OWL RDF file and populates a graph database directly.
 */
public class Owl2NeoLoader {

  private static final String FIELD_ENTITY_NAME = "name";
  private static final String FIELD_ENTITY_TYPE = "type";
  private static final String FIELD_RELATIONSHIP_NAME = "name";
  private static final String FIELD_RELATIONSHIP_WEIGHT = "weight";
  
  private final Log log = LogFactory.getLog(getClass());
  
  private String filePath;
  private String dbPath;
  private String ontologyName;
  private String refNodeName;
  
  public void setFilePath(String filePath) {
    this.filePath = filePath;
  }
  
  public void setDbPath(String dbPath) {
    this.dbPath = dbPath;
  }
  
  public void setOntologyName(String ontologyName) {
    this.ontologyName = ontologyName;
  }
  
  public void setRefNodeName(String refNodeName) {
    this.refNodeName = refNodeName;
  }
  
  public void load() throws Exception {
    NeoService neoService = null;
    IndexService indexService = null;
    try {
      // set up an embedded instance of neo database
      neoService = new EmbeddedNeo(dbPath);
      // set up index service for looking up node by name
      indexService = new LuceneIndexService(neoService);
      // set up top-level pseudo nodes for navigation
      org.neo4j.api.core.Node refNode = getReferenceNode(neoService);
      org.neo4j.api.core.Node fileNode = getFileNode(neoService, refNode);
      // parse the owl rdf file
      Model model = ModelFactory.createDefaultModel();
      model.read("file://" + filePath);
      // iterate through all triples in the file, and set up corresponding
      // nodes in the neo database.
      StmtIterator it = model.listStatements();
      while (it.hasNext()) {
        Statement st = it.next();
        Triple triple = st.asTriple();
        insertIntoDb(neoService, indexService, fileNode, triple);
      }
    } finally {
      if (indexService != null) {
        indexService.shutdown();
      }
      if (neoService != null) {
        neoService.shutdown();
      }
    }
  }

  /**
   * Get the reference node if already available, otherwise create it.
   * @param neoService the reference to the Neo service.
   * @return a Neo4j Node object reference to the reference node.
   * @throws Exception if thrown.
   */
  private org.neo4j.api.core.Node getReferenceNode(NeoService neoService) 
      throws Exception { 
    org.neo4j.api.core.Node refNode = null;
    Transaction tx = neoService.beginTx(); 
    try {
      refNode = neoService.getReferenceNode();
      if (! refNode.hasProperty(FIELD_ENTITY_NAME)) {
        refNode.setProperty(FIELD_ENTITY_NAME, refNodeName);
        refNode.setProperty(FIELD_ENTITY_TYPE, "Thing");
      }
      tx.success();
    } catch (NotFoundException e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return refNode;
  }

  /**
   * Creates a single node for the file. This method is called once
   * per file, and the node should not exist in the Neo4j database.
   * So there is no need to check for existence of the node. Once
   * the node is created, it is connected to the reference node.
   * @param neoService the reference to the Neo service.
   * @param refNode the reference to the reference node.
   * @return the "file" node representing the entry-point into the
   * entities described by the current OWL file.
   * @throws Exception if thrown.
   */
  private org.neo4j.api.core.Node getFileNode(NeoService neoService,
      org.neo4j.api.core.Node refNode) throws Exception {
    org.neo4j.api.core.Node fileNode = null;
    Transaction tx = neoService.beginTx();
    try {
      fileNode = neoService.createNode();
      fileNode.setProperty(FIELD_ENTITY_NAME, ontologyName);
      fileNode.setProperty(FIELD_ENTITY_TYPE, "Class");
      Relationship rel = refNode.createRelationshipTo(
        fileNode, OntologyRelationshipType.CATEGORIZED_AS);
      logTriple(refNode, 
        OntologyRelationshipType.CATEGORIZED_AS, fileNode);
      rel.setProperty(
        FIELD_RELATIONSHIP_NAME, 
        OntologyRelationshipType.CATEGORIZED_AS.name());
      rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 0.0F);
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return fileNode;
  }

  /**
   * Inserts selected entities and relationships from Triples extracted
   * from the OWL document by the Jena parser. Only entities which have
   * a non-blank node for the subject and object are used. Further, only
   * relationship types listed in OntologyRelationshipTypes enum are 
   * considered. In addition, if the enum specifies that certain 
   * relationship types have an inverse, the inverse relation is also
   * created here.
   * @param neoService a reference to the Neo service.
   * @param indexService a reference to the Index service (for looking
   * up Nodes by name).
   * @param fileNode a reference to the Node that is an entry point into
   * this ontology. This node will connect to both the subject and object 
   * nodes of the selected triples via a CONTAINS relationship. 
   * @param triple a reference to the Triple extracted by the Jena parser.
   * @throws Exception if thrown.
   */
  private void insertIntoDb(NeoService neoService, 
      IndexService indexService,
      org.neo4j.api.core.Node fileNode, 
      Triple triple) throws Exception {
    Node subject = triple.getSubject();
    Node predicate = triple.getPredicate();
    Node object = triple.getObject();
    if ((subject instanceof Node_URI) &&
        (object instanceof Node_URI)) {
      // get or create the subject and object nodes
      org.neo4j.api.core.Node subjectNode = 
        getEntityNode(neoService, indexService, subject);
      org.neo4j.api.core.Node objectNode =
        getEntityNode(neoService, indexService, object);
      if (subjectNode == null || objectNode == null) {
        return;
      }
      Transaction tx = neoService.beginTx();
      try {
        // hook up both nodes to the fileNode
        if (! isConnected(neoService, fileNode, 
            OntologyRelationshipType.CONTAINS, 
            Direction.OUTGOING, subjectNode)) {
          logTriple(fileNode, 
            OntologyRelationshipType.CONTAINS, subjectNode);
          Relationship rel = fileNode.createRelationshipTo(
            subjectNode, OntologyRelationshipType.CONTAINS);
          rel.setProperty(FIELD_RELATIONSHIP_NAME, 
            OntologyRelationshipType.CONTAINS.name());
          rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 0.0F);
        }
        if (! isConnected(neoService, fileNode, 
            OntologyRelationshipType.CONTAINS, 
            Direction.OUTGOING, objectNode)) {
          logTriple(fileNode, 
            OntologyRelationshipType.CONTAINS, objectNode);
          Relationship rel = fileNode.createRelationshipTo(
            objectNode, OntologyRelationshipType.CONTAINS);
          rel.setProperty(
            FIELD_RELATIONSHIP_NAME, 
            OntologyRelationshipType.CONTAINS.name());
          rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 0.0F);
        }
        // hook up subject and object via predicate
        OntologyRelationshipType type = 
          OntologyRelationshipType.fromName(predicate.getLocalName());
        if (type != null) {
          logTriple(subjectNode, type, objectNode);
          Relationship rel = subjectNode.createRelationshipTo(
              objectNode, type);
          rel.setProperty(FIELD_RELATIONSHIP_NAME, type.name());
          rel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 1.0F);
        }
        // create reverse relationship
        OntologyRelationshipType inverseType = 
          OntologyRelationshipType.inverseOf(predicate.getLocalName());
        if (inverseType != null) {
          logTriple(objectNode, inverseType, subjectNode);
          Relationship inverseRel = objectNode.createRelationshipTo(
            subjectNode, inverseType);
          inverseRel.setProperty(
            FIELD_RELATIONSHIP_NAME, inverseType.name());
          inverseRel.setProperty(FIELD_RELATIONSHIP_WEIGHT, 1.0F);
        }
        tx.success();
      } catch (Exception e) {
        tx.failure();
        throw e;
      } finally {
        tx.finish();
      }
    } else {
      return;
    }
  }

  /**
   * Loops through the relationships and returns true if the source
   * and target nodes are connected using the specified relationship
   * type and direction.
   * @param neoService a reference to the NeoService.
   * @param sourceNode the source Node object.
   * @param relationshipType the type of relationship.
   * @param direction the direction of the relationship.
   * @param targetNode the target Node object.
   * @return true or false.
   * @throws Exception if thrown.
   */
  private boolean isConnected(NeoService neoService, 
      org.neo4j.api.core.Node sourceNode,
      OntologyRelationshipType relationshipType, Direction direction,
      org.neo4j.api.core.Node targetNode) throws Exception {
    boolean isConnected = false;
    Transaction tx = neoService.beginTx();
    try {
      for (Relationship rel : sourceNode.getRelationships(
          relationshipType, direction)) {
        org.neo4j.api.core.Node endNode = rel.getEndNode();
        if (endNode.getProperty(FIELD_ENTITY_NAME).equals(
            targetNode.getProperty(FIELD_ENTITY_NAME))) {
          isConnected = true;
          break;
        }
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    return isConnected;
  }

  private org.neo4j.api.core.Node getEntityNode(NeoService neoService,
      IndexService indexService, Node entity) throws Exception {
    String uri = ((Node_URI) entity).getURI();
    if (uri.indexOf('#') == -1) {
      return null;
    }
    String[] parts = StringUtils.split(uri, "#");
    String type = parts[0].substring(0, parts[0].lastIndexOf('/'));
    Transaction tx = neoService.beginTx();
    try {
      org.neo4j.api.core.Node entityNode = 
        indexService.getSingleNode(FIELD_ENTITY_NAME, parts[1]);
      if (entityNode == null) {
        entityNode = neoService.createNode();
        entityNode.setProperty(FIELD_ENTITY_NAME, parts[1]);
        entityNode.setProperty(FIELD_ENTITY_TYPE, type);
        indexService.index(entityNode, FIELD_ENTITY_NAME, parts[1]);
      }
      tx.success();
      return entityNode;
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
  
  /**
   * Convenience method to log the triple when it is inserted into the
   * database.
   * @param sourceNode the subject of the triple.
   * @param ontologyRelationshipType the predicate of the triple.
   * @param targetNode the object of the triple.
   */
  private void logTriple(org.neo4j.api.core.Node sourceNode, 
      OntologyRelationshipType ontologyRelationshipType, 
      org.neo4j.api.core.Node targetNode) {
    log.info("(" + sourceNode.getProperty(FIELD_ENTITY_NAME) +
      "," + ontologyRelationshipType.name() + 
      "," + targetNode.getProperty(FIELD_ENTITY_NAME) + ")");
  }
}

The relationship types are listed in the OntologyRelationshipType enum below. I found them manually, by parsing the Statement objects and listing the unique predicates, so this enum will likely need to be expanded to parse other OWL files.
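The survey step described above (listing the distinct predicates to decide which enum constants to add) might look like this, given predicate URIs pulled from the parsed statements. The class name is hypothetical, and the `localName()` rule (text after the last `#` or `/`) is a simplification assumed for illustration; Jena's own `getLocalName()` splits URIs by its own rules, so results can differ for unusual URIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper: count distinct predicate local names in a batch of statements. */
class PredicateSurvey {

  /** Simplified local name: whatever follows the last '#' or '/'. */
  static String localName(String uri) {
    int cut = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
    return uri.substring(cut + 1);
  }

  /** Sorted map of local name -> number of statements using it. */
  static SortedMap<String, Integer> countPredicates(List<String> predicateUris) {
    SortedMap<String, Integer> counts = new TreeMap<>();
    for (String uri : predicateUris) {
      counts.merge(localName(uri), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList(
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "http://example.org/wine#hasSugar");
    System.out.println(countPredicates(sample)); // {hasSugar=1, type=2}
  }
}
```

Each key in the output is a candidate for an enum constant; it becomes the first constructor argument, which is the name that `fromName()` matches against.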

I also added inverse relationships that are not present in the OWL file. Here is the code for OntologyRelationshipType.java.

// Source: src/main/java/net/sf/jtmt/ontology/graph/OntologyRelationshipType.java
package net.sf.jtmt.ontology.graph;

import org.neo4j.api.core.RelationshipType;

/**
 * Relationships exposed by the taxonomy.
 */
public enum OntologyRelationshipType implements RelationshipType {
  CATEGORIZED_AS(null, null),  // pseudo-rel
  CONTAINS(null, null),        // pseudo-rel
  ADJACENT_REGION("adjacentRegion", "adjacentRegion"),
  HAS_VINTAGE_YEAR("hasVintageYear", "isVintageYearOf"),
  LOCATED_IN("locatedIn", "regionContains"),
  MADE_FROM_GRAPE("madeFromGrape", "mainIngredient"),
  HAS_FLAVOR("hasFlavor", "isFlavorOf"),
  HAS_COLOR("hasColor", "isColorOf"),
  HAS_SUGAR("hasSugar", "isSugarContentOf"),
  HAS_BODY("hasBody", "isBodyOf"),
  HAS_MAKER("hasMaker", "madeBy"),
  IS_INSTANCE_OF("type", "hasInstance"),
  SUBCLASS_OF("subClassOf", "superClassOf"),
  DISJOINT_WITH("disjointWith", "disjointWith"),
  DIFFERENT_FROM("differentFrom", "differentFrom"),
  DOMAIN("domain", null),
  IS_VINTAGE_YEAR_OF("isVintageYearOf", "hasVintageYear"),
  REGION_CONTAINS("regionContains", "locatedIn"),
  MAIN_INGREDIENT("mainIngredient", "madeFromGrape"),
  IS_FLAVOR_OF("isFlavorOf", "hasFlavor"),
  IS_COLOR_OF("isColorOf", "hasColor"),
  IS_SUGAR_CONTENT_OF("isSugarContentOf", "hasSugar"),
  IS_BODY_OF("isBodyOf", "hasBody"),
  MADE_BY("madeBy", "hasMaker"),
  HAS_INSTANCE("hasInstance", "type"),
  SUPERCLASS_OF("superClassOf", "subClassOf");

  private String name;
  private String inverseName;
  
  OntologyRelationshipType(String name, String inverseName) {
    this.name = name;
    this.inverseName = inverseName;
  }
   
  public static OntologyRelationshipType fromName(String name) {
    for (OntologyRelationshipType type : values()) {
      if (name.equals(type.name)) {
        return type;
      }
    }
    return null;
  }
  
  public static OntologyRelationshipType inverseOf(String name) {
    OntologyRelationshipType rel = fromName(name);
    if (rel != null && rel.inverseName != null) {
      return fromName(rel.inverseName);
    } else {
      return null;
    }
  }
}
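To show how `fromName()` and `inverseOf()` resolve names, here is a trimmed, standalone copy of the enum with four constants and without the Neo4J `RelationshipType` interface, so it runs without Neo4J on the classpath (the name `RelTypeSketch` is hypothetical). Note that `inverseOf()` looks the inverse up through `fromName()`, which is why every inverse name, such as `regionContains`, also needs its own constant.

```java
/** Trimmed, standalone copy of OntologyRelationshipType for illustration. */
enum RelTypeSketch {
  LOCATED_IN("locatedIn", "regionContains"),
  REGION_CONTAINS("regionContains", "locatedIn"),
  ADJACENT_REGION("adjacentRegion", "adjacentRegion"), // symmetric: its own inverse
  DOMAIN("domain", null);                              // no inverse

  private final String owlName;
  private final String inverseName;

  RelTypeSketch(String owlName, String inverseName) {
    this.owlName = owlName;
    this.inverseName = inverseName;
  }

  /** Maps an OWL predicate local name to a constant, or null if unknown. */
  static RelTypeSketch fromName(String name) {
    for (RelTypeSketch type : values()) {
      if (name.equals(type.owlName)) {
        return type;
      }
    }
    return null;
  }

  /** The constant for the inverse relationship, or null if there is none. */
  static RelTypeSketch inverseOf(String name) {
    RelTypeSketch rel = fromName(name);
    return (rel != null && rel.inverseName != null) ? fromName(rel.inverseName) : null;
  }
}
```

One consequence worth checking: for a symmetric type like ADJACENT_REGION, the loader creates the reverse edge itself. Since `insertIntoDb()` creates the predicate relationship and its inverse without checking for an existing edge, an OWL file that states the pair in both directions would appear to produce two edges each way.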

The loader operates on a single OWL file at a time. To run it, I use the following JUnit test class.

// Source: src/test/java/net/sf/jtmt/ontology/graph/Owl2NeoLoaderTest.java
package net.sf.jtmt.ontology.graph;

import net.sf.jtmt.ontology.graph.loaders.Owl2NeoLoader;

import org.apache.commons.io.FilenameUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;

/**
 * Test case for Owl2NeoLoader.
 */
public class Owl2NeoLoaderTest {

  private static final String ROOT_NAME = "ConsumableThing";
  
  private final Log log = LogFactory.getLog(getClass());
  
  private static final String[][] SUB_ONTOLOGIES = new String[][] {
    new String[] {"wine.rdf", "Wine"},
    new String[] {"food.rdf", "EdibleThing"}
  };
  
  @Test
  public void testLoading() throws Exception {
    for (String[] subOntology : SUB_ONTOLOGIES) {
      log.info("Now processing " + subOntology[0]);
      Owl2NeoLoader loader = new Owl2NeoLoader();
      loader.setRefNodeName(ROOT_NAME);
      loader.setFilePath(FilenameUtils.concat(
        "/home/sujit/src/jtmt/src/main/resources", subOntology[0]));
      loader.setDbPath("/tmp/neodb");
      loader.setOntologyName(subOntology[1]);
      loader.load();
    }
  }
}

The loader also logs each triple as it writes it. A partial log (with the date, time and source fields removed) is shown below.

...
(CorbansPrivateBinSauvignonBlanc,HAS_SUGAR,Dry)
(Dry,IS_SUGAR_CONTENT_OF,CorbansPrivateBinSauvignonBlanc)
(Wine,CONTAINS,Corbans)
(CorbansPrivateBinSauvignonBlanc,HAS_MAKER,Corbans)
(Corbans,MADE_BY,CorbansPrivateBinSauvignonBlanc)
(Wine,CONTAINS,NewZealandRegion)
...

Query Phase

To test the loading, I used the same queries as in my previous posts, where I ran them with JGraphT against a Prevayler-backed in-memory graph. I decided to build a query class that encapsulates the Neo4J query code. Here is the code for the query component.

// Source: src/main/java/net/sf/jtmt/ontology/graph/NeoOntologyNavigator.java
package net.sf.jtmt.ontology.graph;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.commons.collections15.MultiMap;
import org.apache.commons.collections15.multimap.MultiHashMap;
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.ReturnableEvaluator;
import org.neo4j.api.core.StopEvaluator;
import org.neo4j.api.core.Transaction;
import org.neo4j.api.core.Traverser;
import org.neo4j.api.core.Traverser.Order;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

/**
 * Provides methods to locate nodes and find neighbors in the Neo
 * graph database.
 */
public class NeoOntologyNavigator {

  public static final String FIELD_ENTITY_NAME = "name";
  public static final String FIELD_RELATIONSHIP_NAME = "name";
  public static final String FIELD_RELATIONSHIP_WEIGHT = "weight";

  private class WeightedNode {
    public Node node;
    public Float weight;
    public WeightedNode(Node node, Float weight) {
      this.node = node;
      this.weight = weight;
    }
  };
  
  private String neoDbPath;
  
  private NeoService neoService;
  private IndexService indexService;
  
  /**
   * Ctor for NeoOntologyNavigator
   * @param dbPath the path to the neo database.
   */
  public NeoOntologyNavigator(String dbPath) {
    super();
    this.neoDbPath = dbPath;
  }
  
  /**
   * The init() method should be called by client after instantiation.
   */
  public void init() {
    this.neoService = new EmbeddedNeo(neoDbPath);
    this.indexService = new LuceneIndexService(neoService);
  }
  
  /**
   * The destroy() method should be called by client on shutdown.
   */
  public void destroy() {
    indexService.shutdown();
    neoService.shutdown();
  }
  
  /**
   * Gets the reference to the named Node. Returns null if the node
   * is not found in the database.
   * @param nodeName the name of the node to lookup.
   * @return the reference to the Node, or null if not found.
   * @throws Exception if thrown.
   */
  public Node getByName(String nodeName) throws Exception {
    Transaction tx = neoService.beginTx();
    try {
      Node node = indexService.getSingleNode(FIELD_ENTITY_NAME, nodeName);
      tx.success();
      return node;
    } catch (Exception e) {
      tx.failure();
      throw(e);
    } finally {
      tx.finish();
    }
  }

  /**
   * Return a Map of relationship names to a List of nodes connected
   * by that relationship. The keys are sorted by name, and the list
   * of node values are sorted by the incoming relation weights.
   * @param node the root Node.
   * @return a Map of String to Node List of neighbors.
   */
  public Map<String,List<Node>> getAllNeighbors(Node node)
      throws Exception {
    MultiMap<String,WeightedNode> neighbors = 
      new MultiHashMap<String,WeightedNode>();
    Transaction tx = neoService.beginTx();
    try {
      String nodeName = (String) node.getProperty(FIELD_ENTITY_NAME);
      for (Relationship relationship : node.getRelationships()) {
        String relName = 
          (String) relationship.getProperty(FIELD_RELATIONSHIP_NAME);
        Float relWeight = 
          (Float) relationship.getProperty(FIELD_RELATIONSHIP_WEIGHT);
        if (relWeight == 0.0F) {
          continue;
        }
        Node neighborNode = relationship.getEndNode();
        // if self-loop, ignore
        String neighborNodeName = 
          (String) neighborNode.getProperty(FIELD_ENTITY_NAME);
        if (nodeName.equals(neighborNodeName)) {
          continue;
        }
        neighbors.put(relName, new WeightedNode(neighborNode, relWeight));
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
    // sort each collection of weighted nodes
    for (String relName : neighbors.keySet()) {
      List<WeightedNode> nodes = 
        (List<WeightedNode>) neighbors.get(relName);
      Collections.sort(nodes, new Comparator<WeightedNode>() {
        public int compare(WeightedNode w1, WeightedNode w2) {
          return w2.weight.compareTo(w1.weight);
        }
      });
    }
    // finally sort the keys and upcast WeightedNodes to Nodes
    SortedMap<String,List<Node>> neighborMap = 
      new TreeMap<String,List<Node>>();
    for (String relName : neighbors.keySet()) {
      Collection<WeightedNode> weightedNodes = neighbors.get(relName);
      List<Node> nodes = new ArrayList<Node>();
      for (WeightedNode weightedNode : weightedNodes) {
        nodes.add(weightedNode.node);
      }
      neighborMap.put(relName, nodes);
    }
    return neighborMap;
  }
  
  /**
   * Returns a List of neighbor nodes that is reachable from the specified
   * Node. No ordering is done (since the Traverser framework does not seem
   * to allow this type of traversal, and we want to use the Traverser here).
   * @param node reference to the base node.
   * @param type the relationship type.
   * @return a List of neighbor nodes.
   */
  public List<Node> getNeighborsRelatedBy(Node node,
      OntologyRelationshipType type) throws Exception {
    List<Node> neighbors = new ArrayList<Node>();
    Transaction tx = neoService.beginTx();
    try {
      Traverser traverser = node.traverse(
        Order.BREADTH_FIRST, 
        StopEvaluator.DEPTH_ONE, 
        ReturnableEvaluator.ALL_BUT_START_NODE, 
        type, 
        Direction.OUTGOING);
      for (Iterator<Node> it = traverser.iterator(); it.hasNext();) {
        Node neighbor = it.next();
        neighbors.add(neighbor);
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw(e);
    } finally {
      tx.finish();
    }
    return neighbors;
  }
}
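Every method in the navigator (and in the loader) repeats the same beginTx()/success()/failure()/finish() sequence, which is easy to get subtly wrong, for example by calling success() in a finally block where finish() was meant. One way to factor it out is a small template method. The `SimpleTx` and `TxSource` interfaces below are hypothetical stand-ins so the sketch runs on its own; with Neo4J you would adapt them to `NeoService.beginTx()` and `Transaction`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical template for the begin/success/failure/finish transaction idiom. */
class TxTemplateSketch {

  /** Stand-in for a transaction with the same protocol as the code above. */
  interface SimpleTx {
    void success();
    void failure();
    void finish();
  }

  /** Stand-in for whatever hands out transactions (NeoService in the post). */
  interface TxSource {
    SimpleTx beginTx();
  }

  /** success() on normal return, failure() on exception, finish() always. */
  static <T> T inTx(TxSource source, Supplier<T> work) {
    SimpleTx tx = source.beginTx();
    try {
      T result = work.get();
      tx.success();
      return result;
    } catch (RuntimeException e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }

  /** Records the calls it receives, to show the order the template produces. */
  static final class RecordingTx implements SimpleTx {
    final List<String> calls = new ArrayList<>();
    public void success() { calls.add("success"); }
    public void failure() { calls.add("failure"); }
    public void finish() { calls.add("finish"); }
  }
}
```

With a helper like this, each navigator method shrinks to one inTx(...) call whose body contains only the actual lookup. The sketch only handles unchecked exceptions; code that throws checked exceptions, as the Neo4J methods above declare, would need a different functional interface.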

The query client is represented by the JUnit class shown below. Notice that the query client operates at the level of the application: it contains no Neo4J transaction or traversal code, and the only Neo4J type it sees is Node.

// Source: src/test/java/net/sf/jtmt/ontology/graph/NeoOntologyNavigatorTest.java
package net.sf.jtmt.ontology.graph;

import java.util.List;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.neo4j.api.core.Node;

/**
 * Test case for NeoDb Navigator.
 */
public class NeoOntologyNavigatorTest {
  
  private final Log log = LogFactory.getLog(getClass());
  private static final String NEODB_PATH = "/tmp/neodb";
  private static NeoOntologyNavigator navigator;
  
  @BeforeClass
  public static void setupBeforeClass() throws Exception {
    navigator = new NeoOntologyNavigator(NEODB_PATH);
    navigator.init();
  }
  
  @AfterClass
  public static void teardownAfterClass() throws Exception {
    navigator.destroy();
  }
  
  @Test
  public void testWhereIsLoireRegion() throws Exception {
    log.info("query> where is LoireRegion?");
    Node loireRegionNode = navigator.getByName("LoireRegion");
    if (loireRegionNode != null) {
      List<Node> locations = navigator.getNeighborsRelatedBy(
        loireRegionNode, OntologyRelationshipType.LOCATED_IN);
      for (Node location : locations) {
        log.info(
          location.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }
  
  @Test
  public void testWhatRegionsAreInUsRegion() throws Exception {
    log.info("query> what regions are in USRegion?");
    Node usRegion = navigator.getByName("USRegion");
    if (usRegion != null) {
      List<Node> locations = navigator.getNeighborsRelatedBy(
        usRegion, OntologyRelationshipType.REGION_CONTAINS);
      for (Node location : locations) {
        log.info(
          location.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }
  
  @Test
  public void testWhatAreSweetWines() throws Exception {
    log.info("query> what are Sweet wines?");
    Node sweetNode = navigator.getByName("Sweet");
    if (sweetNode != null) {
      List<Node> sweetWines = navigator.getNeighborsRelatedBy(
        sweetNode, OntologyRelationshipType.IS_SUGAR_CONTENT_OF);
      for (Node sweetWine : sweetWines) {
        log.info(
          sweetWine.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }

  @Test
  public void testShowNeighborsForARieslingWine() throws Exception {
    log.info("query> show neighbors for SchlossVolradTrochenbierenausleseRiesling");
    Node rieslingNode = 
      navigator.getByName("SchlossVolradTrochenbierenausleseRiesling");
    Map<String,List<Node>> neighbors = 
      navigator.getAllNeighbors(rieslingNode);
    for (String relType : neighbors.keySet()) {
      log.info("--- " + relType + " ---");
      List<Node> relatedNodes = neighbors.get(relType);
      for (Node relatedNode : relatedNodes) {
        log.info(
          relatedNode.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME));
      }
    }
  }
}

The output of the queries is shown below. As you can see, the first three match the results of the MySQL/Prevayler/JGraphT version described in my earlier posts. The last one is a dump of the neighbors of a named node, which may be useful if we want to build a browsing tool.

query> where is LoireRegion?
FrenchRegion

query> what regions are in USRegion?
TexasRegion
CaliforniaRegion

query> what are Sweet wines?
WhitehallLanePrimavera
SchlossVolradTrochenbierenausleseRiesling
SchlossRothermelTrochenbierenausleseRiesling

query> show neighbors for SchlossVolradTrochenbierenausleseRiesling?
--- HAS_BODY ---
Full
--- HAS_FLAVOR ---
Moderate
--- HAS_MAKER ---
SchlossVolrad
--- HAS_SUGAR ---
Sweet
--- IS_INSTANCE_OF ---
SweetRiesling
--- LOCATED_IN ---
GermanyRegion
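The last dump groups a node's neighbors by relationship type. The plain-Java sketch below shows that grouping in isolation. It does not use Neo4j and is not the navigator's actual getAllNeighbors() code. Edge and NeighborGrouper are hypothetical names, and the sketch assumes only outgoing relationships are listed, which matches the dump above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java sketch (no Neo4j) of grouping a node's outgoing neighbors by
 * relationship type. Edge and NeighborGrouper are hypothetical names.
 */
public class NeighborGrouper {

  /** A directed, typed edge between two named entities. */
  public static final class Edge {
    final String from;
    final String type;
    final String to;

    public Edge(String from, String type, String to) {
      this.from = from;
      this.type = type;
      this.to = to;
    }
  }

  /**
   * Returns the targets of all edges leaving {@code node}, keyed by edge type.
   * The TreeMap keeps relationship types in alphabetical order.
   */
  public static Map<String, List<String>> allNeighbors(List<Edge> edges, String node) {
    Map<String, List<String>> neighbors = new TreeMap<>();
    for (Edge e : edges) {
      if (e.from.equals(node)) {
        neighbors.computeIfAbsent(e.type, k -> new ArrayList<>()).add(e.to);
      }
    }
    return neighbors;
  }

  public static void main(String[] args) {
    String wine = "SchlossVolradTrochenbierenausleseRiesling";
    List<Edge> edges = Arrays.asList(
        new Edge(wine, "LOCATED_IN", "GermanyRegion"),
        new Edge(wine, "HAS_SUGAR", "Sweet"),
        new Edge(wine, "HAS_BODY", "Full"),
        new Edge("Sweet", "IS_SUGAR_CONTENT_OF", wine)); // incoming to wine, so not listed
    for (Map.Entry<String, List<String>> entry : allNeighbors(edges, wine).entrySet()) {
      System.out.println("--- " + entry.getKey() + " ---");
      for (String name : entry.getValue()) {
        System.out.println(name);
      }
    }
  }
}
```

A sorted map keeps the output stable from run to run. A HashMap would print the relationship types in an arbitrary order.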

I have barely scratched the surface of the Jena API with this, but I think I have exercised quite a bit of the Neo4J API, and I was quite impressed with it. One thing I would have liked is support for weighted relationships in the Traverser mechanism. That way, when a node has multiple relationships, I could sort them by weight.
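One comment below suggests storing a "weight" property on each relationship. Sorting a node's neighbors by that weight could look roughly like the plain-Java sketch below. It is not Neo4j traverser code. WeightedEdge and neighborsByWeight are hypothetical, and the weights in main() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch (no Neo4j): order a node's neighbors by a numeric weight
 * stored on each relationship, heaviest first. Names are hypothetical.
 */
public class WeightedNeighbors {

  /** A directed edge carrying a weight, standing in for a "weight" property. */
  public static final class WeightedEdge {
    final String from;
    final String to;
    final double weight;

    public WeightedEdge(String from, String to, double weight) {
      this.from = from;
      this.to = to;
      this.weight = weight;
    }
  }

  /** Returns the targets of edges leaving {@code node}, highest weight first. */
  public static List<String> neighborsByWeight(List<WeightedEdge> edges, String node) {
    List<WeightedEdge> outgoing = new ArrayList<>();
    for (WeightedEdge e : edges) {
      if (e.from.equals(node)) {
        outgoing.add(e);
      }
    }
    // List.sort is stable, so edges with equal weights keep their input order.
    outgoing.sort((a, b) -> Double.compare(b.weight, a.weight));
    List<String> result = new ArrayList<>();
    for (WeightedEdge e : outgoing) {
      result.add(e.to);
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up weights, purely for illustration.
    List<WeightedEdge> edges = Arrays.asList(
        new WeightedEdge("USRegion", "TexasRegion", 0.3),
        new WeightedEdge("USRegion", "CaliforniaRegion", 0.9));
    // Prints CaliforniaRegion, then TexasRegion.
    for (String name : neighborsByWeight(edges, "USRegion")) {
      System.out.println(name);
    }
  }
}
```

Sorting after collecting the neighbors keeps the traversal itself unchanged. The trade-off is that all neighbors of a node must be loaded before any can be returned.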

My dataset is too small for me to form any opinion about performance and stability, but now that I am familiar with the API, I plan to use Neo4J to hold a (much) larger dataset to see how it compares against our current architecture of RDBMS and serialized graph.

33 comments:

  1. Hi there,
    glad to see you like Neo4j! If you want weighted relationships, maybe attaching a "weight" property to the relevant relationships would be the way to go, and considering that property in your "sorting" traverser?

    Cheers

    /peter

  2. Why annotate methods as being @Tests if they're not tests? Very misleading.

  3. You may also wish to look at AllegroGraph (franz.com). It provides real-time reasoning over your ontology as well as very fast retrieval times. It also eliminates the need for Java, as everything can be accomplished declaratively.

  4. @Peter: First off, thanks to you and others on the neo4j team for giving the world a fine piece of software. I did assign a weight property to my relationship node. However, I couldn't figure out how to build a custom traverser (I was using the 4-arg version with the pluggable components) that would take the relationship weight into account. I will look some more, perhaps get some ideas off your mailing list. Thanks for the comment, good to know that this approach is possible.

    @Anonymous (comment dated 5/31/09): I use the @Test approach to build my "runners" because of its convenience. These /do/ exercise my code, so in a sense, they are tests, although because they don't assert anything, I guess you could argue that they are not "true" tests. Also, this approach is not completely without precedent - when reading the Lucene in Action book, I found the authors had used the same approach that I was using. Sorry if I misled you, and your point about it not being good form is taken.

    @Anonymous (comment dated 6/1/09): Thanks for the pointer, I will look at AllegroGraph. I am guessing (from the name and its maker) that it's Lisp-based. So when you say "declarative", I take it you mean Lisp code, right?

  5. Sujit,
    you are right, the Traverser right now is not configurable as to which relationships to traverse next given the current traversal context.
    This is one of the features for the next version of the traversal framework. It will make your use case possible, as well as random walkers and heuristic traversers (picking different relations to traverse on different runs), so things like http://ripple.fortytwo.net/ can be easily implemented on top of Neo4j directly.
    Feel free to file an issue so we won't forget your use case!

    Cheers

    /peter

  6. Thanks for the info, Peter. I did find a way to use the Traverser mechanism for my neighbor display problem, by using a custom ReturnEvaluator. It's probably not the best way to do it, but it does the job. I've described it here. As for filing an issue, I think I will wait and see what you do with the Traverser in the next version; from your description, it should fulfil my requirements.

  7. It seems to be a very cool idea to graph non-graphical structures.
    I'll check it out.

  8. madhuri gopal, 6/27/2010 10:56 AM

    Dear sir,
    I am a student of Anna University in Guindy and I am pursuing a project on the semantic web. In your code, I am not able to understand why you are using both Jena and Neo4j. Can you not create a graph using Jena and query it using its built-in SPARQL support?

  9. madhuri gopal, 6/27/2010 11:00 AM

    Dear Mr. Peter Neubauer,
    I have downloaded neo4j, included the lib files in Eclipse, and created a graph and printed it. All that is fine! But where is this graph stored? For example, if I open DB2 or Oracle, I can see my tables. Similarly, where can I see the graph instance I created?

  10. Hi Madhuri, I am using Jena's OWL parser here to quickly bring the ontology into a data structure I can then load into neo4j. There are two reasons I did not try to do a full Jena-based solution.

    The first is performance/control - loading it into neo4j as a graph allows me to issue queries (in code) that may not be possible, or may be too slow, with SPARQL.

    The second is size - I believe (and I could be wrong) that Jena exposes an in-memory data structure to SPARQL. Because this is just a proof of concept for something much larger, I needed to have the data on disk.

    I am posting your question for Peter Neubauer here, hopefully he will provide a more definitive answer, but basically neo4j stores the db (and lucene index) under the file path you choose when you create it. Check out the Owl2NeoLoaderTest where it is setting the dbPath. To access it similar to an Oracle or DB2 database client, I believe you will have to write your own code (although I haven't been following neo4j for a while, so they may have a client for this already).

  11. Hi there:
    I have never used neo4j, but I have been using Virtuoso. Is there any way to run a reasoner on data loaded into neo4j?
    Regards
    Kalpana

  12. Hi Kalpana, I don't follow the neo4j list closely anymore, but I seem to remember that there was some work being done to support ontological reasoning (I may be wrong). You may want to ask on the neo4j list.

  13. Thank you Sujit Pal for the link. I did exactly the same and it worked out fine. Even if using a custom ReturnEvaluator isn't the best solution, it still worked flawlessly :)

    kind regards!

  14. Thanks, Geld! I am planning to use Neo4J again for one of my newer projects, looking forward to all the improvements made to Neo4J since I last saw it.

  15. Hi Sujit,

    Wisely, or unwisely, I am porting your code here into a Rexster Extension. ('coz, I like the REST capabilities, and graph independence of the Tinkerpop stack.)

    I'm at the point of saving predicates and beginning to wonder about your reasons for these constraints:

    "Inserts selected entities and relationships from Triples extracted from the OWL document by the Jena parser.

    "Only entities which have a non-blank node for the subject and object are used.

    "Further, only relationship types listed in OntologyRelationshipTypes enum are considered.

    "In addition, if the enum specifies that certain relationship types have an inverse, the inverse relation is also created here."

    Would you mind explaining, please?

    Splendid job (and blog!) by the way. A great leg up for me in what I'm trying to do.

    Thanks,
    Hasan

  16. Thanks Hasan. Don't know much about Rexster/Tinkerpop, so good luck on that fwiw, but here are the answers to your questions:

    > Only entities which have a non-blank node for the subject and object are used.
    This is a relationship with one or both vertices undefined. It's been a while, but I believe I was getting these from JENA, so I needed to filter them out.

    > Further, only relationship types listed in OntologyRelationshipTypes enum are considered.
    As before, I think JENA added some internal relationships which I did not care about, so this only returns the relationships I care about.

    > In addition, if the enum specifies that certain relationship types have an inverse, the inverse relation is also created here.
    Sometimes it makes sense to create the inverse relationship as well. In this case, with vineyards and regions, you may want to know what vineyards are in a region and, conversely, what region a vineyard is in.

  17. Thank you very much for your work.

    I am pretty new to ontologies. Since I am using neo4j on a current project and want to extend it with a music ontology, I am very happy that Google found your work when I searched!

    I am very sure that it will save me quite a lot of work!

  18. Thanks, René, glad the post helped you.

  19. Sujit. Thanks!

    I only saw your reply today, a month late, sorry. (I must not have clicked "Email follow-up".)

    I have morphed your "Using Neo4J to load and query OWL ontologies" into "Using Tinkerpop to load and query OWL ontologies".

    Since there is still a lot of your original code in it, I credit you here:

    https://github.com/martinhbramwell/Monetary-Ontology-Walkabout/tree/master/rexster/extension/example

    You may be interested that I created a crude little RDF_Analyzer that collects and categorizes triples according to Node type. It helped me figure out some of your design decisions.

    I've also published really detailed instructions for creating a virtual private server for continuous integration and test of Rexster Extensions, using Jenkins/Hudson & Fitnesse.

    https://github.com/martinhbramwell/RexExt_ci/wiki

    Regards,
    Hasan

  20. Hi Sandeep, thanks for the kind words, much appreciated. To answer your question, I think your workflow looks very similar to what I did. Protege can save ontologies in RDF format, from which you can use code similar to the one I described to parse out triples using Jena and load them into Neo4J. My implementation used Neo4J in embedded mode, but more recent versions (I checked out version 5 about 1.5 years ago) allow REST/HTTP access, which may be a better way to do this now.

  21. Hi sir,
    I am working on ontology enrichment from unstructured text.
    I am using NLTK and Jena.
    I want to add some new concepts to an existing ontology in their correct place.
    Can you share some source code for this?
    Waiting for your reply.

  22. Hi, sorry about the delay in responding. As you can see, this is a pretty old post, and Neo4J has changed a lot since that time. Based on your post, I believe what you are looking for should be pretty straightforward using the Neo4J API, but I could be wrong. You can post more details, although I think you may have better luck getting a good answer on the Neo4J mailing list.

  23. Hi Sujit,

    Wonderful post. This is one fine post I could find on the internet related to ontologies.

    I need your help:

    I have a very large OWL file and I want to import it into neo4j. I am getting stuck in this process because I have little programming experience. Could you please help me by providing some screenshots of the process?

    Looking forward to hearing from you.

    Thanks,
    -km-

  24. Thanks for your kind words, Kasif. I really don't know what screenshot will help you, but I think these insights may help you build your own parser and Neo4J loader. JENA can be used to parse an OWL file into an Iterable sequence of (Subject, Predicate, Object) triples. Of these, the Subject and Object represent entities in a relationship (or vertices in a graph), and the Predicate represents the relationship binding the entities (or an edge in the graph). So when loading each triple, you can check whether each entity in the triple exists, insert it if it does not, and then join the two entities. For large OWL files, it may be preferable to do it in two passes: a first pass to insert all entities and a second pass to relate them.

  25. Excellent post. Thanks for sharing. But the link to the Food & Wine Ontology seems to be broken.

  26. Thank you and you are welcome. Thanks for pointing out the bad links, looks like schemaweb decided to put these files behind authentication. Here are alternative URLs for food.rdf and wine.rdf files (although caveat emptor - I did not test these apart from downloading and eyeballing).

  27. Thanks for this article. But sir, I would like to know: Jena exposes a SPARQL endpoint through which we can query semantic relationships. Is this possible using Neo4j?

    My requirement is to store the graph on a server and then use an endpoint to query. Is this possible using this approach?

  28. You are welcome Krishnakripa. This post is quite old and Neo4j has progressed significantly since then. Neo4j server now offers a SPARQL endpoint, more usage details here. In fact, you don't need to use the approach I've described in the blog post, you can push data into it using Cypher (Neo4j query language) over JSON/HTTP and query it the same way.

  29. Hi Sujit, I am a researcher working with ontologies and was interested in using them with Neo4j, and fortunately I found your blog post to start with. While working on the code, I found that some libraries have been deprecated by neo4j in recent versions; for example, org.neo4j.api.core.* is no longer available after 2009. The same goes for EmbeddedNeo and NeoService, which have also been deprecated. Instead of these, I used GraphDatabaseService and GraphDatabaseFactory to set up the connection and everything else. But, as in your code at line 102, neoService.getReferenceNode() has also been deprecated, and I am not able to find a way around it. It would be incredibly kind of you if you could help me out with this. Thanks in advance, and truly a great job writing the article :).

  30. Thanks for the kind words, Manpreet, I'm glad my post was helpful. As you have pointed out, the post is old and much of the Neo4j API I used here is deprecated. I believe people still use Neo4j in embedded mode, but it's considered more of an "expert level" thing (i.e., you are expected to keep up with API changes). I would suggest using Neo4j in server mode with the Cypher query language instead - I have used it (in early 2014) to load and query data, which may be helpful as guides. Loading was done using Michael Hunger's batch import tool, so I didn't write any Cypher code for loading, but you can read the code for the tool on GitHub if you would prefer to do it yourself.

  31. I am attempting to create a full Neo4j implementation of the Jena API on GitHub. If anyone is interested, contact me.

  32. Hi Sujit, great article. Just what I needed right now.
    Also, would you be kind enough to post the JAR files as well? The new JAR doesn't contain these classes, I guess.

    Thank you.

  33. Thank you, glad it helped. Code for this blog is available in my JTMT project on Sourceforge, including all the JARs I needed (I guess this was during my Ant to Maven transition phase).

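Several comments above discuss how the loader treats triples. The rules are to skip triples with a blank subject or object, keep only relationship types listed in the enum, add an inverse where one is defined, and, for large files, load entities and relationships in two separate passes. The plain-Java sketch below combines these rules. It is not the loader from the post. RelType, Triple and TwoPassLoader are hypothetical. The sketch also assumes that predicates have already been mapped to enum names, that blank nodes use the N-Triples "_:" prefix, and that relationships are recorded as strings instead of being written to Neo4j.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Plain-Java sketch (no Jena, no Neo4j) of the loading rules discussed in the
 * comments. RelType, Triple and TwoPassLoader are hypothetical; predicates are
 * assumed to be already mapped to enum names, and relationships are recorded
 * as strings instead of being written to a graph database.
 */
public class TwoPassLoader {

  /** Hypothetical relationship types; inverseName is null if there is none. */
  enum RelType {
    HAS_SUGAR("IS_SUGAR_CONTENT_OF"),
    IS_SUGAR_CONTENT_OF("HAS_SUGAR"),
    LOCATED_IN(null);

    final String inverseName;

    RelType(String inverseName) {
      this.inverseName = inverseName;
    }

    /** Returns the type with this name, or null for predicates we ignore. */
    static RelType fromName(String name) {
      for (RelType t : values()) {
        if (t.name().equals(name)) {
          return t;
        }
      }
      return null;
    }
  }

  /** A (subject, predicate, object) triple; blank nodes start with "_:". */
  public static final class Triple {
    final String subject;
    final String predicate;
    final String object;

    public Triple(String subject, String predicate, String object) {
      this.subject = subject;
      this.predicate = predicate;
      this.object = object;
    }
  }

  public final Set<String> entities = new LinkedHashSet<>();
  public final Set<String> relationships = new LinkedHashSet<>();

  static boolean isBlank(String node) {
    return node == null || node.startsWith("_:");
  }

  /** A triple is kept only if both ends are named and the predicate is known. */
  static boolean usable(Triple t) {
    return !isBlank(t.subject) && !isBlank(t.object)
        && RelType.fromName(t.predicate) != null;
  }

  public void load(List<Triple> triples) {
    // Pass 1: create every entity that takes part in a usable triple.
    for (Triple t : triples) {
      if (usable(t)) {
        entities.add(t.subject);
        entities.add(t.object);
      }
    }
    // Pass 2: both endpoints now exist, so no existence checks are needed.
    // The Set silently drops a relationship that was already added as an inverse.
    for (Triple t : triples) {
      if (!usable(t)) {
        continue;
      }
      RelType type = RelType.fromName(t.predicate);
      relationships.add(t.subject + " -" + type.name() + "-> " + t.object);
      if (type.inverseName != null) {
        relationships.add(t.object + " -" + type.inverseName + "-> " + t.subject);
      }
    }
  }

  public static void main(String[] args) {
    String wine = "SchlossVolradTrochenbierenausleseRiesling";
    TwoPassLoader loader = new TwoPassLoader();
    loader.load(Arrays.asList(
        new Triple(wine, "HAS_SUGAR", "Sweet"),
        new Triple(wine, "LOCATED_IN", "GermanyRegion"),
        new Triple("_:b0", "LOCATED_IN", "GermanyRegion"), // blank subject: skipped
        new Triple(wine, "UNLISTED_PREDICATE", "Thing"))); // unknown type: skipped
    loader.entities.forEach(System.out::println);
    loader.relationships.forEach(System.out::println);
  }
}
```

The second pass only ever links entities that the first pass has already created, so it never needs to look anything up. With a real database, that is the part that tends to get expensive on large files.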
