Wednesday, June 03, 2009

A Custom Traverser for Neo4J

In my previous post, I used Jena to parse some sample OWL files, store the triples in Neo4J's graph database, and then query the database with the Neo4J API. Querying involves locating a given node in the graph, then navigating along one or more known relationship types. Neo4J has an elegant Traverser (see Javadocs) mechanism that allows you to specify traversal properties as function objects in the Node.traverse() call, and I was able to use this for my NeoOntologyNavigator.getNeighborsRelatedBy() method.

I also wanted to build something along the lines of a graph browser. This involves locating a node, and listing out all its immediate neighbors sorted (descending) by relationship weights. This could be used to power a web-based ontology browser, where each of the neighbor nodes would be hyperlinked, so clicking on one of these would show you to the neighbors of that node.

I wasn't able to figure out how to use the Traverser API to do this the first time around, so I went with the manual approach (see NeoOntologyNavigator.getAllNeighbors() in the previous post. However, as Peter Neubauer initially hinted at, it is possible to use the Traverser to do what I want, albeit in a slightly convoluted way, which is described below.

One caveat - Peter later responded to my reply to his comment, saying that direct usage of the Traverser mechanism to do what I want is indeed not possible in the current version (1.0-b8), but such a mechanism is planned in a future release of Neo4J. So you probably don't want to read too much into this post, the code below is perhaps at best a workaround for the current Neo4J version.

The "custom" part of my Traverser is really a custom ReturnableEvaluator. Each node that is traversed by Node.traverse() is passed to the ReturnableEvaluator to determine if the node should be included in the List of traversed Node returned by the Traverser's Iterator. So our custom ReturnableEvaluator checks to see if this node is "valid" for inclusion in the browse results (ie, no nodes navigable by weightless relationships and no self loops), and if valid, accumulates the Node into an internal data structure and returns true. Once the traversal is complete, the internal data structure is queried to yield a Map of List of Nodes, keyed by relationship name. Here is the code for the custom ReturnableEvaluator.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
// Source: src/main/java/net/sf/jtmt/ontology/graph/BrowserReturnableEvaluator.java
package net.sf.jtmt.ontology.graph;

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.ReturnableEvaluator;
import org.neo4j.api.core.TraversalPosition;

/**
 * Returnable Evaluator implementation that stores traversed nodes
 * in a data structure which is available to the client.
 */
public class BrowserReturnableEvaluator implements ReturnableEvaluator {

  private Node startNode;
  private TreeMap<String,ArrayList<WeightedNode>> neighbors;
  
  private class WeightedNode implements Comparable<WeightedNode> {
    public Node node;
    public Float weight;
    
    public WeightedNode(Node node, Float weight) {
      this.node = node;
      this.weight = weight;
    }
    
    public int compareTo(WeightedNode that) {
      return (that.weight.compareTo(this.weight));
    }
  };

  public BrowserReturnableEvaluator(Node startNode) {
    this.startNode = startNode;
    this.neighbors = 
      new TreeMap<String,ArrayList<WeightedNode>>();
  }
  
  public boolean isReturnableNode(TraversalPosition pos) {
    // if related to self, don't include in traversal results
    Node currentNode = pos.currentNode();
    if (startNode.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME).equals(
      currentNode.getProperty(NeoOntologyNavigator.FIELD_ENTITY_NAME))) {
      return false;
    }
    // if relationship weight is 0.0F, don't include in traversal results
    Relationship lastRel = pos.lastRelationshipTraversed();
    Float relWeight = (Float) lastRel.getProperty(
      NeoOntologyNavigator.FIELD_RELATIONSHIP_WEIGHT);
    if (relWeight <= 0.0F) {
      return false;
    }
    String relName = (String) lastRel.getProperty(
      NeoOntologyNavigator.FIELD_RELATIONSHIP_NAME);
    // accumulate into our neighbor data structure
    ArrayList<WeightedNode> nodes;
    if (neighbors.containsKey(relName)) {
      nodes = neighbors.get(relName);
    } else {
      nodes = new ArrayList<WeightedNode>();
    }
    nodes.add(new WeightedNode(currentNode, relWeight));
    neighbors.put(relName, nodes);
    // include in traversal results
    return true;
  }

  public Map<String,List<Node>> getNeighbors() {
    Map<String,List<Node>> neighborsMap = 
      new LinkedHashMap<String,List<Node>>();
    for (String relName : neighbors.keySet()) {
      List<WeightedNode> weightedNodes = neighbors.get(relName);
      Collections.sort(weightedNodes);
      List<Node> relatedNodes = new ArrayList<Node>();
      for (WeightedNode weightedNode : weightedNodes) {
        relatedNodes.add(weightedNode.node);
      }
      neighborsMap.put(relName, relatedNodes);
    }
    return neighborsMap;
  }
}

The new version of NeoOntologyNavigator.getAllNeighbors() is shown below. It uses the new custom ReturnableEvaluator to do the traversal. Since we want all relationship types to be traversed, we pass them all to the Node.traverse() method call.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
// Source: src/main/java/net/sf/jtmt/ontology/graph/NeoOntologyNavigator.java
...
public class NeoOntologyNavigator {
  ...
  /**
   * Return a Map of relationship names to a List of nodes connected
   * by that relationship. The keys are sorted by name, and the list
   * of node values are sorted by the incoming relation weights.
   * @param node the root Node.
   * @return a Map of String to Node List of neighbors.
   */
  public Map<String,List<Node>> getAllNeighbors(Node node)
      throws Exception {
    BrowserReturnableEvaluator browserReturnableEvaluator = 
      new BrowserReturnableEvaluator(node);
    Transaction tx = neoService.beginTx();
    try {
      // set up the data structure for all outgoing relationships
      OntologyRelationshipType[] relTypes = 
        OntologyRelationshipType.values();
      Object[] typeAndDirection = new Object[relTypes.length * 2];
      for (int i = 0; i < typeAndDirection.length; i++) {
        if (i % 2 == 0) {
          // relationship type slot
          typeAndDirection[i] = relTypes[i / 2];
        } else {
          // direction slot
          typeAndDirection[i] = Direction.OUTGOING;
        }
      }
      Traverser traverser = node.traverse(Order.BREADTH_FIRST, 
        StopEvaluator.DEPTH_ONE, 
        browserReturnableEvaluator, 
        typeAndDirection);
      for (Iterator<Node> it = traverser.iterator(); it.hasNext();) {
        // just eat up the nodes returned by the traverser, we are
        // really interested in the data structure.
        it.next();
      }
      // get at the accumulated data structure and return it
      tx.success();
      return browserReturnableEvaluator.getNeighbors();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
  ...
}

And this yields the following identical output for our getAllNeighbors() test case described in the previous post, like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
query> show neighbors for SchlossVolradTrochenbierenausleseRiesling
--- HAS_BODY ---
Full
--- HAS_FLAVOR ---
Moderate
--- HAS_MAKER ---
SchlossVolrad
--- HAS_SUGAR ---
Sweet
--- IS_INSTANCE_OF ---
SweetRiesling
--- LOCATED_IN ---
GermanyRegion

The data in the graph db does not exercise the custom traversal code fully, so I decided to run it against a test graph of one node with weighted relationships to a bunch of other nodes. The test case is inspired by Burger King's attempt to pair soft drinks with their burgers, which I also noticed on a recent trip there with the kids.

Here is the test case that builds up the database and traverses it with the custom ReturnableEvaluator.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
// Source: src/test/java/net/sf/jtmt/ontology/graph/BrowserReturnableEvaluatorTest.java
package net.sf.jtmt.ontology.graph;

import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.RelationshipType;
import org.neo4j.api.core.StopEvaluator;
import org.neo4j.api.core.Transaction;
import org.neo4j.api.core.Traverser;
import org.neo4j.api.core.Traverser.Order;

/**
 * Test to demonstrate sorting by relationship weights.
 */
public class BrowserReturnableEvaluatorTest {

  private static final Object[][] QUADS = new Object[][] {
    new Object[] {"coke", RelTypes.GOES_WITH, 10.0F, "whopper"},
    new Object[] {"coke", RelTypes.GOES_WITH, 10.0F, "doubleWhopper"},
    new Object[] {"coke", RelTypes.GOES_WITH, 5.0F, "tripleWhopper"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 10.0F, "water"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 9.0F, "sugar"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 2.0F, "carbonDioxide"},
    new Object[] {"coke", RelTypes.HAS_INGREDIENTS, 5.0F, "secretRecipe"}
  };
  
  private enum RelTypes implements RelationshipType {
    GOES_WITH,
    HAS_INGREDIENTS
  };
  
  private static NeoService neoService;
  private static Node coke;
  
  @BeforeClass
  public static void setupBeforeClass() throws Exception {
    // load up the test data
    neoService = new EmbeddedNeo("/tmp/neotest");
    Transaction tx = neoService.beginTx();
    try {
      // drink nodes
      coke = neoService.createNode();
      coke.setProperty("name", "coke");
      for (Object[] quad : QUADS) {
        Node objectNode = neoService.createNode();
        objectNode.setProperty("name", (String) quad[3]);
        Relationship rel = 
          coke.createRelationshipTo(objectNode, (RelationshipType) quad[1]);
        rel.setProperty("name", ((RelationshipType) quad[1]).name());
        rel.setProperty("weight", (Float) quad[2]);
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
  
  @AfterClass
  public static void teardownAfterClass() throws Exception {
    if (neoService != null) {
      neoService.shutdown();
    }
  }
  
  @Test
  public void testCustomEvaluator() throws Exception {
    Transaction tx = neoService.beginTx();
    try {
      BrowserReturnableEvaluator customReturnEvaluator = 
        new BrowserReturnableEvaluator(coke);
      Traverser traverser = coke.traverse(
        Order.BREADTH_FIRST, 
        StopEvaluator.DEPTH_ONE, 
        customReturnEvaluator, 
        RelTypes.GOES_WITH, Direction.OUTGOING, 
        RelTypes.HAS_INGREDIENTS, Direction.OUTGOING);
      for (Iterator<Node> it = traverser.iterator(); it.hasNext();) {
        it.next();
      }
      Map<String,List<Node>> neighbors =
        customReturnEvaluator.getNeighbors();
      for (String relName : neighbors.keySet()) {
        System.out.println("-- " + relName + " --");
        List<Node> relatedNodes = neighbors.get(relName);
        for (Node relatedNode : relatedNodes) {
          System.out.println(relatedNode.getProperty("name"));
        }
      }
      tx.success();
    } catch (Exception e) {
      tx.failure();
      throw e;
    } finally {
      tx.finish();
    }
  }
}

And the output. I was checking specifically for (1) whether all related nodes are shown, (2) whether the output is sorted by relationship name, and (3) whether the related nodes are ordered correctly by relationship weight. As you can see, it does.

1
2
3
4
5
6
7
8
9
-- GOES_WITH --
whopper
doubleWhopper
tripleWhopper
-- HAS_INGREDIENTS --
water
sugar
secretRecipe
carbonDioxide

It took me a while to figure out how to use the Traverser mechanism to solve the problem described, so hopefully I've saved you some time if you have a similar problem. With the upcoming enhancements to the Traverser API as described by Peter in the comments on the previous post, this approach may not be needed in the future. Also, the approach is probably not ideal, since the idea behind a Traverser is to traverse rather than accumulate. But it may not be a problem if your graph is not too dense, and you want to solve a similar problem with the current version of Neo4J.

Of course, I am by no means an expert on Neo4J, so if you have ideas on achieving the same result in a simpler way, would love to hear from you.

2 comments:

  1. Hi there,
    great follow up! Could we reference this from the Neo4j Wiki? It's a great example of a custom traverser, and if you post it on the Neo4j user mailing list, you might get some feedback on an even tighter traverser ... how is your impression on the API and usability of Neo4j? Any suggestions of how to make it better?

    Cheers

    /peter

    ReplyDelete
  2. Thanks, Peter, and yes, feel free to reference it from the Neo4J wiki. I will post the code on the mailing list like you suggested. As for impressions, I like what I have used so far (core and index-utils) - small, clean and easy to understand. I still want to check out how it scales to large volumes (~10M nodes, with ~20 relationships per node on average), speed and robustness wise. I don't have any suggestions at the moment, but I hope to have some as I explore the API more.

    ReplyDelete

Comments are moderated to prevent spam.