Saturday, January 19, 2008

Externalizing Database Access with a ResultSet Iterator

I recently needed to structure some code such that a query against a database table and a sequential read from a (structured) flat file provided the same interface to client code, something like this:

1
2
3
4
public interface ISource {
  ...
  public SomeObject getNextDocument();
}

This API is a very natural fit for the flat file, since the implementation can hold a reference to the current position in the file advance it with each invocation of getNextDocument(). However, when the ISource implementation happens to be a SQL query running against a database table, things get a little hairy.

The classic JDBC data access pattern requires us to do all our processing within the ResultSet processing loop, and/or accumulate a Collection of objects for later use, something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
List<SomeObject> myObjects = new ArrayList<SomeObject>();
Connection conn = dataSource.getConnection();
PreparedStatement ps = conn.prepareStatement("select * from mytable");
ResultSet rs = ps.executeQuery();
while (rs.next()) {
  SomeObject myObject = doSomethingWithResultSet(rs);
  myObjects.add(myObject);
}
rs.close();
ps.close();

The basic usage pattern for the Spring JdbcTemplate is the queryForList() method, which simply allows accumulation of results for later use. The result of this call is a List of Maps, where each Map represents a row of database results.

1
2
3
4
5
6
7
8
List<SomeObject> myObjects = new ArrayList<SomeObject>();
List<Map<String,Object>> rows = jdbcTemplate.queryForList("select * from mytable");
for (Map<String,Object> row : rows) {
  SomeObject myObject = new SomeObject();
  myObject.setMyField(row.get("my_field"));
  ...
  myObjects.add(myObject);
}

JdbcTemplate also allows processing the individual rows inline, using a RowMapper anonymous inner class, something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
List<SomeObject> myObjects = jdbcTemplate.query(
  "select * from mytable",
  new RowMapper() {
    public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
      SomeObject myObject = new SomeObject();
      myObject.setMyField(rs.getString("my_field"));
      ...
      return myObject;
    }
  });

None of these solutions were what I was after, however. What I needed was a way to tell a method to return me the "next row from the database" and hold a reference internally until I invoke the method again. Iterators came to mind almost instantly, but my first attempt was this rather pathetic one, of holding a private reference to the Iterator of the List object generated from a call to jdbcTemplate.queryForList().

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
public class MyFirstPatheticAttempt implements ISource {
  private String sql; // injected from configuration
  private Iterator it;
  ...
  public open() {
    List<Map<String,Object>> rows = jdbcTemplate.queryForList(sql);
    it = rows.iterator();
  }

  public SomeObject getNextObject() {
    if (it.hasNext()) {
      Map<String,Object> row = it.next();
      SomeObject myObject = new SomeObject();
      myObject.setMyField(row.get("my_field"));
      ...
      return myObject;
    }
    return null;
  }
  ...
}

Sure, it satisfied the ISource interface, but it felt horribly wrong. Looking up the Iterator Pattern on wikipedia led me to the Generator Pattern, which was basically what I was after. Turns out that Java does not have the "yield return" construct that is required to support this pattern, but C# and Python does, so perhaps its just a matter of time before it pops up here. In any case, it wasn't available now, so that did not solve my problem, so I kept looking.

I finally came across Ryan Brase's article "Deciding between iterators and lists for returned values in Java" on TechRepublic which had an example of what I needed to do (Listing 2 in the article). Unlike his example, however, my code would not know the SQL query it would be executing, so it would need to return the entire record in some way, without having to explicitly list them in the code.

Fortunately, the folks at the Apache commons-dbutils project have built a ResultSetIterator which is an Iterator implementation whose constructor takes a reference to a ResultSet, and which I gratefully re-used.

So here is what I finally came up with. Although it uses plain old JDBC, it is fairly clean and compact, and does the job of allowing the database access to be externalized into a method call.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
public class MyFinalAttempt implements ISource {
  private String sql; // injected from configuration
  private Iterator it;
  ...
  public open() {
    Connection conn = ds.getConnection();
    PreparedStatement ps = conn.prepareStatement(sql);
    ResultSet rs = ps.executeQuery();
    it = new ResultSetIterator(rs);
  }

  public SomeObject getNextObject() {
    if (it.hasNext()) {
      Object[] row = it.next();
      SomeObject myObject = new SomeObject();
      myObject.setMyField(row[0]);
      ...
      return myObject;
    }
    return null;
  }
  ...
}

The only thing I did not quite like was that the ResultSetIterator.next() call returned an Object[] row. I would have liked it to return a Map<String,Object> instead, although that shortcoming can be worked around fairly easily in application code. The other thing I wanted to do was to use Spring JdbcTemplate, since that way a lot of resource management issues are taken care of automatically, but I could not figure out how to get a handle on the ResultSet.

If anyone has figured out how to get this pattern going using Spring, I would really like to know. Also, not sure how common such a use-case is, but if you have faced it and solved it, or you think my approach is lame and you have a better one, I would appreciate hearing from you.

Update Jan 26 2008

When I was trying to figure out various approaches to solve this problem, I came across a Java based "yield return" solution on Chaotic Java. It provides an abstract Yielder class which implements the Iterable interface. However, it requires bytecode instrumentation using ASM. The resulting code is quite elegant, something like this, although I could not make it work.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
  public Iterable<SomeObject> getSomeObjects() {
    Connection conn = ds.getConnection();
    PreparedStatement ps = conn.prepareStatement(sql);
    final ResultSet rs = ps.executeQuery();
    return new Yielder<SomeObject>() {
      protected void yieldNextCore() {
        while (rs.next()) {
          SomeObject obj = new SomeObject();
          obj.setName(rs.getString(1));
          ...
          yieldReturn(obj);
        }
        yieldBreak();
      }
    }
  }

  @Test
  public void processObjects() {
    for (SomeObject obj : getSomeObjects()) {
      // do something with obj
    }
  }

To run it, a -javaagent parameter set to the path to the yielder.jar file needs to be passed to the Java command. I was setting this in the configuration/property for the maven-surefire-plugin and calling this through a JUnit test. I was using asm-all-3.3.0 (downloaded from the yielder SVN repository). From my tests, I see that the first call to getSomeObject() results in all the rows being iterated. Obviously I am doing something wrong, would appreciate any pointers from readers if you have gotten this to work.

2 comments (moderated to prevent spam):

Anonymous said...

The Yielder library has some known bugs, unfortunately.. It might not be You that is doing something wrong. :)

If you'd like to discuss the bug, you can either post about it on the google group, the project's issues list, or just my post about it. Thanks!

Sujit Pal said...

Thanks Aviad, I have posted to the Yielder Google Group with details. Thanks for building and sharing this, I am hoping that this functionality will make it in to Java at some point, similar to Doug Lea's concurrency stuff.