Saturday, July 14, 2007

Annotation Driven Object Persistence with BerkeleyDB

Recently I added functionality to an application that increased its memory footprint considerably. The original application kept its data in in-memory data structures for performance, and the new code had to interoperate with those structures, so I did the same. For a while I was getting the dreaded OutOfMemoryError (OOME), but it went away after I replaced a MultiMap-like structure (really a HashMap<String,List<String>>) with a plain Java HashMap.

However, that one afternoon of tracking down the OOME set me thinking seriously about whether it would be better to use something like BerkeleyDB as my data store. It is not as fast as in-memory data structures, but it is generally a lot faster than disk-based SQL databases such as MySQL or Oracle. Moreover, it attempts to keep as much of the data in memory as possible, going to its disk files only when it cannot. In the past, I had run performance tests on some in-memory databases, and HSQLDB actually came out on top, but at that time I was using BerkeleyDB version 2.1.30 (from Sleepycat, before it was acquired by Oracle, I think). This time I decided to use version 3.1.0, the latest available from Oracle's website.

To get up to speed with BerkeleyDB, I decided to create a DAO that persists a data structure representing a user's preferences. The object is keyed by the userId for registered, logged-in users, and by a temporary id built from the user's IP address and user-agent string for everyone else.
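A temporary id of this kind could be derived by hashing the IP address and the user-agent string. The sketch below is an assumption, not code from my application: the TempUserIds class, the choice of SHA-1, and the truncation to 16 hex characters are all made up for illustration. The only detail it shares with the DAO is that temporary ids start with "t-".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: builds a temporary user id from request attributes. */
final class TempUserIds {

  private TempUserIds() {}

  /** Returns "t-" plus the first 16 hex characters of SHA-1(ip + '|' + userAgent). */
  static String build(String ipAddress, String userAgent) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-1");
      byte[] hash = digest.digest(
          (ipAddress + "|" + userAgent).getBytes(StandardCharsets.UTF_8));
      StringBuilder id = new StringBuilder("t-");
      for (int i = 0; i < 8; i++) {
        // mask to 0..255 so each byte becomes exactly two hex digits
        id.append(String.format("%02x", hash[i] & 0xff));
      }
      return id.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-1 is one of the algorithms every Java platform must provide
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build("192.0.2.10", "Mozilla/5.0"));
  }
}
```

The same visitor always gets the same id, which is what lets the preferences be found again on the next request. The obvious weakness is that two visitors behind the same proxy with the same browser would share an id.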

One of the advantages touted for BerkeleyDB is the absence of an SQL parsing layer. This is one reason it can be faster than SQL databases, but it also means you have to write more code. One of the things I did not like about BerkeleyDB in the past was that if you were persisting anything more complicated than a String, you had to write the serialization and deserialization code to convert the object to and from a byte stream. However, BerkeleyDB-JE 3.1 has a new Direct Persistence Layer (DPL) which generates these for you dynamically. The programmer just has to annotate the class to be persisted and the DPL takes care of the rest. I used the DPL for this user preference DAO example.
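To give a sense of the hand-written marshalling the DPL saves you from, here is a sketch that converts a preferences map to and from a byte array with plain java.io streams. It does not use the BerkeleyDB API at all; the PrefsCodec class and its wire format (entry count followed by key/value pairs) are assumptions made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hand-written codec for a preferences map. */
final class PrefsCodec {

  private PrefsCodec() {}

  /** Writes the entry count, then each key and value as a UTF string. */
  static byte[] toBytes(Map<String,String> prefs) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buffer);
      out.writeInt(prefs.size());
      for (Map.Entry<String,String> entry : prefs.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeUTF(entry.getValue());
      }
      out.flush();
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads back a map written by toBytes(). */
  static Map<String,String> fromBytes(byte[] bytes) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      int size = in.readInt();
      Map<String,String> prefs = new TreeMap<String,String>();
      for (int i = 0; i < size; i++) {
        String key = in.readUTF();
        prefs.put(key, in.readUTF());
      }
      return prefs;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Map<String,String> prefs = new TreeMap<String,String>();
    prefs.put("language.dialect", "en-US");
    System.out.println(fromBytes(toBytes(prefs)));
  }
}
```

Every new field means touching both methods and keeping them in step, which is exactly the chore that annotations remove.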

For our application, we first define the UserPrefsEntity bean. We need to annotate the class itself as an @Entity, the userId as a @PrimaryKey, and the updated timestamp as a @SecondaryKey. In addition, the DPL needs a no-args constructor, which can be private; the public constructor that takes the primary key is there for application code. Getters and setters for the fields are optional, but you will probably need them in the DAO, so I would just put them in and remove them later if they are not used. Here is the code for the bean.

import java.util.Map;
import java.util.TreeMap;

import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.PrimaryKey;
import com.sleepycat.persist.model.Relationship;
import com.sleepycat.persist.model.SecondaryKey;

/**
 * Entity representing a User session object.
 */
@Entity
public class UserPrefsEntity {

  @PrimaryKey private String userId;
  
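  // MANY_TO_ONE: several users may share the same update timestamp;
  // ONE_TO_ONE would make updatedMillis a unique key across all entities
  @SecondaryKey(relate=Relationship.MANY_TO_ONE) private long updatedMillis;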
  
  private Map<String,String> prefs = new TreeMap<String,String>();
  
  public UserPrefsEntity(String userId) {
    this.userId = userId;
  }
  
  private UserPrefsEntity() {
    super();
  }
  
  public String getUserId() {
    return userId;
  }
  
  public void setUpdatedMillis(long updatedMillis) {
    this.updatedMillis = updatedMillis;
  }
  
  public long getUpdatedMillis() {
    return updatedMillis;
  }
  
  public Map<String,String> getPrefs() {
    return prefs;
  }
  
  public void setPrefs(Map<String,String> prefs) {
    this.prefs.clear();
    this.prefs.putAll(prefs);
  }
}
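Why a private no-args constructor is enough, and how annotations are picked up at runtime, becomes clearer with a little reflection. The sketch below is not the DPL's implementation. It defines its own hypothetical @Key annotation and Pref class purely to show how a framework can find annotated fields and instantiate a class through a constructor that application code cannot call.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

final class AnnotationDemo {

  /** Hypothetical marker annotation, retained at runtime so reflection can see it. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Key {}

  /** Hypothetical persistent class with a private no-args constructor. */
  static class Pref {
    @Key private String userId;
    private long updatedMillis;

    private Pref() {}
  }

  /** Returns the name of the first field annotated with @Key, or null if none. */
  static String findKeyField(Class<?> type) {
    for (Field field : type.getDeclaredFields()) {
      if (field.isAnnotationPresent(Key.class)) {
        return field.getName();
      }
    }
    return null;
  }

  /** Creates an instance through the (possibly private) no-args constructor. */
  static <T> T newInstance(Class<T> type) {
    try {
      Constructor<T> constructor = type.getDeclaredConstructor();
      constructor.setAccessible(true);
      return constructor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(findKeyField(Pref.class));
    System.out.println(newInstance(Pref.class) != null);
  }
}
```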

The DAO provides methods to operate on the bean. BerkeleyDB lets you reference the data through PrimaryIndex and SecondaryIndex accessors. These accessors, along with the Environment and EntityStore objects, are all initialized in the init() method. The store and environment are closed in the corresponding destroy() method. Since I use Spring, I will make sure that the DAO's bean definition has init-method and destroy-method attributes set to "init" and "destroy" respectively. Non-Spring code, such as my JUnit test shown below, must take care to call init() before all other calls to the DAO, and destroy() after.

The DAO provides methods to retrieve all or part (by preference key prefix) of a user's preferences using the load() method. Preferences can be saved using save(). If we have been collecting preferences for a user while he is still not registered or logged in, once he is, we need to copy all our collected preferences to his new userId using the migrate() method. Finally, there is an expire() method that can be called by a scheduled job to clean out preferences for temporary users after a certain time.
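Because the preferences are kept in a sorted map, retrieval by key prefix can be expressed as a range view. Here is a minimal sketch using only java.util; the PrefixLookup class and the sample keys are assumptions for illustration. The range runs from the prefix itself up to the prefix followed by Character.MAX_VALUE, so keys that merely sort after the prefix (such as a.c for the prefix a.b.c) are left out.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper showing a prefix lookup on a sorted map. */
final class PrefixLookup {

  private PrefixLookup() {}

  /** Returns a view of the entries whose keys start with the given prefix. */
  static SortedMap<String,String> withPrefix(SortedMap<String,String> prefs, String prefix) {
    // from prefix (inclusive) to prefix + '\uffff' (exclusive)
    return prefs.subMap(prefix, prefix + Character.MAX_VALUE);
  }

  public static void main(String[] args) {
    SortedMap<String,String> prefs = new TreeMap<String,String>();
    prefs.put("a.b", "false");
    prefs.put("a.b.c.d", "14.0");
    prefs.put("a.b.c.d2", "16.0");
    prefs.put("a.c", "true");
    // prints [a.b.c.d, a.b.c.d2]
    System.out.println(withPrefix(prefs, "a.b.c").keySet());
  }
}
```

A key that continues with the character \uffff right after the prefix would be missed, which should not matter for dotted preference names like these.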

import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.commons.io.FileUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.persist.EntityCursor;
import com.sleepycat.persist.EntityStore;
import com.sleepycat.persist.PrimaryIndex;
import com.sleepycat.persist.SecondaryIndex;
import com.sleepycat.persist.StoreConfig;

/**
 * DAO that uses a Berkeley DB Java Edition entity store (via the DPL) as its
 * datastore. The data lives in files under dataDirectory and is cached in memory.
 */
public class UserPrefsDao {

  private static final Log logger = LogFactory.getLog(UserPrefsDao.class);
  
  private String dataDirectory;
  private long timeToLiveMillis = 24 * 60 * 60 * 1000; // 1 day
  
  private Environment env;
  private EntityStore store;
  private PrimaryIndex<String,UserPrefsEntity> userPrefsByUserId;
  // type parameters are <secondary key, primary key, entity>
  private SecondaryIndex<Long,String,UserPrefsEntity> userPrefsByUpdatedMillis;
  
  public void setDataDirectory(String dataDirectory) {
    this.dataDirectory = dataDirectory;
  }

  public void setTimeToLiveMillis(long timeToLiveMillis) {
    this.timeToLiveMillis = timeToLiveMillis;
  }
  
  protected void init() throws Exception {
    File dataDir = new File(dataDirectory);
    if (! dataDir.exists()) {
      FileUtils.forceMkdir(dataDir);
    }
    EnvironmentConfig environmentConfig = new EnvironmentConfig();
    environmentConfig.setAllowCreate(true);
    environmentConfig.setTransactional(true);
    env = new Environment(dataDir, environmentConfig);
    StoreConfig storeConfig = new StoreConfig();
    storeConfig.setAllowCreate(true);
    storeConfig.setTransactional(true);
    store = new EntityStore(env, dataDir.getName(), storeConfig);
    userPrefsByUserId = store.getPrimaryIndex(String.class, UserPrefsEntity.class);
    // the second argument is the class of the secondary key (a long field,
    // so Long), not the entity class
    userPrefsByUpdatedMillis = store.getSecondaryIndex(
      this.userPrefsByUserId, Long.class, "updatedMillis");
  }
  
  protected void destroy() throws Exception {
    if (store != null) {
      store.close();
    }
    if (env != null) {
      env.close();
    }
  }
  
  /**
   * Retrieve the preferences for the specified user.
   * @param userId the userId.
   * @return the preferences for the user, if it exists.
   * @throws Exception if one is thrown.
   */
  public Map<String,String> load(String userId) throws Exception {
    UserPrefsEntity userPrefs = userPrefsByUserId.get(userId);
    if (userPrefs == null) {
      return Collections.<String,String>emptyMap();
    }
    return userPrefs.getPrefs();
  }
  
  /**
   * Retrieves a partial map of preferences for the specified user. This is
   * useful when we want to partition the preferences across multiple applications,
   * so each application only saves and uses a non-overlapping subset of the
   * preferences.
   * @param userId the userId.
   * @param keyPrefix the preference key prefix, eg. language.dialect
   * @return the partial Map of preferences. Only the keys which start with the
   * specified keyPrefix will be returned.
   * @throws Exception if one is thrown.
   */
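  public Map<String,String> load(String userId, String keyPrefix) throws Exception {
    // filter explicitly: this works for any Map returned by load(userId),
    // including the empty map for unknown users, and keeps only the keys
    // that really start with keyPrefix
    Map<String,String> subset = new TreeMap<String,String>();
    for (Map.Entry<String,String> entry : load(userId).entrySet()) {
      if (entry.getKey().startsWith(keyPrefix)) {
        subset.put(entry.getKey(), entry.getValue());
      }
    }
    return subset;
  }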

  /**
   * Migrate the user's preferences to a permanent storage when he registers.
   * Temporary preference values are stored for a configurable time, by default
   * it is 1 day. However, once the user registers, his preferences are never
   * expired.
   * @param sourceUserId the temporary user id.
   * @param targetUserId the permanent user id.
   * @return the preferences for the target user id.
   * @throws Exception if one is thrown.
   */
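  public Map<String,String> migrate(String sourceUserId, String targetUserId) 
      throws Exception {
    UserPrefsEntity sourceEntity = userPrefsByUserId.get(sourceUserId);
    if (sourceEntity == null) {
      // nothing was collected for the temporary user, so nothing to copy
      return load(targetUserId);
    }
    // copy first, then delete, so the prefs are not lost if the save fails
    Map<String,String> migrated = save(targetUserId, sourceEntity.getPrefs());
    logger.debug("Deleting temp user:" + sourceUserId);
    userPrefsByUserId.delete(sourceUserId);
    return migrated;
  }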
  
  /**
   * Save the user preferences. The map of preferences passed in can be partial
   * or full. Only the preference values provided will be updated, the rest will
   * remain untouched.
   * @param userId the user id.
   * @param values the Map of preferences.
   * @return the updated map.
   * @throws Exception if one is thrown.
   */
  public Map<String,String> save(String userId, Map<String,String> values) 
      throws Exception {
    // start from the stored entity, if there is one, so that keys which are
    // not present in values are left untouched
    UserPrefsEntity entity = userPrefsByUserId.get(userId);
    if (entity == null) {
      entity = new UserPrefsEntity(userId);
    }
    entity.getPrefs().putAll(values);
    entity.setUpdatedMillis(System.currentTimeMillis());
    logger.debug("Saving prefs for userId:" + userId);
    userPrefsByUserId.put(entity);
    return entity.getPrefs();
  }
  
  /**
   * Used for one time load of the existing data. Will probably never be used
   * after that.
   * @param data the Prefs data from the old system.
   * @throws Exception if one is thrown.
   */
  public void saveAllPrefs(Map<String,Map<String,String>> data) throws Exception {
    for (String key : data.keySet()) {
      save(key, data.get(key));
    }
  }
  
  /**
   * Used by backend scheduled job to expire temporary (non-registered user)
   * preferences. The cutoff time is the time specified in the call to 
   * expire. Any entries which are older than millisSinceEpoch - timeToLiveMillis
   * will be expired. 
   * @param millisSinceEpoch the current time in milliseconds since epoch.
   * @throws Exception if one is thrown.
   */
  public void expire(long millisSinceEpoch) throws Exception {
    long cutoff = millisSinceEpoch - timeToLiveMillis;
    List<String> userIdsToDelete = new ArrayList<String>();
    EntityCursor<UserPrefsEntity> userPrefsCursor = null;
    try {
      userPrefsCursor = userPrefsByUpdatedMillis.entities();
      for (UserPrefsEntity userPrefs : userPrefsCursor) {
        long updatedMillis = userPrefs.getUpdatedMillis();
        if (updatedMillis < cutoff) {
          String userId = userPrefs.getUserId();
          if (userId.startsWith("t-")) {
            userIdsToDelete.add(userId);
          }
        } else {
          // all entries will have been updated after the cutoff
          break;
        }
      }
    } finally {
      if (userPrefsCursor != null) {
        userPrefsCursor.close();
      }
    }
    for (String userIdToDelete : userIdsToDelete) {
      logger.debug("Deleting expired user:" + userIdToDelete);
      userPrefsByUserId.delete(userIdToDelete);
    }
  }
}
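The early break in expire() relies on the secondary index handing back entities in ascending updatedMillis order, so the first entry at or past the cutoff ends the scan. The same idea can be shown with a plain sorted map standing in for the index. This is only a sketch with hypothetical names (ExpirySketch, expiredTempIds), not BerkeleyDB code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a scan over an index sorted by update time. */
final class ExpirySketch {

  private ExpirySketch() {}

  /** Collects temporary ("t-") user ids last updated before the cutoff. */
  static List<String> expiredTempIds(
      TreeMap<Long,List<String>> idsByUpdatedMillis, long cutoff) {
    List<String> expired = new ArrayList<String>();
    for (Map.Entry<Long,List<String>> entry : idsByUpdatedMillis.entrySet()) {
      if (entry.getKey() >= cutoff) {
        // keys are sorted, so everything from here on is newer than the cutoff
        break;
      }
      for (String userId : entry.getValue()) {
        if (userId.startsWith("t-")) {
          expired.add(userId);
        }
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    TreeMap<Long,List<String>> index = new TreeMap<Long,List<String>>();
    index.put(100L, java.util.Arrays.asList("t-1234", "12345678"));
    index.put(200L, java.util.Arrays.asList("t-2345"));
    index.put(300L, java.util.Arrays.asList("t-3456"));
    System.out.println(expiredTempIds(index, 250L));
  }
}
```

As in the DAO, the ids are collected first and would be deleted only after the scan is finished.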

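I created a JUnit test to exercise this class, which I show below to illustrate usage. Because this is not Spring enabled, I use the @BeforeClass and @AfterClass annotations to call the DAO's init() and destroy() methods. Note that the tests share a single store and assume they run in the order shown (testSavePrefs first, testExpire last), which JUnit 4 does not guarantee. The rest of it is pretty self-explanatory.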

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.junit.Assert;

import org.apache.commons.io.FileUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class UserPrefsDaoTest {

  private static final Log logger = LogFactory.getLog(UserPrefsDaoTest.class);
  
  private static UserPrefsDao dao;
  
  @BeforeClass 
  public static void setUpBeforeClass() throws Exception {
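    // start from a clean directory; forceDelete fails if the path is missing
    File dataDir = new File("/tmp/UserPrefs");
    if (dataDir.exists()) {
      FileUtils.forceDelete(dataDir);
    }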
    dao = new UserPrefsDao();
    dao.setDataDirectory("/tmp/UserPrefs");
    dao.setTimeToLiveMillis(0L);
    dao.init();
  }

  @AfterClass 
  public static void tearDownAfterClass() throws Exception {
    dao.destroy();
  }
  
  @Test 
  public void testSavePrefs() throws Exception {
    // save a temp user
    Map<String,String> value1 = new HashMap<String,String>();
    value1.put("a.b.c.d", "14.0");
    value1.put("a.b.c.d2", "16.0");
    value1.put("a.b", "false");
    dao.save("t-1234", value1);
    Assert.assertNotNull(dao.load("t-1234"));
    
    // save a perm user
    Map<String,String> value2 = new HashMap<String,String>();
    value2.put("x.y.z.a", "234");
    value2.put("x.y", "true");
    value2.put("x.y.z.1", "123");
    dao.save("12345678", value2);
    Assert.assertNotNull(dao.load("12345678"));
    
    // save another temp user
    Map<String,String> value3 = new HashMap<String,String>();
    value3.put("x.y.z.a", "986");
    value3.put("x.y.a", "true");
    value3.put("x.y.z.1", "234");
    dao.save("t-2345", value3);
    Assert.assertNotNull(dao.load("t-2345"));
  }
  
  @Test 
  public void testRetrieve() throws Exception {
    // get back the first temp user
    Map<String,String> rvalues1 = dao.load("t-1234");
    logger.debug("retrieved values for t-1234:" + rvalues1.toString());
    Assert.assertNotNull(rvalues1);
    // get back a perm user
    Map<String,String> rvalues2 = dao.load("12345678");
    logger.debug("retrieved values for 12345678:" + rvalues2.toString());
    Assert.assertNotNull(rvalues2);
  }
  
  @Test 
  public void testRetrieveInvalidUser() throws Exception {
    // try to get a user with incorrect id, should return empty map
    Map<String,String> ivalues1 = dao.load("23456789");
    logger.debug("retrieved values for invalid user 23456789:" + ivalues1.size());
    Assert.assertNotNull(ivalues1);
    Assert.assertEquals(0, ivalues1.size());
  }
  
  @Test
  public void testPropertySubsetRetrieval() throws Exception {
    // try to get a subset of properties for a user
    Map<String,String> svalues1 = dao.load("t-1234", "a.b.c");
    logger.debug("retrieved values for t-1234 for a.b.c:" + svalues1.toString());
    Assert.assertNotNull(svalues1);
    Assert.assertEquals(2, svalues1.size());
  }
  
  @Test
  public void testInvalidPropertySubsetRetrieval() throws Exception {
    // try to get an invalid subset of properties for a user, should return empty map
    Map<String,String> svalues1 = dao.load("t-1234", "x.y.z");
    logger.debug("retrieved values for t-1234 for x.y.z:" + svalues1.toString());
    Assert.assertNotNull(svalues1);
    Assert.assertEquals(0, svalues1.size());
  }
  
  @Test
  public void testMigrate() throws Exception {
    // migrate the t-2345 user to perm user 23456789
    Map<String,String> mvalues1 = dao.load("t-2345");
    Map<String,String> mvalues2 = dao.migrate("t-2345", "23456789");
    logger.debug("migrate source values (t-2345):" + mvalues1.toString());
    logger.debug("migrate target values (23456789):" + mvalues2.toString());
    Assert.assertNotNull(mvalues2);
    Assert.assertEquals(mvalues1.size(), mvalues2.size());
  }
  
  @Test
  public void testExpire() throws Exception {
    // expire prefs, temp users (only) should be deleted
    dao.expire(System.currentTimeMillis());
    Map<String,String> rvalues1 = dao.load("t-1234");
    Assert.assertNotNull(rvalues1);
    Assert.assertEquals(0, rvalues1.size());
    Map<String,String> rvalues2 = dao.load("12345678");
    Assert.assertNotNull("User 12345678 should have non-null prefs", rvalues2);
    Assert.assertEquals(3, rvalues2.size());
  }
}

I was quite pleasantly surprised by the BerkeleyDB DPL. BerkeleyDB does not have much of a following in the Java community, perhaps because it is perceived as difficult to use. The annotation-based persistence mechanism provided by the DPL goes a very long way toward alleviating this problem. There are many situations where BerkeleyDB would be a great fit, and the DPL makes it easier to apply. Hopefully, this example illustrates how easy it is to use BerkeleyDB to solve real-life business problems.

On a personal note, when annotations were introduced in Java 1.5, I did not like them that much. I started using @Override, @SuppressWarnings, etc. because Eclipse would provide them as suggestions, then I started to use the Spring @Required annotation, then the various JUnit 4.0 annotations, and now the DPL annotations. I still don't know much about how annotations work, but I seem to be pretty much hooked on them now.

6 comments (moderated to prevent spam):

Gregory Burd said...

We're very happy to see that the DPL has helped you to use Berkeley DB Java Edition. We realized that JE had been too hard to use and we hope that the DPL is the right compromise between ease of use and flexibility. Thanks for your excellent write up.

-greg

_____________________________________________________________________

Gregory Burd greg.burd@oracle.com
Product Manager, Berkeley DB/JE/XML Oracle Corporation

Sujit Pal said...

Hi Greg, thanks for your email and I am glad you liked the writeup. Indeed, your group's work to build the DPL has made BDB easier to use, at least for Java developers such as myself. Thanks for the great work.

Jean said...

Hello, first let me thank you for this great article.

I'm new to berkeley and to Springframework, I'm not able to run berkeley DB JE with spring. Can you please help me with berkeley db je + springframework integration?

Thank you in advance.

Sujit Pal said...

Hi Jean, thanks for the kind words. I haven't tried BerkeleyDB and Spring together, but I don't think that should be a problem. If you describe what you are looking to do, I will try to help.

Iftekhar said...

hi, ur post is very helpful. I need to integrate berkeley DB, spring, oracle and atomikos(for distributed transaction management) in my application. so i need to configure berkeley db with spring. Please do some favor for me.

Thanks

Sujit Pal said...

Thanks, Iftekhar. Can you describe what you are not able to do with BDB and Spring? Spring has many options for injecting beans into other beans - you can do constructor injection, factory beans, etc.