Saturday, February 23, 2008

Implementing Inheritance in database with JPA

This post is the result of a casual conversation with one of our engineers. About a couple of months ago, he happened to mention that if we ever got around to refactoring some code in one of our systems, using JPA for database persistence would be preferable to JDBC, which we use currently. I did not know much about JPA at the time, except that it had something to do with EJB3 and Hibernate, so I decided to read up about it.

Turns out I was only partially correct. JPA is an API which is implemented by various popular ORM implementations such as TopLink from Oracle, Kodo from BEA, Hibernate from Red Hat's JBoss group and OpenJPA from Apache. In a sense, JPA is to ORMs what JDBC is to databases. It provides a common API to work against multiple ORMs, so developers need to learn only one API to work against any JPA compliant ORM, and a company could (at least in theory) switch between ORM providers without changing any source code. Based on previous experience with coding against Hibernate 2.x, I think JPA code (using Hibernate) also looks much simpler.

One book I found very helpful was Chris Maki's "JPA 101: Java Persistence Explained" from Sourcebeat. It's available as a PDF eBook and is reasonably priced. It contains almost no fluff, unless you count the first chapter, which deals with setting up the example application using Eclipse and Maven, something most (but not all) developers would already be familiar with. The book has plenty of examples and is very readable. I would recommend it strongly if you are trying to get started with JPA.

After reading the book, I realized that the engineer's suggestion was pretty much spot-on. In particular, I liked the concept of implementing the application's object inheritance hierarchy directly in the database, which I describe below.

We license content from various providers. All content has certain metadata that we always extract, such as title and summary, and we always assign each article a unique URL on our website. However, each provider is different, and some provide additional metadata that is unique to that provider. As an example, consider two data sources, one called Magazine and one called Book. The object UML would look something like this:

The ModelBase class is something that is needed by JPA, and it's convenient to set up a single one that enforces the correct id class (for the given database) for any persistable bean in the application. The Article class specifies the properties we would always extract, regardless of provider, and the MagazineArticle and BookArticle classes specify the metadata unique to each provider.

JPA allows three different inheritance strategies, which most providers implement. I chose the JOINED strategy, where the properties common to all subclasses are stored in a master table, and the unique properties for each Article subclass are stored in their own tables, linked back using the autogenerated surrogate id as the foreign key. This has the advantage of being quite normalized, and if the inheritance structure is relatively flat (mine would only be one level deep), then the performance overhead of doing joins is minimal. The corresponding database structure for the JOINED subclass strategy would look like this:
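In plain Java terms, the JOINED layout can be sketched roughly as follows. This is only an illustration and involves no JPA classes: the class name, the method names and the column names are assumptions made up for the sketch, and only a subset of the Article properties is shown. The point is that the parent row and the subclass row share the same id, and that loading a subclass instance amounts to joining the two rows on that id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; the column names are assumed for the sketch
// and no JPA classes are involved.
public class JoinedLayoutSketch {

  // Row for the shared parent table: properties common to every article.
  static Map<String, Object> articleRow(long id, String title, String summary, String url) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", id);
    row.put("title", title);
    row.put("summary", summary);
    row.put("url", url);
    return row;
  }

  // Row for the subclass table: only the provider-specific properties, plus
  // an id that acts as both primary key and foreign key to the parent row.
  static Map<String, Object> magazineArticleRow(long id, String publicationName, String authorName) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", id);
    row.put("publicationName", publicationName);
    row.put("authorName", authorName);
    return row;
  }

  // Mimics, in memory, the effect of joining the two tables on id.
  static Map<String, Object> join(Map<String, Object> parent, Map<String, Object> child) {
    if (!parent.get("id").equals(child.get("id"))) {
      throw new IllegalArgumentException("rows do not share the same id");
    }
    Map<String, Object> merged = new LinkedHashMap<>(parent);
    merged.putAll(child);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> parent =
        articleRow(1L, "Magazine Article Title 1", "A short summary", "/path/to/mag-art-0001");
    Map<String, Object> child = magazineArticleRow(1L, "Harper Collins", "Dr Doolittle");
    System.out.println(join(parent, child));
  }
}
```

With a hierarchy that is only one level deep, each load touches at most two such rows, which is why the join overhead stays small.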

Notice the absence of the discriminator column in the Article table above. I spent nearly a day trying to figure out what I was doing wrong before I found that Hibernate, unlike the other ORM implementations that JPA covers, does not need a discriminator column for JOINED inheritance, and apparently has no plans to conform to a standard it considers broken in this regard. It does not affect me that much since I plan on using Hibernate anyway. While I am no expert on these things, I think that this stance may harm Hibernate's adoption by big companies for whom JPA compliance is a higher priority. The least that should be done, IMO, is to adequately document this aberration, so other developers are not tripped up like I was.

But anyway, on to the code. Since I was using MySQL, I decided to build a ModelBase object which is annotated as @MappedSuperclass, and which specifies the id type and generation strategy. JPA can work with legacy ids, but that is considerably more work to implement than autogenerated surrogate keys, so I decided to keep things simple. In any case, if we decide to switch to some other database, all we would need to do is change the id generation strategy in this one class (and the provider in the persistence.xml file).

// ModelBase.java
package com.mycompany.myapp.persistence;

import java.io.Serializable;

import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;

@MappedSuperclass
public abstract class ModelBase implements Serializable {

  // AUTO leaves the choice of id generation strategy to the provider
  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  private Long id;
  
  public Long getId() {
    return id;
  }
  
  public void setId(Long id) {
    this.id = id;
  }
}

The above class is marked as @MappedSuperclass, so there is no corresponding table in the database. The next class is the main Article class. It is also abstract, since we never want to use it as is: for each content provider, we want to add extra metadata unique to that provider.

// Article.java
package com.mycompany.myapp.persistence;

import javax.persistence.DiscriminatorColumn;
import javax.persistence.DiscriminatorType;
import javax.persistence.Entity;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

@Entity
@Inheritance(strategy=InheritanceType.JOINED)
@DiscriminatorColumn(discriminatorType=DiscriminatorType.INTEGER, name="articleTypeId")
public abstract class Article extends ModelBase {

  private String articleId;
  private String title;
  private String summary;
  private String url;
  
  public String getArticleId() {
    return articleId;
  }
  
  public void setArticleId(String articleId) {
    this.articleId = articleId;
  }
  
  public String getTitle() {
    return title;
  }
  
  public void setTitle(String title) {
    this.title = title;
  }
  
  public String getSummary() {
    return summary;
  }
  
  public void setSummary(String summary) {
    this.summary = summary;
  }
  
  public String getUrl() {
    return url;
  }
  
  public void setUrl(String url) {
    this.url = url;
  }
}

Notice that we have the @DiscriminatorColumn annotation. With Hibernate, this has absolutely no effect, and in fact, you don't even have to have this for InheritanceType.JOINED. The only time you need this for Hibernate is for single table inheritance.

The subclasses of Article are shown below. As before, we don't need to have the @DiscriminatorValue annotation in either subclass, since Hibernate will not use it or record it.

// MagazineArticle.java
package com.mycompany.myapp.persistence;

import java.util.Date;

import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

@Entity
@DiscriminatorValue("1")
public class MagazineArticle extends Article {

  private static final long serialVersionUID = 4276734517833727032L;

  private String publicationName;
  
  @Temporal(TemporalType.DATE)
  private Date publicationDate;
  
  private String authorName;
  
  public String getPublicationName() {
    return publicationName;
  }
  
  public void setPublicationName(String publicationName) {
    this.publicationName = publicationName;
  }
  
  public Date getPublicationDate() {
    return publicationDate;
  }
  
  public void setPublicationDate(Date publicationDate) {
    this.publicationDate = publicationDate;
  }
  
  public String getAuthorName() {
    return authorName;
  }

  public void setAuthorName(String authorName) {
    this.authorName = authorName;
  }
}
// BookArticle.java
package com.mycompany.myapp.persistence;

import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;

@Entity
@DiscriminatorValue("2")
public class BookArticle extends Article {

  private static final long serialVersionUID = -2274023497279749079L;
  
  private String authorName;
  private String publisherName;
  private String isbnNumber;
  
  public String getAuthorName() {
    return authorName;
  }
  
  public void setAuthorName(String authorName) {
    this.authorName = authorName;
  }
  
  public String getPublisherName() {
    return publisherName;
  }
  
  public void setPublisherName(String publisherName) {
    this.publisherName = publisherName;
  }
  
  public String getIsbnNumber() {
    return isbnNumber;
  }
  
  public void setIsbnNumber(String isbnNumber) {
    this.isbnNumber = isbnNumber;
  }
}

Finally, we need to set up the database. I created a database and the tables shown in the database diagram above. Then, to link up the code and the database, we create a persistence.xml file in the src/main/resources/META-INF directory, like so:

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence 
    http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd"
    version="1.0">
  <persistence-unit name="myapp" transaction-type="RESOURCE_LOCAL">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <class>com.mycompany.myapp.persistence.ModelBase</class>
    <class>com.mycompany.myapp.persistence.Article</class>
    <!-- put your article subclasses here -->
    <class>com.mycompany.myapp.persistence.MagazineArticle</class>
    <class>com.mycompany.myapp.persistence.BookArticle</class>
    <properties>
      <property name="hibernate.connection.driver_class" 
        value="com.mysql.jdbc.Driver"/>
      <property name="hibernate.connection.url" 
        value="jdbc:mysql://localhost:3306/contentdb"/>
      <property name="hibernate.connection.username" value="jpauser" />
      <property name="hibernate.connection.password" value="jpauser"/>
      <property name="hibernate.dialect" 
        value="org.hibernate.dialect.MySQLDialect"/>
      <property name="hibernate.cache.provider_class" 
        value="org.hibernate.cache.HashtableCacheProvider"/>
      <property name="hibernate.show_sql" value="true"/>
    </properties>
  </persistence-unit>
</persistence>

Here is a JUnit test that inserts data into this database structure. Most people would probably use DBUnit to do this, but my objective was to find out how to insert data into the database using JPA, so I wrote a unit test. Notice how the code is pretty much unaware of the underlying database structure. It deals with Java objects, and the JPA entityManager does the work of creating and executing the SQL for it.

// ArticlePersistenceTest.java
package com.mycompany.myapp.persistence;

import java.util.Calendar;
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.Query;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class ArticlePersistenceTest {

  private final Log log = LogFactory.getLog(getClass());
  
  private EntityManager entityManager;
  private EntityManagerFactory entityManagerFactory;
  
  @Before
  public void setUp() throws Exception {
    entityManagerFactory = Persistence.createEntityManagerFactory("myapp");
    entityManager = entityManagerFactory.createEntityManager();
  }

  @After
  public void tearDown() throws Exception {
    entityManager.close();
    entityManagerFactory.close();
  }

  @Test
  public void testPersistAdamArticle() throws Exception {

    // build and persist a magazine article
    MagazineArticle ma = new MagazineArticle();
    ma.setArticleId("mag-000-001");
    ma.setTitle("Magazine Article Title 1");
    ma.setSummary("This is a short summary of magazine article 0001...");
    ma.setUrl("/path/to/mag-art-0001");
    Calendar pubDateCalendar = Calendar.getInstance();
    // Calendar months are zero-based, so 11 is December
    pubDateCalendar.set(2002, Calendar.DECEMBER, 15);
    ma.setPublicationDate(pubDateCalendar.getTime());
    ma.setPublicationName("Harper Collins");
    ma.setAuthorName("Dr Doolittle");

    entityManager.getTransaction().begin();
    entityManager.persist(ma);
    entityManager.getTransaction().commit();

    // build and persist a book article
    BookArticle ba = new BookArticle();
    ba.setArticleId("bk-000-001");
    ba.setTitle("Book Article Title 1");
    ba.setSummary("This is a short summary of book article 0001...");
    ba.setUrl("/path/to/book-art-0001");
    ba.setAuthorName("Dr Busybody");
    ba.setPublisherName("Tom Collins");
    ba.setIsbnNumber("1234-5678");

    entityManager.getTransaction().begin();
    entityManager.persist(ba);
    entityManager.getTransaction().commit();
    
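    // select all articles; Query.getResultList() returns a raw List,
    // so the assignment below is an unchecked conversion
    Query q = entityManager.createQuery("select a from Article a");
    @SuppressWarnings("unchecked")
    List<Article> results = q.getResultList();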
    for (Article result : results) {
      log.debug("result=" + result.toString());
    }
  }
}

I purposely kept my code as free of override annotations as possible, which may not be feasible in real life. For example, your DBA may enforce a particular table naming or column naming convention. You can map beans to the corresponding table names using the @Table annotation, and property names to the corresponding column names using the @Column annotation.
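As a rough sketch of what such overrides could look like, here is a variant of the BookArticle class with explicit names. The table and column names are hypothetical, chosen only to mimic a DBA-imposed convention, and the fragment assumes the Article class from this post and the JPA jar are on the classpath. I have not run this variant against the schema above, so treat it as a starting point rather than a tested mapping.

```java
package com.mycompany.myapp.persistence;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Table;

// Hypothetical variant of BookArticle: the table and column names below are
// invented for illustration and do not come from the schema in this post.
@Entity
@Table(name = "book_article")
public class BookArticle extends Article {

  private static final long serialVersionUID = 1L;

  // Maps the authorName property to a differently named column
  @Column(name = "author_name")
  private String authorName;

  @Column(name = "publisher_name")
  private String publisherName;

  // nullable = false marks the column as NOT NULL in the mapping metadata
  @Column(name = "isbn_number", nullable = false)
  private String isbnNumber;

  // getters and setters unchanged from the original class, omitted here
}
```

Nothing in the calling code changes when names are overridden this way; the test code keeps working with the Java property names.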

Another thing I noticed was that the JPA code performs slightly slower than straight JDBC calls. However, this is expected, since JPA provides a level of abstraction that allows us to write more readable code, and does some generic heavy lifting behind the scenes that would conceivably be less efficient than hand-crafted SQL. I think this becomes less noticeable when we run the applications over longer periods of time and are able to take advantage of the ORM's cache.

Overall, I was quite impressed with JPA. The JOINED subclass strategy is conceptually nicer than the table per class strategy we have currently implemented using straight JDBC. With a JOINED strategy, we can enforce that certain fields will need to be populated regardless of a provider. It is also normalized, with no repetition of column names across individual tables. Often, the implementor of a new table will use different column names or column types, which makes it harder to work with the articles in a generic way on the front end.

As for the learning curve involved with JPA, obviously there is one, but I foresee that JPA will soon be as ubiquitous as JDBC is today. Already, more and more Java shops are switching over to ORMs, and there are plenty of free and open-source products available which are as good as their commercial counterparts. Learning the JPA API will enable you to work with the JPA compliant ORMs out there, and now that it supports annotations, it's just a matter of learning a few simple annotations to get going with JPA.

6 comments:

Hemant Patel said...

Hi Sujit,

JPA is really wonderful, and the best thing is ORM independence: all you need to change is a few XML files, even if you are changing ORMs. Do you have any idea how well JPA works with DynaBeans? I tried some time ago, but got stuck on Lists like isIndexable, isStorable etc. I think using transient variables may be a way we can figure it out.

Sujit Pal said...

Yes, the more I play around with JPA, the more favorably it compares to my initial experiences with Hibernate 2.x :-). It's much easier to learn, that's for sure. As for using JPA with DynaBeans, don't you think they are kind of orthogonal? JPA, like its underlying ORM implementations, persists POJOs into databases by mapping the O to the R (class==table, field==column, etc). OTOH, a DynaBean is not a POJO, and one of its features is that you can access its fields by name, so it could be used to build simple map based persistence mechanisms such as iBatis.

Anonymous said...

Thank you very much for this article.
To have the whole classes code helped me a lot since I was having some doubts with the “official” and not so explanatory documentation.

Sujit Pal said...

Thanks Patricio, glad it helped.

Thomas Dias said...

Thanks! It helped a lot!!

Sujit Pal said...

Thanks, Thomas, you are welcome.