Thursday, August 27, 2009

Writing a Clojure DSL for SQLUnit

Last week, at our monthly eBig Java SIG meeting, Anne Ogborne spoke to us about her experiences using Clojure as a Second Life programmer. In the course of the talk, some one asked her why and when a typical Java programmer would use Clojure - unfortunately, as expected, her answer and the ensuing discussion eventually devolved into a bit of a language flame fest.

As it turns out, I have been debating this question myself to decide whether learning Clojure was worth spending time on. Clojure is a dialect of Lisp, a general purpose programming language that runs on the JVM, and can access Java classes loaded in the JVM, so it can do whatever I can do with Java and/or Lisp. However, as someone who mainly does Java and is fairly happy with it, I do not see myself doing in Clojure things I can already do easily in Java (or Jython/Python).

However, I think there is (at least) one area that Clojure is fantastic for - as a language for writing DSLs. Pragmatic Java programmers typically write their DSLs as a XML or JSON dialect, using one of the many XML/JSON parsers available in the Java world. Some examples that come to mind are Ant's build.xml and Spring's applicationContext.xml. More adventurous ones (or those working in situations where XML/JSON is not an option) would use something like JavaCC or AntLR to build a language and its parser from scratch, for example the Lucene query language.

With Clojure, you can write your DSL as as extension of Clojure. So all you do is implement (in Clojure or Clojure/Java) the bits that are specific to your DSL - things like control flow and variable declaration, assignments, etc. can be left to Clojure. Because of Clojure's (and Lisp's) homoiconic nature, functions that you provide are represented the same as functions provided by Clojure. The DSL user would basically write the DSLs in a "dialect" of Clojure that includes these extra functions.

SQLUnit in Clojure

A few years ago, I built a little tool that allowed developers to unit test stored procedures. Tests were specified as a stored procedure call and expected results in an XML dialect (see here for an example), and the tool would execute the procedure, and match the result with the one provided. Different styles of matching were provided, such as exact equality, row count equality, failure, etc. However, because it grew organically as a result of user feedback, and because of my own lack of design foresight, it eventually became really complex, and began to suck up almost all my free time, so I handed the project off to someone else and moved on to more interesting things.

The first draft of my Clojure version of the XML example is shown below. If you compare the XML with this version, you will see that there is a 1:1 correspondence between the two files. However, this is written in a real programming language, meaning that the DSL user has much more flexibility. For example, if you now wanted to compare the results from the same stored procedures from two different databases (an actual request which required a change to SQLUnit), then you could simply define another Connection object and replace the fourth argument to run-test with another run-select instead of from-string.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
; Source: src/sqlunit/pgsql-test.clj
(ns sqlunit.pgsql-test
  (:import
    (java.sql Types)
  )
  (:use [sqlunit.core])
)

(defn main
  "This is the main test program that will be provided by the caller"
  []
  (def conn (
    connect-to-db "org.postgresql.Driver"
      "jdbc:postgresql://localhost:5432/sqlunitdb"
      "sujit" "")
  )
  (setup-suite conn (to-array [
    "delete from timecard where 1=1"
    "delete from employee where 1=1"
    "delete from department where 1=1"])
  )
  (run-test "Adding department HR"
    assert-results-equal?
    (run-select conn "select AddDept(?)"
      (struct sql-arg 1 "Human Resources" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"adddept\" :value \"$deptId_HR\"}]]")
  )
  (run-test "Adding department Infotech"
    assert-results-equal?
    (run-select conn "select AddDept(?)"
      (struct sql-arg 1 "Information Technology" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"adddept\" :value \"$deptId_IT\"}]]")
  )
  (run-test "Adding Employee John Doe to InfoTech"
    assert-results-equal?
    (run-select conn "select AddEmployee(?,?,?,?,?,?,?)"
      (struct sql-arg 1 "Information Technology" Types/VARCHAR :in)
      (struct sql-arg 2 "John Doe" Types/VARCHAR :in)
      (struct sql-arg 3 "1900-01-01" Types/DATE :in)
      (struct sql-arg 4 "123456789" Types/VARCHAR :in)
      (struct sql-arg 5 40.00 Types/NUMERIC :in)
      (struct sql-arg 6 10.00 Types/FLOAT :in)
      (struct sql-arg 7 20.00 Types/FLOAT :in))
    (from-string
      "[[{:idx 1 :name \"addemployee\" :value \"$empId_John\"}]]")
  )
  (run-test "Adding Employee John Doe again (expect exception)"
    assert-fail?
    (run-select conn "select AddEmployee(?,?,?,?,?,?,?)"
      (struct sql-arg 1 "Information Technology" Types/VARCHAR :in)
      (struct sql-arg 2 "John Doe" Types/VARCHAR :in)
      (struct sql-arg 3 "1900-01-01" Types/DATE :in)
      (struct sql-arg 4 "123456789" Types/VARCHAR :in)
      (struct sql-arg 5 40.00 Types/NUMERIC :in)
      (struct sql-arg 6 10.00 Types/FLOAT :in)
      (struct sql-arg 7 20.00 Types/FLOAT :in))
    (from-string "[[{}]]") ; we ignore this
  )
  (run-test "Adding Employee Jane Doe to HR"
    assert-results-equal?
    (run-select conn "select AddEmployee(?,?,?,?,?,?,?)"
      (struct sql-arg 1 "Human Resources" Types/VARCHAR :in)
      (struct sql-arg 2 "Jane Doe" Types/VARCHAR :in)
      (struct sql-arg 3 "1910-01-01" Types/DATE :in)
      (struct sql-arg 4 "143456789" Types/VARCHAR :in)
      (struct sql-arg 5 45.00 Types/NUMERIC :in)
      (struct sql-arg 6 10.0 Types/FLOAT :in)
      (struct sql-arg 7 20.0 Types/FLOAT :in))
    (from-string
      "[[{:idx 1 :name \"addemployee\" :value \"$empId_Jane\"}]]")
  )
  (run-test "Adding Employee Dick Tracy to InfoTech"
    assert-results-equal?
    (run-select conn "select AddEmployee(?,?,?,?,?,?,?)"
      (struct sql-arg 1 "Information Technology" Types/VARCHAR :in)
      (struct sql-arg 2 "Dick Tracy" Types/VARCHAR :in)
      (struct sql-arg 3 "1920-01-01" Types/DATE :in)
      (struct sql-arg 4 "144456789" Types/VARCHAR :in)
      (struct sql-arg 5 50.00 Types/NUMERIC :in)
      (struct sql-arg 6 10.0 Types/FLOAT :in)
      (struct sql-arg 7 20.0 Types/FLOAT :in))
    (from-string
      "[[{:idx 1 :name \"addemployee\" :value \"$empId_Dick\"}]]")
  )
  (run-test "Updating Hourly Rate for John"
    assert-results-equal?
    (run-select conn "select UpdateRate(?,?)"
      (struct sql-arg 1 "John Doe" Types/VARCHAR :in)
      (struct sql-arg 2 55.00 Types/NUMERIC :in))
    (from-string
      "[[{:idx 1 :name \"updaterate\" :value 40.00}]]")
  )
  (run-test "Looking up John Doe by name"
    assert-results-equal?
    (run-select conn "select * from GetEmployeeByName(?)"
      (struct sql-arg 1 "John Doe" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"emp_id\" :value \"$empId_John\"}
         {:idx 2 :name \"emp_name\" :value \"John Doe\"}
         {:idx 3 :name \"dept_name\" :value \"Information Technology\"}
         {:idx 4 :name \"emp_dob\" :value \"1900-01-01\"}
         {:idx 5 :name \"emp_ssn\" :value \"123456789\"}
         {:idx 6 :name \"rate_per_hour\" :value 55.00}]]")
  )
  (run-test "Looking up all employees in InfoTech"
    assert-results-equal?
    (run-select conn "select * from GetAllEmpForDept(?)"
      (struct sql-arg 1 "Information Technology" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"dept_id\" :value \"$deptId_IT\"}
         {:idx 2 :name \"emp_id\" :value \"$empId_John\"}
         {:idx 3 :name \"emp_name\" :value \"John Doe\"}
         {:idx 4 :name \"emp_dob\" :value \"1900-01-01\"}
         {:idx 5 :name \"emp_ssn\" :value \"123456789\"}
         {:idx 6 :name \"rate_per_hour\" :value 55.00}
         {:idx 7 :name \"location\" :value \"(10.0,20.0)\"}]
        [{:idx 1 :name \"dept_id\" :value \"$deptId_IT\"}
         {:idx 2 :name \"emp_id\" :value \"$empId_Dick\"}
         {:idx 3 :name \"emp_name\" :value \"Dick Tracy\"}
         {:idx 4 :name \"emp_dob\" :value \"1920-01-01\"}
         {:idx 5 :name \"emp_ssn\" :value \"144456789\"}
         {:idx 6 :name \"rate_per_hour\" :value 50.00}
         {:idx 7 :name \"location\" :value \"(10.0,20.0)\"}]]")
  )
  (run-test "Adding timecard for John"
    assert-results-equal?
    (run-select conn "select AddTimecard(?,?,?,?)"
      (struct sql-arg 1 "John Doe" Types/VARCHAR :in)
      (struct sql-arg 2 "2000-01-01" Types/DATE :in)
      (struct sql-arg 3 "09:00:00" Types/VARCHAR :in)
      (struct sql-arg 4 "17:00:00" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"addtimecard\" :value \"$empId_John\"}]]")
  )
  (run-test "Adding another timecard for John"
    assert-results-equal?
    (run-select conn "select AddTimecard(?,?,?,?)"
      (struct sql-arg 1 "John Doe" Types/VARCHAR :in)
      (struct sql-arg 2 "2000-01-02" Types/DATE :in)
      (struct sql-arg 3 "09:00:00" Types/VARCHAR :in)
      (struct sql-arg 4 "17:00:00" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"addtimecard\" :value \"$empId_John\"}]]")
  )
  (run-test "Adding timecard for Dick"
    assert-results-equal?
    (run-select conn "select AddTimecard(?,?,?,?)"
      (struct sql-arg 1 "Dick Tracy" Types/VARCHAR :in)
      (struct sql-arg 2 "2000-01-01" Types/DATE :in)
      (struct sql-arg 3 "10:00:00" Types/VARCHAR :in)
      (struct sql-arg 4 "23:00:00" Types/VARCHAR :in))
    (from-string
      "[[{:idx 1 :name \"addtimecard\" :value \"$empId_Dick\"}]]")
  )
  (run-test "Getting monthly report for InfoTech"
    assert-results-equal?
    (run-select conn "select * from GetMonthlyDeptReport(?,?,?)"
      (struct sql-arg 1 "Information Technology" Types/VARCHAR :in)
      (struct sql-arg 2 "2000-01-01" Types/DATE :in)
      (struct sql-arg 3 "2000-02-01" Types/DATE :in))
    (from-string
      "[[{:idx 1 :name \"dept_name\" :value \"Information Technology\"}
         {:idx 2 :name \"emp_name\" :value \"Dick Tracy\"}
         {:idx 3 :name \"hours_worked\" :value 13}
         {:idx 4 :name \"pay_rate\" :value 50.00}
         {:idx 5 :name \"total_pay\" :value 650.00}]
        [{:idx 1 :name \"dept_name\" :value \"Information Technology\"}
         {:idx 2 :name \"emp_name\" :value \"John Doe\"}
         {:idx 3 :name \"hours_worked\" :value 16}
         {:idx 4 :name \"pay_rate\" :value 55.00}
         {:idx 5 :name \"total_pay\" :value 880.00}]]")
  )
  (teardown-suite conn)
)

(main)

The core.clj file that is imported in the above code via the :use call is shown below. Ideally, this would be a small bit of Clojure glue code that would link up the test with some Java code, but I was trying to learn Clojure at this point, so I decided to do the whole thing in Clojure. I will try the Clojure/Java approach and write about at some point in the future. In any case, here is the Clojure code for a rudimentary version of SQLUnit.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
; Source: pgsql-test.clj
(ns sqlunit.core
  (:import
    (java.lang Class String)
    (java.sql DriverManager Connection Statement PreparedStatement
      ResultSet ResultSetMetaData Types SQLException)
  )
)

; Represents a stored procedure IN/OUT/INOUT parameter. The :idx
; is the (1-based) index, :value is the string or numeric value
; of the argument, :sqltype is the Types value and :paramtype is
; :in/:out/:inout.
(defstruct sql-arg :idx :value :sqltype :paramtype)
; Represents a single column in the database. The :idx is the
; (1-based) index, :name is the column name and :value is the
; value of the column. Note that only pure numeric values are
; left unquoted, all others are stringified and quoted.
(defstruct sql-col :idx :name :value)

; global variables
; symbol table
(def *syms* (ref {}))

(defn connect-to-db
  "Create and return a JDBC Connection using the DriverManager approach.
   Param: driver - the class name of the driver to use.
   Param: url - the JDBC url.
   Param: user - the user name.
   Param: password - the JDBC password.
   Return: a reference to a JDBC connection"
  [driver, url, user, password]
  (Class/forName driver)
  (def connection (DriverManager/getConnection url user password))
  connection
)

(defn get-metadata
  "Returns the metadata for the provided ResultSet object as a sequence
   of sql-arg data.
   Param: resultset - the resultset to examine.
   Return: the resultset metadata as a sequence of sql-args"
  [resultset]
  (def rsmd (.getMetaData resultset))
  (def colcount (.getColumnCount rsmd))
  (map #(struct sql-arg % (.getColumnName rsmd %) (.getColumnType rsmd %))
    (range 1 (inc colcount))
  )
)

(defn from-resultset
  "Converts a JDBC ResultSet object to the internal form, ie, a Sequence
   of Sequence of col structs.
   Param: resultset - the JDBC resultset.
   Return: A Seq of Seq of col structs."
  [resultset]
  (def resultmetas (get-metadata resultset))
  (def result [])
  (while (.next resultset)
    (def row [])
    (doseq [resultmeta resultmetas]
      (def col (struct sql-col
        (:idx resultmeta) (:value resultmeta)
        (String/valueOf (.getObject resultset (:idx resultmeta)))))
      (def row (conj row col))
    )
    (def result (conj result row))
  )
  result
)

(defn from-string
  "Converts a user-supplied raw result string to the internal form, ie,
   a Sequence of Sequence of col structs. Non-numeric values in
   the raw string should be quoted, otherwise this method will fail.
   Param: raw-string - the raw string.
   Return: the Seq of Seq of cols."
  [raw-string]
  (load-string raw-string)
)

(defn no-errors?
  "Return true if the current test already has an error, in which case
   calling code will typically short-circuit and return. The method just
   checks for the existence of the :message key in the global symbol table
   *syms*.
   Param: none
   Return: true or false"
  []
  (not (contains? (deref *syms*) :message))
)

(defn is-symbol?
  "Return true if the value is a symbol, ie, preceded by a $ sign.
   Param: value - the value to check.
   Return: true or false"
  [value]
  (.startsWith (str value) "$")
)

(defn assert-cols-equal?
  "Predicate that returns true if the string value of the columns are
   identical. Side effect is that if the LHS (col-1) value is a symbol,
   then the value is looked up from the symbol table. If the RHS (col-2)
   is a symbol, then the value is looked up from the LHS and the RHS
   and symbol table values updated with this value. Additionally, if
   the string values of the columns don't match after this, then an
   error message is written to the symbol table.
   Param: row-id - the row id of the column (0-based)
   Param: col-id - the column id of the column (0-based)
   Param: col-1 - the col struct representing the first column.
   Param: col-2 - the col struct representing the second column.
   Return: true or false"
  [row-id col-id col-1 col-2]
  (doseq [k (keys col-1)]
    ; if one of the values is a symbol, ie of the form $..., then we
    ; just replace one by the other
    (def col-value-1 (get col-1 k))
    (def col-value-2 (get col-2 k))
    (if (is-symbol? col-value-1)
      ; if the symbol is on the LHS, then check to see if it is defined
      ; in the symbol table
      (def col-value-1 (get *syms* col-value-1))
    )
    (if (is-symbol? col-value-2)
      ; if the symbol is on the RHS, then replace the symbol with the
      ; value on the LHS, and update the symbol table
      (do
        (dosync (alter *syms* assoc col-value-2 col-value-1))
        (def col-value-2 col-value-1)
      )
    )
    (def retval (and no-errors? (= (str col-value-1) (str col-value-2))))
    (if (not retval)
      (dosync (alter *syms* assoc :message (format
        "mismatch at (row,col):(%d,%d) for %s (%s != %s)"
        (inc row-id) (inc col-id) k col-value-1 col-value-2)))
    )
    retval
  )
)

(defn assert-rows-equal?
  "Predicate that compares two rows for equality. A row is a Seq of 
   col structs. The function iterates through the columns and
   delegates to the assert-cols-equal? predicate for each of
   them. If there is an error in the previous call, then the test is
   skipped. First checks for equality of number of rows, and if they
   are not equal, writes out the error message to the symbol table.
   Param: row-id - the row id (0-based).
   Param: row-1 - the LHS row Seq of cols.
   Param: row-2 - the RHS row Seq of cols.
   Return: true or false."
  [row-id row-1 row-2]
  (def count-1 (count row-1))
  (def count-2 (count row-2))
  (if (not (= count-1 count-2))
    (dosync (alter *syms* assoc :message (
      format "row count mismatch at row: %d (%d != %d)"
      (inc row-id) (count-1) (count-2)))
    )
  )
  (if no-errors?
    (dotimes [i count-1]
      ; although assert-cols-equal? produces a retval, there
      ; does not seem to be a way of breaking out on a condition, so we
      ; just let it do the full run, and then check for the no-error?
      ; side effect at the end.
      (assert-cols-equal? row-id i (get row-1 i) (get row-2 i))
    )
  )
  no-errors?
)

(defn assert-results-equal?
  "Predicate that compares two results for equality. A result is a Seq of
   rows. The function loops through the rows in the result and delegates
   to the assert-rows-equal? predicate. First checks for equality
   of results, if not equal, writes an error into the symbol table.
   Param: result-1 - the first result.
   Param: result-2 - the second result.
   Return: true or false."
  [result-1 result-2]
  (def count-1 (count result-1))
  (def count-2 (count result-2))
  (if (not (= count-1 count-2))
    (dosync (alter *syms* assoc :message (
      format "result count mismatch (%d != %d)"
      (count-1) (count-2))))
  )
  (if no-errors?
    (dotimes [i count-1]
      (assert-rows-equal? i (get result-1 i) (get result-2 i))
    )
  )
  no-errors?
)

(defn assert-fail?
  "Checks to see if a failure message was logged as a result of the
   run-select function going into the catch block because of an
   exception in running.
   Params: None
   Return: true or false"
  [& junk]
  (contains? (deref *syms*) :failure)
)

(defn run-select
  "Runs a SELECT SQL query. Returns the result, which is the JDBC
   resultset in internal form.
   Param: connection - the JDBC connection.
   Param: sql - the SQL to run.
   Param: & sql-arg - variable number of sql-arg structs.
   Return: result - the JDBC resultset converted to internal format."
  [connection sql & sql-args]
  (def ps (.prepareStatement connection sql))
  (doseq [sql-arg sql-args]
    (.setObject ps (:idx sql-arg) (:value sql-arg) (:sqltype sql-arg))
  )
  (try
    (do
      (def rs (.executeQuery ps))
      (def result (from-resultset rs))
      (.close rs)
      (.close ps)
      result
    )
    (catch SQLException e
      (dosync (alter *syms* assoc :failure (.getErrorCode e)))
    )
  )
)

(defn run-update
  "Runs a SQL INSERT/UPDATE query. This is used here when setting up
   the test, to remove artifacts of previous tests, for example.
   Param: connection - the JDBC connection.
   Param: sql - the INSERT/UPDATE SQL.
   Param: & sql-args - a variable number of sql-arg structs.
   Returns: none"
  [connection sql & sql-args]
  (def ps (.prepareStatement connection sql))
  (doseq [sql-arg sql-args]
    (.setObject ps (:idx sql-arg) (:value sql-arg) (:sqltype sql-arg))
  )
  (.executeUpdate ps)
  (.close ps)
)

(defn setup-suite
  "Run setup commands (SQL UPDATE/INSERT) commands in preparation
   for running a suite of tests.
   Param: connection - the JDBC connection
   Param: sqls a Seq of SQL UPDATE/INSERT calls.
   Return: None."
  [connection sqls]
  (doseq [sql sqls]
    (run-update connection sql)
  )
)

(defn teardown-suite
  "Run teardown commands (usually closing connection, etc) once the
   suite of tests are completed.
   Param: connection - the JDBC connection.
   Return: None."
  [connection]
  (.close connection)
)

(defn run-test
  "The test function is a super-function which the caller calls to
   run the test.
   Param: test-name - the name of the test.
   Param: assertion - the assertion to check.
   Param: result-1 - the LHS result.
   Param: result-2 - the RHS result.
   Return: None."
  [test-name assertion result-1 result-2]
  (print "Running Test:" test-name)
  (def retval (apply assertion [result-1 result-2]))
  (dosync
    (alter *syms* dissoc :message)
    (alter *syms* dissoc :failure)
  )
  (def test-result
    (if retval "Ok"
      (format "Failed\n..Message: %s" (*syms* :message))
  ))
  (println "..." test-result)
)

The indentation may look a bit funky to experienced Lisp/Clojure programmers, but I come from a Java/Python background, and it helps if I line up the parenthesis like I have above, as well as make the introduction and removal of debugging (println "foo") calls less invasive.

Running this test in the REPL produces the following output, which is similar to what SQLUnit returns as well:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
user=> 
Running Test: Adding department HR... Ok
Running Test: Adding department Infotech... Ok
Running Test: Adding Employee John Doe to InfoTech... Ok
Running Test: Adding Employee John Doe again (expect exception)... Ok
Running Test: Adding Employee Jane Doe to HR... Ok
Running Test: Adding Employee Dick Tracy to InfoTech... Ok
Running Test: Updating Hourly Rate for John... Ok
Running Test: Looking up John Doe by name... Ok
Running Test: Looking up all employees in InfoTech... Ok
Running Test: Adding timecard for John... Ok
Running Test: Adding another timecard for John... Ok
Running Test: Adding timecard for Dick... Ok
Running Test: Getting monthly report for InfoTech... Ok
nil
user=> 

Conclusions

People (including yours truly in the past) often complain about Lisp's prefix notation and all the parenthesis, but I think thats just a function of usage. It took me approximately 4 (4-6 hours per day) days to write the code in this post, and that was starting with zero knowledge of Clojure. Stuart Holloway's Programming Clojure book helped a lot, as did the four part Clojure tutorial (parts 1, 2, 3 and 4) by the author of the Occasionally sane blog. But as a result of this, I now find Clojure code almost as easy to read as Java or Python code.

The point I am trying to make is that you may get pushback along similar lines from the DSL user, but if you point out to him that (a) the problem is temporary and will go away in 3-4 days, (b) the Clojure code is not much different from the XML version, and (c) that he is programming in a real language, so he does not have to depend on you for every little change, he will probably buy into the idea as well. As a DSL writer, you can concentrate on coding only the actual behavior and thus be more productive. I believe that this alone may be a compelling reason for a Java programmer to learn Clojure.

Other information

I used the Netbeans IDE with the enclojure plugin to write the code. Along with the standard paren-matching and syntax highlighting, enclojure offers a built in REPL which is very convenient - I use the Emacs key-binding, so to run the code within the built-in REPL, the keystroke is ALT+L (not CTRL+L as advertised, perhaps for the default key-binding, on the enclojure docs).

I also had a bit of re-learning to do when setting up PostgreSQL for the test. Last I used it was over 3 years (and 1 machine) ago, so I had to install everything from scratch. Here is a quick cheat sheet if you want to do something similar.

  1. Install PostgreSQL: I used yum to install postgresql, postgresql-libs and postgresql-server. This automatically creates a Unix user called "postgres".
  2. Start the server: Run "sudo /etc/init.d/postgresql restart" from the unix prompt. The first time it will ask you to set up the cluster - do that and retry the command. On my machine, the cluster is created under /var/lib/pgsql/data. Note this directory, you will need it later.
  3. Create database: As postgres, run "createdb sqlunit" from the unix prompt. This will create your database.
  4. Create user: As postgres, run "createuser sujit" from the unix prompt. When prompted, give it PostgresSQL super-user privileges.
  5. Create language: Allow sqlunitdb to use plpgsql as a stored procedure language. As postgres, run "createlang plpgsql sqlunitdb" from the unix prompt.
  6. Load schema and procedures: There is a reload.sh that would do this for you. You will have to edit out the "drop table" calls the first time you use it.
  7. Enable JDBC access: You need to add the line in pg_hba.conf (in your /var/lib/pgsql/data directory) as described in the PostgreSQL JDBC FAQ here. The update to postgres.conf described here did not work and turned out to be unnecessary for me. Restart the server after updating pg_hba.conf.

Saturday, August 22, 2009

Exposing Java Objects over HTTP with Jetty

Couple of weeks ago, I described how I exposed a Python object over HTTP using CherryPy. I wrote the code in there after a quick read of the CherryPy Tutorial, so it's quite basic and doesn't use any advanced CherryPy functionality. But what impressed me was the simplicity of CherryPy's approach to exposing the Python object - and since I am primarily a Java programmer, and therefore mostly need to expose Java objects, I decided to see if I could do something similar using Jetty. This post is a result of that decision.

Overview

I called the system JOH (for Java Over HTTP). It rhymes with D'oh (as in, D'oh! Why didn't I think of this before?). The diagram below illustrates the data flow. Imagine that you have one or more Java objects (the left most box) that you are currently calling directly from Java client code (the right most box). Exposing these objects over HTTP using JOH involves creating a Facade and plugging it into JOH on the server side. The Facade delegates to the Java Objects and decides how to serialize the outputs, in our example we convert our outputs to JSON. On the client side, the client now needs to go through an HTTP client which will deserialize the HTTP response into the desired Object.

As a user, the only significant thing you need to supply is the Facade class, which has a bunch of public methods which translate directly into servlet URI's. So a public method called getFoo() will be accessible over JOH using the URL http://.../getfoo. Unlike CherryPy, where each method is individually exposed, we rely on Java's visibility modifiers here - all public methods on the Facade are exposed - if you don't want to expose some method, change its visibility to protected or private.

Each method on the Facade takes a reference to the HttpServletRequest and HttpServletResponse. When delegating to methods of local Java objects, it extracts the relevant parameter from the request, validates and passes it on. On the way out, it serializes the output of the result and sticks it into the response. The serialization mechanism can probably be factored out into a Renderer abstraction, and multiple types of Renderer provided for different serialization mechanisms. So anyway, as you can see, most of the "magic" of exposing the Java object over HTTP happens in the Facade.

The Joh.expose method (called from the Facade's main() method) is responsible for exposing the Facade to Jetty's lifecycle via the JohHandler. Each incoming HTTP request is intercepted by the JohHandler, and converted into a method call on the Facade, which is then invoked reflectively. After the invocation, the Facade takes over and is responsible for sending out the response.

On the client side, we have a simple JOH Client which uses takes a URL and deserializes the response back to the user's requested class. Both the JSON serialization and deserialization use the Jackson JSON Processor. A similar facade exists on the client side to minimize disruption to client code, if it already exists - instead of calling the Java objects, the client code calls the equivalent methods on the client facade, which delegates to the JOH Client.

JOH Components

We describe the JOH components (in the pink boxes in the diagram above) individually below. If you just want to know how to use this stuff, then you can safely skip down to the next section on the Facade components (the light blue boxes in the diagram).

Joh.java

The Joh class is the main class and consists of a bunch of static methods. The main method is expose() which takes an instance of the Facade class to expose and a configuration file. If a configuration file is not supplied (i.e. null), suitable defaults are used to start the Jetty server. In addition, there are some generally useful utility methods in here.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
// Source: src/main/java/com/mycompany/joh/Joh.java
package com.mycompany.joh;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.collections15.map.CaseInsensitiveMap;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.mortbay.jetty.Connector;
import org.mortbay.jetty.Handler;
import org.mortbay.jetty.HttpStatus;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.bio.SocketConnector;

/**
 * Exposes a Java object reference.
 */
public class Joh {

  private final static Log LOG = LogFactory.getLog(Joh.class);
  
  public final static String HTTP_PORT_KEY = "_http_port";
  
  private final static int DEFAULT_HTTP_PORT = 8080;
  
  public static void expose(Object obj, Map<String,Object> config)
      throws Exception {
    Server server = new Server();
    Connector connector = new SocketConnector();
    if (config != null && config.containsKey(HTTP_PORT_KEY)) {
      connector.setPort((Integer) config.get(HTTP_PORT_KEY));
    } else {
      connector.setPort(DEFAULT_HTTP_PORT);
    }
    server.setConnectors(new Connector[] {connector});
    Handler handler = new JohHandler(obj);
    server.setHandler(handler);
    server.start();
    server.join();
  }
  
  public static Map<String,String> getParameters(
      HttpServletRequest request) {
    Map<String,String> parameters = 
      new CaseInsensitiveMap<String>();
    Map<String,String[]> params = request.getParameterMap();
    for (String key : params.keySet()) {
      parameters.put(key, StringUtils.join(params.get(key), ","));
    }
    return parameters;
  }

  public static void error(Exception e, HttpServletRequest request,
      HttpServletResponse response) {
    response.setContentType("text/html");
    try {
      PrintWriter responseWriter = response.getWriter();
      responseWriter.println("<html><head><title>Error Page</title></head>");
      responseWriter.println("<body><font color=\"red\">");
      e.printStackTrace(responseWriter);
      responseWriter.println("</font></body></html>");
      responseWriter.flush();
      responseWriter.close();
      response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
      LOG.error(e);
    } catch (IOException ioe) {
      LOG.error(ioe);
    }
  }
}

JohHandler.java

The JohHandler hooks into the Jetty request lifecycle, so that the JohHandler is invoked to handle a request. The URI of the request is mapped to a method of the Facade class, and any required method parameters are extracted from the request parameters. The invocation of the method on the Facade will cause the response to be populated with the JSON serialized result of the method call.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
// Source: src/main/java/com/mycompany/joh/JohHandler.java
package com.mycompany.joh;

import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.collections15.map.CaseInsensitiveMap;
import org.mortbay.jetty.HttpConnection;
import org.mortbay.jetty.Request;
import org.mortbay.jetty.handler.AbstractHandler;

/**
 * Hook into the Jetty infrastructure.
 */
public class JohHandler extends AbstractHandler {

  private Map<String,Method> methodMap = 
    new CaseInsensitiveMap<Method>();
  
  private Object javaObject;
  
  public JohHandler(Object obj) {
    super();
    this.javaObject = obj;
    Method[] methods = obj.getClass().getMethods();
    for (Method method : methods) {
      methodMap.put(method.getName(), method);
    }
  }
  
  public void handle(String target, HttpServletRequest request,
      HttpServletResponse response, int dispatch) 
      throws IOException, ServletException {
    Request req = (request instanceof Request ? 
      (Request) request : 
      HttpConnection.getCurrentConnection().getRequest());
    // strip off the leading "/" and lowercase the target. The target is
    // the same as the requestURI from the HttpServletRequest object.
    String methodName = request.getRequestURI().substring(1);
    if (methodMap.containsKey(methodName)) {
      Method method = methodMap.get(methodName);
      try {
        method.invoke(javaObject, new Object[] {request, response});
        response.setStatus(HttpServletResponse.SC_OK);
      } catch (InvocationTargetException e) {
        Joh.error(e, request, response);
      } catch (IllegalAccessException e) {
        Joh.error(e, request, response);
      }
    } else {
      Joh.error(new Exception("No such method: " + methodName), 
        request, response);
    }
    req.setHandled(true);
  }
}

JohClient.java

The JobClient is a standard Apache HTTP client that passes in HTTP GET calls to Jetty and gets back a JSON response. It then deserializes the JSON response into the object expected by the client. Here is the code for it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Source: src/main/java/com/mycompany/joh/JohClient.java
package com.mycompany.joh;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;
import org.codehaus.jackson.map.ObjectMapper;

/**
 * Simple Jetty HTTP Client to test our Joh enabled BlogDict.
 * @author Sujit Pal
 * @version $Revision$
 */
public class JohClient {

  public Object request(String url, Class<?> clazz) {
    HttpClient client = new HttpClient();
    GetMethod method = new GetMethod(url);
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
      new DefaultHttpMethodRetryHandler(3, false));
    try {
      int status = client.executeMethod(method);
      if (status != HttpStatus.SC_OK) {
        throw new Exception(method.getStatusText() + 
          " [" + method.getStatusCode() + "]");
      }
      ObjectMapper mapper = new ObjectMapper();
      return mapper.readValue(method.getResponseBodyAsStream(), clazz);
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      method.releaseConnection();
    }
  }
}

Some examples of calling this component are described in the Facade Components section below.

Facade Components

The original component to be exposed is our tired-but-tested BlogDict class which reads a text file and builds an internal data structure, then exposes methods which allows a caller to query parts of the data structure. Here is the code in Java (last week's post contains the Python version).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
// Source: src/test/java/com/mycompany/joh/BlogDict.java
package com.mycompany.joh;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.commons.lang.StringUtils;

/**
 * Simple Java Object to be exposed by JOH.
 */
public class BlogDict {

  private String file;
  
  private Set<String> labels;
  private Map<String,Set<String>> synonyms;
  private Map<String,Set<String>> categories;
  
  public BlogDict(String file) {
    this.file = file;
    init();
  }
  
  public Set<String> getLabels() {
    return labels;
  }
  
  public Set<String> getSynonyms(String label) {
    if (synonyms.containsKey(label)) {
      return synonyms.get(label);
    } else {
      return Collections.emptySet();
    }
  }
  
  public Set<String> getCategories(String label) {
    if (categories.containsKey(label)) {
      return categories.get(label);
    } else {
      return Collections.emptySet();
    }
  }
  
  protected void init() {
    this.labels = new HashSet<String>();
    this.synonyms = new HashMap<String,Set<String>>();
    this.categories = new HashMap<String,Set<String>>();
    try {
      BufferedReader reader = new BufferedReader(new FileReader(file));
      String line = null;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("#")) {
          continue;
        }
        String[] cols = StringUtils.splitPreserveAllTokens(line, ":");
        this.labels.add(cols[0]);
        this.synonyms.put(cols[0], new HashSet<String>(
          Arrays.asList(StringUtils.split(cols[1], ","))));
        this.categories.put(cols[0], new HashSet<String>(
          Arrays.asList(StringUtils.split(cols[2], ","))));
      }
      reader.close();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}

The BlogDictFacade provides a Facade that delegates to the BlogDict class and serializes the output into JSON and sends it back in the HTTP response. Each of the public getXXX() methods in the BlogDict has an analog in the BlogDictFacade, although the method signature is different and there is no return type. Here is the code - showing it is probably clearer than explaining it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
// Source: src/test/java/com/mycompany/joh/BlogDictFacade.java
package com.mycompany.joh;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.codehaus.jackson.map.ObjectMapper;

/**
 * The web based facade that will be plugged into JOH.
 */
public class BlogDictFacade {

  private BlogDict blogDict;
  
  public BlogDictFacade(String file) {
    this.blogDict = new BlogDict(file);
  }
  
  protected void init() { /* nothing to do here */ }
  
  protected void destroy() { /* nothing to do here */ }
  
  public void getLabels(HttpServletRequest request, 
      HttpServletResponse response) {
    Set<String> labels = blogDict.getLabels();
    response.setContentType("application/x-javascript");
    try {
      PrintWriter responseWriter = response.getWriter();
      ObjectMapper mapper = new ObjectMapper();
      mapper.writeValue(responseWriter, labels);
      responseWriter.flush();
      responseWriter.close();
    } catch (IOException e) {
      Joh.error(e, request, response);
    }
  }
  
  public void getSynonyms(HttpServletRequest request, 
      HttpServletResponse response) {
    Map<String,String> parameters = Joh.getParameters(request);
    if (parameters.containsKey("label")) {
      Set<String> synonyms = 
        blogDict.getSynonyms(parameters.get("label"));
      response.setContentType("application/x-javascript");
      try {
        PrintWriter responseWriter = response.getWriter();
        ObjectMapper mapper = new ObjectMapper();
        mapper.writeValue(responseWriter, synonyms);
        responseWriter.flush();
        responseWriter.close();
      } catch (IOException e) {
        Joh.error(e, request, response);
      }
    } else {
      Joh.error(new Exception("Parameter 'label' not provided"), 
        request, response);
    }
  }
  
  public void getCategories(HttpServletRequest request, 
      HttpServletResponse response) {
    Map<String,String> parameters = Joh.getParameters(request);
    if (parameters.containsKey("label")) {
      Set<String> categories = 
        blogDict.getCategories(parameters.get("label"));
      response.setContentType("application/x-javascript");
      try {
        PrintWriter responseWriter = response.getWriter();
        ObjectMapper mapper = new ObjectMapper();
        mapper.writeValue(responseWriter, categories); 
        responseWriter.flush();
        responseWriter.close();
      } catch (IOException e) {
        Joh.error(e, request, response);
      }
    } else {
      Joh.error(new Exception("Parameter 'label' not provided"), 
        request, response);
    }
  }
  
  public static void main(String[] argv) throws Exception {
    Map<String,Object> config = new HashMap<String,Object>();
    config.put(Joh.HTTP_PORT_KEY, new Integer(8080));
    Joh.expose(new BlogDictFacade("/home/sujit/bin/blog_dict.txt"), 
      config);
  }
}

Since we are really just testing this stuff at this point, I don't have any client code, so I decided to do away with the client side Facade and just write a unit test that goes directly against the JohClient. This is shown below, to illustrate usage.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
// Source: src/test/java/com/mycompany/joh/JohClientTest.java
package com.mycompany.joh;

import java.util.Set;

import junit.framework.Assert;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;

/**
 * Test for JohClient.
 */
public class JohClientTest {

  private final Log log = LogFactory.getLog(getClass());
  
  @Test
  public void testGetLabel() throws Exception {
    JohClient client = new JohClient();
    Set<String> labels = (Set<String>) client.request(
      "http://localhost:8080/getlabels", Set.class);
    log.debug("labels=" + labels);
    Assert.assertNotNull(labels);
    Assert.assertTrue(labels.contains("crawling"));
  }
  
  @Test
  public void testGetSynonyms() throws Exception {
    JohClient client = new JohClient();
    Set<String> synonyms = (Set<String>) client.request(
      "http://localhost:8080/getsynonyms?label=crawling", Set.class);
    log.debug("synonyms(crawling)=" + synonyms);
    Assert.assertNotNull(synonyms);
    Assert.assertTrue(synonyms.contains("crawler"));
  }
  
  @Test
  public void testGetCategories() throws Exception {
    JohClient client = new JohClient();
    Set<String> categories = (Set<String>) client.request(
      "http://localhost:8080/getcategories?label=crawling", Set.class);
    log.debug("categories(crawling)=" + categories);
    Assert.assertNotNull(categories);
    Assert.assertTrue(categories.contains("lucene"));
  }
}

Shell script

This is Java, so we need to build a shell script to run the server. A simple shell script with all the dependencies in the classpath is shown below.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
#!/bin/bash
# Source: src/main/scripts/joh_blogdict.sh
PROJECT_HOME=$HOME/src/gclient
M2_REPO=$HOME/.m2/repository
CLASSPATH=\
$M2_REPO/commons-lang/commons-lang/2.3/commons-lang-2.3.jar:\
$M2_REPO/org/apache/commons/collections15/4.01/collections15-4.01.jar:\
$M2_REPO/commons-logging/commons-logging/1.1/commons-logging-1.1.jar:\
$M2_REPO/log4j/log4j/1.2.14/log4j-1.2.14.jar:\
$M2_REPO/org/mortbay/jetty/jetty/6.1.5/jetty-6.1.5.jar:\
$M2_REPO/org/mortbay/jetty/jetty-util/6.1.5/jetty-util-6.1.5.jar:\
$M2_REPO/org/mortbay/jetty/servlet-api-2.5/6.1.5/servlet-api-2.5-6.1.5.jar:\
$M2_REPO/org/codehaus/jackson/core/1.2.0/core-1.2.0.jar:\
$M2_REPO/org/codehaus/jackson/mapper/1.2.0/mapper-1.2.0.jar:\
$PROJECT_HOME/target/classes:\
$PROJECT_HOME/target/test-classes

java -cp $CLASSPATH com.mycompany.joh.BlogDictFacade 2>&1 | tee $0.log

After running this script on the command prompt, you can hit the BlogDict using either a browser or something like JohClientTest shown above. To shutdown the server, hit CTRL+C.

Conclusion

Although a lot of code is shown here on this blog, in reality, a user who is looking for functionality to expose a Java object needs to only create the Facade object and plug it into JOH using the Joh.expose() call. So the approach is very similar to CherryPy's. In the same spirit, there is no attempt to force the user (via interface, etc) to conform to a specific approach - this is more of a prescriptive approach.