Friday, November 16, 2012

An ElasticSearch Web Client with Scala and Play2

In this post, I describe the second part of my submission for the Typesafe Developer Contest. This part is a rudimentary web-based search client for querying an ElasticSearch (ES) server. It is a Play2/Scala web application that communicates with the ES server via its JSON query DSL.

The webapp has a single form that lets you specify a Lucene query and various parameters, and returns an HTML or JSON response. It will probably remind Solr developers of the Solr admin form. I find the Solr admin form very useful for trying out queries before baking them into code, and I envision a similar use for this webapp for ES search developers.

Since ES provides a rich JSON-based query DSL, the form here has a few more features than the Solr admin form, such as faceting and sorting. In the interests of full disclosure, though, it provides only a subset of the variations possible via direct use of JSON and curl on the command line. But it's good for quick and dirty verification of search ideas. To get started quickly with ES's query DSL, I found this DZone article by Peter Kar and this blog post by Pulkit Singhal very useful (apart from the ES docs themselves, of course).

Since Play2 was completely new to me a week ago and now I am the proud author of a working webapp, I would like to share with you some of my insights into this framework. I typically learn new things by making analogies to stuff I already know, so I will explain Play2 by making analogies to Spring. If you know Spring, it may be helpful, and if you don't, well, maybe it was not that terribly helpful anyway...

Routing in Play2 is done using the conf/routes file, which maps URL patterns and HTTP methods to Play2 controller actions. Actions can be thought of as @RequestMapping methods in a multi-action Spring controller, and are basically functions that transform a Request into a Response. A response can be a String wrapped in an Ok() call, or it can be a call into a view with some data, which returns a templated string that is passed to Ok(). There, that's it - about everything you need to know about Play2 to start using it.
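
To make the "action is a function from Request to Response" idea concrete, here is a minimal framework-free sketch in plain Scala. None of these names are Play2 APIs: Request, Response, ok, routes and dispatch are hypothetical stand-ins I made up for illustration, and the real framework does considerably more (URL patterns, typed parameters, and so on).

```scala
// Hypothetical, framework-free model of routing; these are NOT Play2 classes.
object RoutingSketch {
  case class Request(method: String, path: String)
  case class Response(status: Int, body: String)

  // An action is just a function from a request to a response.
  type Action = Request => Response

  // Stand-in for Play2's Ok(): wraps a string body in a 200 response.
  def ok(body: String): Response = Response(200, body)

  // Stand-in for conf/routes: (HTTP method, URL path) -> action.
  val routes: Map[(String, String), Action] = Map(
    ("GET", "/form")   -> ((req: Request) => ok("search form")),
    ("GET", "/search") -> ((req: Request) => ok("results for " + req.path))
  )

  // Look up the action for a request, falling back to a 404 response.
  def dispatch(req: Request): Response =
    routes.get((req.method, req.path))
      .map(action => action(req))
      .getOrElse(Response(404, "not found"))
}
```

Calling RoutingSketch.dispatch(RoutingSketch.Request("GET", "/form")) returns the 200 response from the first entry, while an unmapped method or path falls through to the 404 branch.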

Unlike the last time (with Akka), this time around I did not use the Typesafe Play tutorial. Instead I downloaded Play2 and used the play command to build a new web application template (play new search), then to compile and run it. The best tutorial I found was this one on flurdy.com, which covers everything from choice of IDE to deployment on Heroku and everything in between. Other useful sources are Play's documentation (available with the Play2 download) and this example Play2 app on GitHub.

Here is my conf/routes file. I added the two entries under Search pages. They both respond to HTTP GET requests and call the form() and search() Actions respectively. The other two entries come with the generated project and are needed (so don't delete them).

# conf/routes
# Home page
GET     /                           controllers.Application.index

# Search pages
GET     /form                       controllers.Application.form
GET     /search                     controllers.Application.search

# Map static resources from the /public folder to the /assets URL path
GET     /assets/*file               controllers.Assets.at(path="/public", file)

There is another file in the conf directory, called conf/application.conf. It contains properties required by the default application. I added a new property for the URL for the ES server in this file.

# conf/application.conf
...
es.server="http://localhost:9200/"
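
Note that the value ends with a slash. The code later in this post builds ES URLs by plain string concatenation (for example server + "_status"), so it relies on that trailing slash being present. If you would rather not depend on it, a small helper along these lines could do the joining. This is only a sketch: EsUrls and endpoint are hypothetical names and are not part of the application or of Play2.

```scala
// Hypothetical helper, not part of the original application.
object EsUrls {
  // Join a base URL and path segments with exactly one slash between parts,
  // whether or not the configured server value ends with a slash.
  def endpoint(server: String, segments: String*): String = {
    val base = server.stripSuffix("/")
    val path = segments
      .map(_.stripPrefix("/").stripSuffix("/"))
      .filter(_.nonEmpty)
    (base +: path).mkString("/")
  }
}
```

With this, EsUrls.endpoint("http://localhost:9200", "enron", "_search") and EsUrls.endpoint("http://localhost:9200/", "enron", "_search") produce the same URL.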

The Play2 "new" command also generates a skeleton controller app/controllers/Application.scala, into which we add the two new form and search Actions. Here is the completed Application.scala file.

// app/controllers/Application.scala
package controllers

import models.{Searcher, SearchParams}
import play.api.data.Forms.{text, number, mapping}
import play.api.data.Form
import play.api.libs.json.{Json, JsValue}
import play.api.libs.ws.WS
import play.api.mvc.{Controller, Action}
import play.api.Play

object Application extends Controller {

  // define the search form
  val searchForm = Form(
    mapping (
      "index" -> text,
      "query" -> text,
      "filter" -> text,
      "start" -> number,
      "rows" -> number,
      "sort" -> text,
      "writertype" -> text,
      "fieldlist" -> text,
      "highlightfields" -> text,
      "facetfields" -> text
    ) (SearchParams.apply)(SearchParams.unapply)
  )
  
  // configuration parameters from conf/application.conf
  val conf = Play.current.configuration
  val server = conf.getString("es.server").get

  // home page - redirects to search form
  def index = Action {
    Redirect(routes.Application.form)
  }

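  // form page - lists the available indexes in the index dropdown
  def form = Action {
    val rsp = Json.parse(WS.url(server + "_status").
      get.value.get.body)
    // the top-level "indices" object is keyed by index name,
    // so its keys are the names of all indexes on the server
    val indices = (rsp \ "indices").
      as[Map[String,JsValue]].keys.toSeq
    Ok(views.html.index(indices, searchForm))
  }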

  // search results action - renders one of two views
  // (the raw JSON response or the HTML search template)
  // depending on the value of writertype
  def search = Action {request =>
    // keep only the first value of each query string parameter
    val params = request.queryString.
      map(elem => elem._1 -> elem._2.headOption.getOrElse(""))
    val searchParams = searchForm.bind(params).get
    val result = Searcher.search(server, searchParams)
    searchParams.writertype match {
      case "json" => Ok(result.raw).as("application/json")
      // default to HTML so an unexpected writertype cannot cause a MatchError
      case _ => Ok(views.html.search(result)).as("text/html")
    }
  }
}

We first define a Search form and map it to the SearchParams class (defined in the model, below). The index Action has been changed to redirect to the form Action. The form method makes a call to the ES server to get a list of indexes (ES can support multiple indexes with different schemas within the same server), and then delegates to the index view with this list and an empty searchForm.

The search Action binds the request to the searchParams bean, then sends this bean to the Searcher.search() method, which returns a SearchResult object containing the results of the search. Two different views are supported - the HTML view (delegating to the search view template) and the raw JSON view that just dumps the JSON response from ES.
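
The binding step depends on flattening the query string first: request.queryString maps each parameter name to a sequence of values, while the form binding wants a single String per name. The fragment below isolates just that step in plain Scala so it can be tried outside Play2. QueryStringSketch and firstValues are names I made up for this sketch.

```scala
// Hypothetical stand-alone version of the flattening step in the search Action.
object QueryStringSketch {
  // Keep the first value of each parameter, or "" if the parameter has no values.
  def firstValues(queryString: Map[String, Seq[String]]): Map[String, String] =
    queryString.map { case (name, values) =>
      name -> values.headOption.getOrElse("")
    }
}
```

Repeated parameters (say, rows=10&rows=20) are silently reduced to their first value, which is fine for this form since every field appears exactly once.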

The respective views for the form and search are shown below. Not much to explain here, except that it's another templating language that you have to learn. A template is set up like a function - you pass in parameters that you use in the template. I followed the lead of the flurdy.com tutorial referenced above and kept it as HTML-ish as possible, but Play2 has an extensive templating language of its own that you may prefer.

@** app/views/index.scala.html **@
@(indices: Seq[String], searchForm: Form[SearchParams])

@import helper._

@main("Search with ElasticSearch") {
  
  <h2>Search with ElasticSearch</h2>
  @form(action = routes.Application.search) {  
    <fieldset>
      <legend>Index Name</legend>
      <select name="index">
      @for(index <- indices) {
        <option value="@index">@index</option>
      }
      </select>
    </fieldset>
    <fieldset>
      <legend>Lucene Query</legend>
      <textarea name="query" maxlength="1024" rows="10" cols="80">*:*</textarea>
    </fieldset>
    <fieldset>
      <legend>Filter Query</legend>
      <textarea name="filter" maxlength="512" rows="5" cols="80"></textarea>
    </fieldset>  
    <fieldset>
      <legend>Start Row</legend>
      <input type="text" name="start" value="0" maxlength="5"/>
    </fieldset>
    <fieldset>
      <legend>Maximum Rows Returned</legend>
      <input type="text" name="rows" value="10" maxlength="5"/>
    </fieldset>
    <fieldset>
      <legend>Sort Fields</legend>
      <input type="text" name="sort" value="" maxlength="80" size="40"/>
    </fieldset>
    <fieldset>
      <legend>Output Type</legend>
      <select name="writertype">
        <option value="html" selected="true">HTML</option>
        <option value="json">JSON</option>
      </select>
    </fieldset>
    <fieldset>
      <legend>Fields To Return</legend>
      <input type="text" name="fieldlist" value="" maxlength="80" size="40"/>
    </fieldset>
    <fieldset>
      <legend>Fields to Highlight</legend>
      <input type="text" name="highlightfields" value="" maxlength="80" size="40"/>
    </fieldset>
    <fieldset>
      <legend>Fields to Facet</legend>
      <input type="text" name="facetfields" value="" maxlength="80" size="40"/>
    </fieldset>
    <input type="submit" value="Search"/>
  }
}

The resulting input form looks like this:


@** app/views/search.scala.html **@
@(result: SearchResult)

@import helper._

@main("Search with ElasticSearch - HTML results") {
  <h2>Search Results</h2>
  <p><b>@result.meta("start") to @result.meta("end") results of @result.meta("numFound") in @result.meta("QTime") ms</b></p>
  <hr/>
  <p><b>JSON Query: </b>@result.meta("query_json")</p>
  <hr/>
  @for(doc <- result.docs) {
    <fieldset>
      <table cellspacing="0" cellpadding="0" border="1" width="100%">
      @for((fieldname, fieldvalue) <- doc) {
        <tr valign="top">
          <td width="20%"><b>@fieldname</b></td>
          <td width="80%">@fieldvalue</td>
        </tr>
      }
      </table>
    </fieldset>
  }
  <hr/>
}

Finally, we come to the part of the application that is not autogenerated by Play2 and which contains all the business logic of the application - the model. Here is the code.

// app/models/Searcher.scala
package models

import scala.Array.canBuildFrom

import play.api.libs.json.{Json, JsValue}
import play.api.libs.ws.WS

case class SearchResult(
  meta: Map[String,Any], 
  docs: Seq[Seq[(String,JsValue)]],
  raw: String
)

case class SearchParams(
  index: String,
  query: String,
  filter: String,
  start: Int,
  rows: Int,
  sort: String,
  writertype: String, 
  fieldlist: String,
  highlightfields: String,
  facetfields: String
)

object Searcher {
  
  def search(server: String, params: SearchParams): SearchResult = {
    val payload = Searcher.buildQuery(params)
    val rawResponse = WS.url(server + params.index + 
      "/_search?pretty=true").post(payload).value.get.body
    println("response=" + rawResponse)
    val rsp = Json.parse(rawResponse)
    val meta = (rsp \ "error").asOpt[String] match {
      case Some(x) => Map(
        "error" -> x,
        "status" -> (rsp \ "status").asOpt[Int].get
      )
      case None => Map(
        "QTime" -> (rsp \ "took").asOpt[Int].get,
        "start" -> params.start,
        "end" -> (params.start + params.rows),
        "query_json" -> payload,
        "numFound" -> (rsp \ "hits" \ "total").asOpt[Int].get,
        // max_score can be null (e.g. when there are no hits), so fall back to 0
        "maxScore" -> (rsp \ "hits" \ "max_score").asOpt[Float].getOrElse(0.0F)
      )
    }
    val docs = if (meta.contains("error")) Seq()
    else {
      val hits = (rsp \ "hits" \ "hits").asOpt[List[JsValue]].get
      val idscores = hits.map(hit => Map(
        "_id" -> (hit \ "_id"),
        "_score" -> (hit \ "_score")))
      val fields = hits.map(hit => 
        (hit \ "_source").asOpt[Map[String,JsValue]].get)
      idscores.zip(fields).
        map(tuple => tuple._1 ++ tuple._2).
        map(doc => doc.toSeq.sortWith((doc1, doc2) => doc1._1 < doc2._1))
    }
    new SearchResult(meta, docs, rawResponse)
  }
  
  def buildQuery(params: SearchParams): String = {
    val queryQuery = Json.toJson(
      if (params.query.isEmpty || "*:*".equals(params.query))
        Map("match_all" -> Map.empty[String,String])
      else Map("query_string" -> Map("query" -> params.query)))
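    // wrap the filter text as a query filter:
    // {"query": {"query_string": {"query": "..."}}}
    val queryFilter = if (params.filter.isEmpty) null
      else Json.toJson(Map("query" ->
        Map("query_string" -> Map("query" -> params.filter))))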
    val queryFacets = if (params.facetfields.isEmpty) null
      else {
        val fields = params.facetfields.split(",").map(_.trim)
        Json.toJson(fields.zip(fields.
          map(field => Map("terms" -> Map("field" -> field)))).toMap)
      }
    val querySort = if (params.sort.isEmpty) null
      else Json.toJson(params.sort.split(",").map(_.trim).map(field => 
        if (field.toLowerCase.endsWith(" asc") || 
            field.toLowerCase.endsWith(" desc")) 
          (field.split(" ")(0), field.split(" ")(1)) 
        else (field, "")).map(tuple => 
          if (tuple._2.isEmpty) Json.toJson(tuple._1)
          else Json.toJson(Map(tuple._1 -> tuple._2))))  
    val queryFields = if (params.fieldlist.isEmpty) null
      else Json.toJson(params.fieldlist.split(",").map(_.trim))
    val queryHighlight = if (params.highlightfields.isEmpty) null
      else {
        val fields = params.highlightfields.split(",").map(_.trim)
        Json.toJson(Map("fields" -> fields.zip(fields.
          map(field => Map.empty[String,String])).toMap))
      }
    Json.stringify(Json.toJson(Map(
      "from" -> Json.toJson(params.start),
      "size" -> Json.toJson(params.rows),
      "query" -> queryQuery,
      "filter" -> queryFilter,
      "facets" -> queryFacets,
      "sort" -> querySort,
      "fields" -> queryFields,
      "highlight" -> queryHighlight).
      filter(tuple => tuple._2 != null)))
  }
}

The first two are simple case classes: SearchParams and SearchResult correspond to an FBO (Form Backing Object) and a DTO (Data Transfer Object) respectively in the Spring world. The search() method takes the ES server URL and the filled-in SearchParams object, calls buildQuery() to build the ES query JSON, then hits the ES server. It then parses the JSON response from ES to create the SearchResult bean, which it passes back to the search Action. The SearchResult object contains a Map of response metadata, a Seq of Seqs of key-value pairs that hold the documents, and the raw JSON response from ES.
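
The densest part of buildQuery() is the sort handling, which turns a comma-separated string such as "date desc, name" into a list of fields with optional directions. Here is that parsing step on its own, as a plain Scala sketch without the JSON conversion. SortSketch and parseSort are hypothetical names, and unlike the original code this version lowercases the direction and skips empty clauses.

```scala
// Hypothetical stand-alone version of the sort parsing inside buildQuery().
object SortSketch {
  // Parse "date desc, name" into (field, order) pairs.
  // The order is "" when the clause does not end in asc or desc.
  def parseSort(spec: String): Seq[(String, String)] =
    spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { clause =>
      clause.split("\\s+") match {
        case Array(field, order)
            if order.equalsIgnoreCase("asc") || order.equalsIgnoreCase("desc") =>
          (field, order.toLowerCase)
        case _ => (clause, "")
      }
    }
}
```

In buildQuery(), a pair with an empty order becomes a bare field name in the ES sort array, and a pair with an order becomes a one-entry object mapping the field to its direction.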

Here are some screenshots of the results for "hedge fund" from our Enron index that we built using the code from the previous post.

The one on the left shows the HTML results (and also shows the JSON query that one would need to use to get the results). The one on the right shows the raw JSON results from the ES server.

That's all I have for this week. Hope you found it interesting.

Update 2012-11-20 - There were some minor bugs caused by the fields parameter being blank. If the fields parameter is blank, the _source JSON field is returned by ES instead of an array of field objects. The fix is to pass in a "*" (all fields) as the default for the fields parameter. The updated code can be found on my GitHub page.
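
The updated code itself is not reproduced in this post, so the following is only a guess at one way to express that default in plain Scala. FieldsSketch and fieldsOrAll are hypothetical names and may not match what the GitHub version does.

```scala
// Hypothetical sketch of defaulting a blank field list to "*" (all fields).
object FieldsSketch {
  // Split a comma-separated field list, dropping blanks;
  // fall back to Seq("*") when nothing is left.
  def fieldsOrAll(fieldlist: String): Seq[String] = {
    val fields = fieldlist.split(",").toSeq.map(_.trim).filter(_.nonEmpty)
    if (fields.isEmpty) Seq("*") else fields
  }
}
```

The result could then be passed to Json.toJson in place of the raw split of params.fieldlist, so that the "fields" entry is always present in the query.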

4 comments (moderated to prevent spam):

Anonymous said...

This post is quite old but would you mind updating it, so it works with the current versions of play/ES?

I'd really appreciate that, if you have the time.

Thanks in advance,
Sebastian

Sujit Pal said...

No promises, I have a lot in my queue at the moment unfortunately, I'll respond back once done. Can you describe what happens (or what fails) when you use it, and what versions of Play and ES you are using?

Anonymous said...

Thank you for responding and the future effort!

My ES version is 1.2.1 and play is the latest as well, it says: activator 1.2.2.


The first error is:

[info] Compiling 7 Scala sources and 1 Java source to /Users/Sebastian/Documents/Github/es-web-client/target/scala-2.11/classes...
[error] /Users/Sebastian/Documents/Github/es-web-client/app/controllers/Application.scala:46: value body is not a member of scala.util.Try[play.api.libs.ws.WSResponse]
[error] get.value.get.body)
[error] ^
[error] /Users/Sebastian/Documents/Github/es-web-client/app/models/Searcher.scala:32: You do not have an implicit Application in scope. If you want to bring the current running Application into context, just add import play.api.Play.current
[error] val rawResponse = WS.url(server + params.index +
[error] ^
[error] two errors found
[error] (compile:compile) Compilation failed
[error] application -

! @6ijc9h95k - Internal server error, for (GET) [/] ->

play.PlayExceptions$CompilationException: Compilation error[value body is not a member of scala.util.Try[play.api.libs.ws.WSResponse]]
at play.PlayExceptions$CompilationException$.apply(PlayExceptions.scala:27) ~[na:na]
at play.PlayExceptions$CompilationException$.apply(PlayExceptions.scala:27) ~[na:na]
at scala.Option.map(Option.scala:145) ~[scala-library-2.11.1.jar:na]
at play.PlayReloader$$anon$1$$anonfun$play$PlayReloader$$anon$$taskFailureHandler$1.apply(PlayReloader.scala:297) ~[na:na]
at play.PlayReloader$$anon$1$$anonfun$play$PlayReloader$$anon$$taskFailureHandler$1.apply(PlayReloader.scala:292) ~[na:na]



If I do what it tells me, I get the following error:

[info] Compiling 7 Scala sources and 1 Java source to /Users/Sebastian/Documents/Github/es-web-client/target/scala-2.11/classes...
[error] /Users/Sebastian/Documents/Github/es-web-client/app/controllers/Application.scala:46: value body is not a member of scala.util.Try[play.api.libs.ws.WSResponse]
[error] get.value.get.body)
[error] ^
[error] /Users/Sebastian/Documents/Github/es-web-client/app/models/Searcher.scala:32: You do not have an implicit Application in scope. If you want to bring the current running Application into context, just add import play.api.Play.current
[error] val rawResponse = WS.url(server + params.index +
[error] ^
[error] two errors found
[error] (compile:compile) Compilation failed
[error] application -

! @6ijc9o3bh - Internal server error, for (GET) [/] ->

play.PlayExceptions$CompilationException: Compilation error[value body is not a member of scala.util.Try[play.api.libs.ws.WSResponse]]
at play.PlayExceptions$CompilationException$.apply(PlayExceptions.scala:27) ~[na:na]
at play.PlayExceptions$CompilationException$.apply(PlayExceptions.scala:27) ~[na:na]
at scala.Option.map(Option.scala:145) ~[scala-library-2.11.1.jar:na]
at play.PlayReloader$$anon$1$$anonfun$play$PlayReloader$$anon$$taskFailureHandler$1.apply(PlayReloader.scala:297) ~[na:na]
at play.PlayReloader$$anon$1$$anonfun$play$PlayReloader$$anon$$taskFailureHandler$1.apply(PlayReloader.scala:292) ~[na:na]



I'm new to all of this so this working example would really help. Any help is appreciated, thank you in advance! :)

Sebastian

Sujit Pal said...

Thanks for the info, Sebastian. Let me look at it and get back to you once I have some time.