The title of this post is somewhat misleading, since you are probably here after unsuccessfully trying to create a BooleanQuery object in PyLucene. I had the same problem, but what I describe here is a workaround using Lucene's Query Parser syntax.
I wanted to query a Lucene index with a main query consisting of a set of ids, along with a facet as a QueryFilter object. To build the main query, I used code that looked like this:
import PyLucene
...
def search():
    searcher = PyLucene.IndexSearcher(dir)
    ...
    # find the ids to query on from database
    rows = cursor.fetchall()
    bquery = PyLucene.BooleanQuery()
    # build up the id query
    for row in rows:
        tquery = PyLucene.TermQuery(PyLucene.Term("id", str(row)))
        bquery.add(tquery, False, False)
    # now add in the facet
    bquery.add(PyLucene.TermQuery(PyLucene.Term("facet", facetValue)), True, False)
    # send query to searcher
    hits = searcher.search(bquery)
    numHits = hits.length()
    for i in range(0, numHits):
        # do something with the data
        doc = hits.doc(i)
        field1 = doc.get("field1")
        ...
This would give me the error below. I was going by the BooleanQuery.add() signature for the Lucene 1.4 Java version, but it looks like PyLucene.BooleanQuery does not support it.
Traceback (most recent call last):
  File "./myscript.py", line 76, in ?
    main()
  ...
  File "./myscript.py", line 40, in process
    bquery.add(tquery, False, False)
PyLucene.InvalidArgsError: (<type 'PyLucene.BooleanQuery'>, 'add', (<TermQuery: id:8112526>, False, False))
I searched Google for the error, but did not find anything useful. In any case, I had to generate this report in a hurry, so I did not have much time to figure out how to use BooleanQuery.add() correctly.
However, I knew that the resulting query would look something like the one shown below, which I could generate simply by using Lucene's Query Parser syntax.
+(id:value1 id:value2 ...) +facet:facetValue
So I changed my code to do this instead:
import PyLucene
...
def search():
    searcher = PyLucene.IndexSearcher(dir)
    analyzer = PyLucene.KeywordAnalyzer()
    ...
    # find the ids to query on from database
    rows = cursor.fetchall()
    ids = []
    for row in rows:
        ids.append(str(row))
    if (len(ids) == 0):
        return  # nothing to search for
    idQueryPart = ' OR '.join(ids)
    query = PyLucene.QueryParser("id", analyzer).parse(
        "(" + idQueryPart + ") AND facet:" + facetValue)
    # send query to searcher
    hits = searcher.search(query)
    numHits = hits.length()
    for i in range(0, numHits):
        # do something with the data
        doc = hits.doc(i)
        field1 = doc.get("field1")
        ...
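The string-building part of the workaround can be pulled out into a small helper and checked without an index or PyLucene installed. The sketch below is only an illustration: the names build_query_string and escape_term are hypothetical, and the backslash escaping is an extra precaution I am assuming is wanted for values that contain query-syntax characters; it is not part of the code above.

```python
# Characters that Lucene's query parser syntax treats as special.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_term(text):
    """Backslash-escape query-syntax characters (hypothetical helper)."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


def build_query_string(ids, facet_value):
    """Return '(id1 OR id2 ...) AND facet:value', or None when ids is empty."""
    terms = [escape_term(str(i)) for i in ids]
    if not terms:
        return None  # mirrors the early return in search()
    return "(" + " OR ".join(terms) + ") AND facet:" + escape_term(facet_value)


print(build_query_string([8112526, 8112527], "news"))
# prints: (8112526 OR 8112527) AND facet:news
```

The returned string is what would be handed to PyLucene.QueryParser("id", analyzer).parse(...) in search(). Facet values containing whitespace are not handled by this sketch.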
Most PyLucene users would probably have figured this out for themselves, but for those who didn't, I hope the post is useful. Of course, the nicest solution would have been to figure out how to use PyLucene.BooleanQuery directly. Still, the solution I describe works fine for me, and it kind of makes sense if you think of Python as a scripting language: if we want to talk directly to the API, we should probably use Java instead.
Of course, I may be totally off the mark, and BooleanQuery is really supported in PyLucene and I just don't know how to use it. If this is the case, I would really like to know. Thanks in advance for any help you can provide in this regard.
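One possible lead on the question above, which I have not verified against PyLucene: in the Java API, Lucene 1.9 and later express the two boolean flags as a single BooleanClause.Occur value (MUST, SHOULD or MUST_NOT), and the add(Query, boolean, boolean) overload was removed in Lucene 2.0. If the installed PyLucene tracks one of those releases, the call might need to look like bquery.add(tquery, PyLucene.BooleanClause.Occur.SHOULD); treat that spelling as an assumption. The pure-Python sketch below only shows how the old flag pairs correspond to the newer names and to the prefixes in the Query Parser syntax. The function names are hypothetical and nothing here touches PyLucene.

```python
# Query Parser prefix for each occurrence type.
PREFIX = {"MUST": "+", "SHOULD": "", "MUST_NOT": "-"}


def occur_for(required, prohibited):
    """Map the old (required, prohibited) flag pair to an Occur name."""
    if required and prohibited:
        raise ValueError("a clause cannot be both required and prohibited")
    if required:
        return "MUST"
    if prohibited:
        return "MUST_NOT"
    return "SHOULD"


def clause(field, value, required, prohibited):
    """Render one clause in Query Parser syntax, e.g. '+facet:news'."""
    return PREFIX[occur_for(required, prohibited)] + field + ":" + value


print(occur_for(False, False))                 # prints: SHOULD
print(clause("facet", "news", True, False))    # prints: +facet:news
```

So the (False, False) used for the id terms corresponds to SHOULD, and the (True, False) used for the facet corresponds to MUST, which matches the + prefix on the facet in the query string shown earlier.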