The title of this post is kind of misleading, since you are probably here after unsuccessfully trying to create a BooleanQuery object in PyLucene. I had the same problem, but what I describe here is a workaround using Lucene's Query Parser syntax.
What I was trying to do was to query a Lucene index with a main query which was a set of ids, along with a facet as a QueryFilter object. To build the main query, I was using code that looked like this:
```python
import PyLucene
...
def search():
    searcher = PyLucene.IndexSearcher(dir)
    ...
    # find the ids to query on from database
    rows = cursor.fetchall()
    bquery = PyLucene.BooleanQuery()
    # build up the id query
    for row in rows:
        tquery = PyLucene.TermQuery(PyLucene.Term("id", str(row[0])))
        bquery.add(tquery, False, False)
    # now add in the facet
    bquery.add(PyLucene.TermQuery(PyLucene.Term("facet", facetValue)), True, False)
    # send query to searcher
    hits = searcher.search(bquery)
    numHits = hits.length()
    for i in range(0, numHits):
        # do something with the data
        doc = hits.doc(i)
        field1 = doc.get("field1")
        ...
```
Running this gave me the error below. I was going by the BooleanQuery.add(Query, boolean required, boolean prohibited) signature from the Lucene 1.4 Java API, but it looks like PyLucene.BooleanQuery does not support it.
```
Traceback (most recent call last):
  File "./myscript.py", line 76, in ?
    main()
  ...
  File "./myscript.py", line 40, in process
    bquery.add(tquery, False, False)
PyLucene.InvalidArgsError: (<type 'PyLucene.BooleanQuery'>, 'add', (<TermQuery: id:8112526>, False, False))
```
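For anyone porting code like this: in the Lucene 1.4 API, the two booleans passed to `add()` are `required` and `prohibited`. As far as I know, Lucene 1.9 introduced the `BooleanClause.Occur` constants (`MUST`, `SHOULD`, `MUST_NOT`) to replace that pair, and Lucene 2.0 dropped the old three-argument form. The pure-Python sketch below only illustrates the mapping between the two styles; the helper name `occur_for` is hypothetical, and it rejects `required=True, prohibited=True` as contradictory rather than claiming to reproduce what Lucene 1.4 did with that combination.

```python
def occur_for(required, prohibited):
    """Map Lucene 1.4-style (required, prohibited) flags to the name of the
    Lucene 2.x BooleanClause.Occur constant with the same meaning.

    Hypothetical helper for illustration only: it returns the constant's
    name as a string and does not touch PyLucene.
    """
    if required and prohibited:
        # A clause cannot be both mandatory and forbidden; this sketch
        # treats the combination as an error.
        raise ValueError("a clause cannot be both required and prohibited")
    if required:
        return "MUST"        # add(q, True, False)  -> add(q, Occur.MUST)
    if prohibited:
        return "MUST_NOT"    # add(q, False, True)  -> add(q, Occur.MUST_NOT)
    return "SHOULD"          # add(q, False, False) -> add(q, Occur.SHOULD)


# The two kinds of calls in the failing script:
print(occur_for(False, False))  # each id clause
print(occur_for(True, False))   # the facet clause
```

So each `bquery.add(tquery, False, False)` corresponds to an `Occur.SHOULD` clause in the 2.x API, and the facet clause to `Occur.MUST`.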
I searched Google for it but did not find anything useful. In any case, I had to generate this report in a hurry, so I did not have much time to figure out how to use it.
However, I knew that the query I wanted would look something like the one below, which I could build simply using Lucene's Query Parser syntax.
```
+(id:value1 id:value2 ...) +facet:facetValue
```
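A query string of that shape can be generated directly from the id list with plain string operations. The sketch below is a minimal illustration under a few assumptions: the helper names `escape_term` and `build_query_string` are hypothetical, and the escaping covers the characters listed as special in the Query Parser syntax documentation (`+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`), escaping `&` and `|` individually. Java Lucene's `QueryParser.escape()` does a similar job, if your PyLucene build exposes it. Whitespace inside a facet value is not handled here.

```python
# Characters with special meaning in Lucene's Query Parser syntax.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&')


def escape_term(text):
    """Backslash-escape Query Parser special characters in a single term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def build_query_string(ids, facet_value):
    """Build '+(id:v1 id:v2 ...) +facet:value' from an iterable of ids.

    Each id is converted with str(). Returns None when there are no ids,
    since there is nothing to search for.
    """
    id_terms = ["id:" + escape_term(str(i)) for i in ids]
    if not id_terms:
        return None
    return "+(%s) +facet:%s" % (" ".join(id_terms), escape_term(facet_value))


print(build_query_string([8112526, 8112527], "books"))
# +(id:8112526 id:8112527) +facet:books
```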
So I changed my code to do this instead:
```python
import PyLucene
...
def search():
    searcher = PyLucene.IndexSearcher(dir)
    # KeywordAnalyzer keeps each id as a single, untokenized term
    analyzer = PyLucene.KeywordAnalyzer()
    ...
    # find the ids to query on from database
    rows = cursor.fetchall()
    ids = [str(row[0]) for row in rows]
    if not ids:
        # no ids, nothing to search for
        return
    # builds "(id1 OR id2 OR ...) AND facet:facetValue"; the bare ids
    # are matched against "id", the default field given to QueryParser
    idQueryPart = ' OR '.join(ids)
    query = PyLucene.QueryParser("id", analyzer).parse(
        "(" + idQueryPart + ") AND facet:" + facetValue)
    # send query to searcher
    hits = searcher.search(query)
    numHits = hits.length()
    for i in range(numHits):
        # do something with the data
        doc = hits.doc(i)
        field1 = doc.get("field1")
        ...
```
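One thing to watch with this approach: the parsed `( ... OR ... )` group becomes a nested BooleanQuery with one clause per id, and Lucene caps the number of clauses in a BooleanQuery (`BooleanQuery.maxClauseCount`, 1024 by default). Going past it raises `BooleanQuery.TooManyClauses`, which the query parser may report as a parse error. If the database can return more ids than that, one option is to raise the limit with the static `BooleanQuery.setMaxClauseCount()`; another is to split the ids into batches and run one query per batch, as in this sketch. The helper names `batches` and `build_query_strings` are hypothetical, and the ids are assumed to be plain numeric strings that need no escaping.

```python
MAX_CLAUSES = 1024  # Lucene's default BooleanQuery.maxClauseCount


def batches(items, size=MAX_CLAUSES):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query_strings(ids, facet_value, size=MAX_CLAUSES):
    """Return one '(id1 OR id2 ...) AND facet:value' string per batch,
    in the same format as the query built in search()."""
    return ["(%s) AND facet:%s" % (" OR ".join(batch), facet_value)
            for batch in batches([str(i) for i in ids], size)]


queries = build_query_strings(range(2500), "books")
print(len(queries))  # 3 batches: 1024 + 1024 + 452 ids
```

Each string would then be parsed and searched separately, with the hits combined in Python.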
This is probably something most PyLucene users would have figured out for themselves, but for those who didn't, I hope the post is useful. Of course, the nicest solution would have been to figure out how to use PyLucene.BooleanQuery directly. Still, the solution I describe works fine for me, and it kind of makes sense if you think of Python as a scripting language: if we want to talk directly to the API, we should probably use Java instead.
Of course, I may be totally off the mark, and BooleanQuery is really supported in PyLucene and I just don't know how to use it. If this is the case, I would really like to know. Thanks in advance for any help you can provide in this regard.
2 comments:
Hi, I think you may have been working off old documentation... I'm using PyLucene 2.2 and the following works fine - probably quicker than chaining terms in the query string!
```python
limitedQ = BooleanQuery()
firstQ = parser.parse(query)
secondQ = TermQuery(Term(key, value))
limitedQ.add(secondQ,BooleanClause.Occur.MUST)
limitedQ.add(firstQ,BooleanClause.Occur.SHOULD)
```
Hi James, thanks very much for the tip, I will try this out. I think I may have been using PyLucene 2.0 (unfortunately I can't say for sure, since my disk crashed after I wrote this script, and I haven't reinstalled PyLucene because I haven't needed it since the new disk went in). If PyLucene version numbering tracks Lucene's, then I am almost sure it was PyLucene 2.0, since we were using Lucene 2.0 at that time.
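A caveat about the snippet in the first comment: in Lucene, when a BooleanQuery has at least one MUST clause, its SHOULD clauses are optional by default and only affect scoring. As written, `limitedQ` therefore matches every document with `key:value`, and `firstQ` only influences the ranking. If both parts must match, `firstQ` needs `Occur.MUST` too; and to reproduce the post's `+(id:... id:...) +facet:...` query, the id clauses go into their own BooleanQuery as SHOULD clauses, and that nested query is added as MUST next to the facet clause. The toy pure-Python model below illustrates that matching rule. It is not PyLucene: documents are plain dicts, and the tuple-based query representation and helper names are hypothetical.

```python
MUST, SHOULD, MUST_NOT = "MUST", "SHOULD", "MUST_NOT"


def matches(query, doc):
    """Return True if `doc` matches `query` in this toy model.

    A query is either a term ("term", field, value) or a boolean query
    ("bool", [(occur, subquery), ...]). Matching follows Lucene's default
    rule: every MUST clause must match, no MUST_NOT clause may match, and
    at least one SHOULD clause must match only when there are no MUST clauses.
    """
    if query[0] == "term":
        _, field, value = query
        return doc.get(field) == value
    clauses = query[1]
    must = [q for occ, q in clauses if occ == MUST]
    should = [q for occ, q in clauses if occ == SHOULD]
    must_not = [q for occ, q in clauses if occ == MUST_NOT]
    if any(matches(q, doc) for q in must_not):
        return False
    if not all(matches(q, doc) for q in must):
        return False
    if not must:
        return any(matches(q, doc) for q in should)
    return True


def id_or_query(ids):
    # Nested boolean query: one SHOULD clause per id.
    return ("bool", [(SHOULD, ("term", "id", i)) for i in ids])


doc = {"id": "3", "facet": "music"}  # facet matches, id does not
facet = ("term", "facet", "music")
ids = id_or_query(["1", "2"])

flat = ("bool", [(MUST, facet), (SHOULD, ids)])  # shape of the comment's snippet
nested = ("bool", [(MUST, facet), (MUST, ids)])  # +(id:1 id:2) +facet:music

print(matches(flat, doc))    # True: the SHOULD clause is optional here
print(matches(nested, doc))  # False: the id group is required
```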