Comments on Salmon Run: Implementing Concordance with Lucene Span Queries
Blog author: Sujit Pal. 8 comments, newest first. Feed last updated 2024-03-05.

Sujit Pal (2020-04-29):
Yay! I met Tim Allison a couple of years ago at a Haystack conference and I knew he was working on it, but it is very cool to see it finally make it into Lucene.

breandan (2019-11-24):
Update, apparently this is an upcoming feature in Lucene: https://issues.apache.org/jira/browse/LUCENE-5317

Sujit Pal (2015-11-10):
Hi Moh, you are welcome and thank you for the compliment. The only change to schema.xml is to set termVectors=true, termPositions=true and termOffsets=true for the field on which you want to build concordances. The code here is Lucene, and I didn't do this in Solr because it was meant for backend use and I could just read a copy of the production index to get my data. However, if you want ...

Anonymous (2015-11-10):
Thank you so much Sujit for such a great article.

I want to achieve the same with Drupal. What kind of changes did you make to Solr schema.xml and solrconfig.xml to achieve this concordance functionality?

I am not familiar with Java.

thanks,
Moh

Sujit Pal (2013-02-27):
Thanks Asma. To answer your question, I suspect that your body field is being analyzed with an analyzer that has a lowercase tokenizer in it, so PDFBox is probably getting analyzed down to "pdfbox". Assuming you have Solr, you can see what "PDFBox" in "body" is analyzed to on its analysis page (otherwise you will have to write a little code), and use the analyzed ...

Anonymous (2013-02-25):
Hi Sujit, very nice and helpful article. I have applied your code to a small PDF document containing these 3 lines of text:
Hello World by PDFBox
World is beautiful place
No world is better than home
I search for {"input", "better", "beautiful", "World", "PDFBox"};
the result is
==== concordance for term='input...

Sujit Pal (2011-10-07):
Hi Manas, haven't used spans.skipTo() myself, so don't have a code example available offhand. I'll check it out though and if I find a way to use it, I'll post it.

Anonymous (2011-09-20):
Hi Sujit,

Nice article. Helping me a lot in a personal project where I'm doing some text analysis.

Not being very familiar with low level Lucene apis, I was wondering if you could point me to the proper direction for the following:

I wanted to restrict the span display for a specific document. I looked into the user groups for some idea - but didn't find ...
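
A minimal sketch of the lowercasing issue raised in the 2013 reply to Asma above. This is plain Java with no Lucene dependency, and it is not the article's code. The `analyze` and `countMatches` helpers are hypothetical stand-ins: `analyze` assumes an index-time analyzer that only splits on non-alphanumeric characters and lowercases, whereas a real Lucene analyzer may do considerably more. The indexed text is the three lines quoted from Asma's PDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LowercaseLookup {

    // Hypothetical stand-in for an index-time analyzer (assumption):
    // split on anything that is not a letter or digit, then lowercase.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Counts exact, case-sensitive matches of a term against the indexed tokens,
    // mimicking a term lookup that does not analyze the query term.
    public static int countMatches(List<String> indexedTokens, String term) {
        int count = 0;
        for (String token : indexedTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String body = "Hello World by PDFBox\n"
                + "World is beautiful place\n"
                + "No world is better than home";
        List<String> indexed = analyze(body);

        String rawTerm = "PDFBox";
        // Run the query term through the same analysis as the indexed text.
        String analyzedTerm = analyze(rawTerm).get(0);

        System.out.println(rawTerm + " -> " + countMatches(indexed, rawTerm));           // PDFBox -> 0
        System.out.println(analyzedTerm + " -> " + countMatches(indexed, analyzedTerm)); // pdfbox -> 1
    }
}
```

Under these assumptions the raw term "PDFBox" matches nothing while "pdfbox" matches once, which is consistent with the suggestion to look up the analyzed form of the term rather than the original casing.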