Comments on Salmon Run: Lucene: A Token Concatenating TokenFilter
Sujit Pal (2014-02-19):
You are welcome, and thank you for the code.
Anonymous (2014-02-19):
Hi Sujit,
Thanks a lot for posting this! I also wanted to mention that changing

    concat = false;
    return false;

to

    concat = false;
    phrases.clear();
    words.clear();
    return false;

would make it work with multi-valued fields. Thank you once again!

Sujit Pal (2012-07-11):
Hi Alex, the phrases data structure is global, so it retains state across calls to incrementToken(). The idea is that the caller (the Analyzer) will instantiate the Tokenizer, then call analyze() to pass in the data to be tokenized. The analyze() method calls incrementToken() in a loop and expects tokens back. So rather than changing upstream code, we make the splitting and buffering happen within …
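The buffering pattern described in these two comments — an instance-level phrases list that keeps state across incrementToken() calls, and that must be cleared between field values — can be sketched without the Lucene dependency. The PhraseBuffer class below is hypothetical illustration code, not part of the original filter:

```java
import java.util.LinkedList;

// Hypothetical stand-in for the filter's buffering pattern: phrases is an
// instance field, so it retains state across incrementToken() calls.
class PhraseBuffer {
    private final LinkedList<String> phrases = new LinkedList<>();
    private String current;

    void buffer(String phrase) { phrases.add(phrase); }

    // Mimics incrementToken(): emit one buffered phrase per call,
    // return false when the buffer is exhausted.
    boolean incrementToken() {
        if (phrases.isEmpty()) return false;
        current = phrases.removeFirst();
        return true;
    }

    String term() { return current; }

    // The fix from the comment above: clear the buffers between field
    // values, so leftover state does not bleed into the next value of a
    // multi-valued field.
    void reset() { phrases.clear(); }
}

public class Demo {
    public static void main(String[] args) {
        PhraseBuffer buf = new PhraseBuffer();
        buf.buffer("quick fox");
        buf.buffer("fast fox");
        // The consumer drives the loop, as the Analyzer does:
        while (buf.incrementToken()) {
            System.out.println(buf.term());
        }
        buf.reset(); // safe to reuse for the next field value
        System.out.println(buf.incrementToken()); // prints false
    }
}
```

In the real TokenFilter the clearing would go in reset(), which in the usual Lucene TokenStream workflow is called before each value is consumed.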
Anonymous (2012-07-03):
Hi Sujit,
I came across your blog when trying to learn how to write my own custom filter. I feel that I have a good grasp on how to start now, but I have a quick question regarding this piece of code you posted:

    while (phrases.size() > 0) {
      String phrase = phrases.removeFirst();
      restoreState(current);
      clearAttributes();
      …

Sujit Pal (2011-12-04):
Thanks Prashant, and yes, that would work as well. The idea here though is to accumulate all possible combinations of the terms so far, along with synonyms added to some of the terms in the phrase. So assume that our phrase is "quick fox" and our dictionary has {quick => fast, faster}. So the synonym tokenizer in the analyzer chain will produce the following terms: {quick, fast, …
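The "all possible combinations" idea in the reply above can be sketched in plain Java: at each position of the phrase, substitute either the original term or one of its synonyms, and keep every resulting phrase. The expand() helper is hypothetical illustration code, not code from the post:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Combos {
    // Expand a phrase into every combination of original terms and synonyms.
    static List<String> expand(String[] words, Map<String, List<String>> synonyms) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (String word : words) {
            // Alternatives at this position: the word itself plus its synonyms.
            List<String> alternatives = new ArrayList<>();
            alternatives.add(word);
            alternatives.addAll(synonyms.getOrDefault(word, List.of()));
            // Cross every phrase accumulated so far with every alternative.
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String alt : alternatives) {
                    next.add(prefix.isEmpty() ? alt : prefix + " " + alt);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = Map.of("quick", List.of("fast", "faster"));
        System.out.println(expand(new String[]{"quick", "fox"}, dict));
        // prints [quick fox, fast fox, faster fox]
    }
}
```

For the "quick fox" example with {quick => fast, faster}, this yields the original phrase plus one variant per synonym, matching the accumulation the reply describes.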
shoonya (2011-11-27):
Hi Sujit,
I have been following you for a long time.
I have a question: just as we can keep multiple tokens for a single word in one field (e.g. the field MappedFSN:Quick with tokens Faster, Fast, Fasting and so on), can we also have numeric tokens in one field, like CATEGORYID:1234 with tokens 324, 34345, 5467, 34329, etc.?
Regards,
Prashant