Comments on Salmon Run: Lucene: A Token Concatenating TokenFilter
Sujit Pal (2014-02-19):
You are welcome, and thank you for the code.
Anonymous (2014-02-19):
Hi Sujit,
Thanks a lot for posting this! I also wanted to mention that changing

    concat = false;
    return false;

to

    concat = false;
    phrases.clear();
    words.clear();
    return false;

would make it work with multi-valued fields. Thank you once again!

Sujit Pal (2012-07-11):
Hi Alex, the phrases data structure is global, so it retains state across calls to incrementToken(). The idea is that the caller (the Analyzer) will instantiate the Tokenizer, then call analyze() to pass in the data to be tokenized. The analyze() method calls incrementToken() in a loop and expects tokens back. So rather than changing upstream code, we make the splitting and buffering happen within …
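The buffering pattern described in these two comments — an instance-level phrases list that keeps state across incrementToken() calls, and that must be cleared between field values — can be sketched without the Lucene dependency. The PhraseBuffer class below is hypothetical illustration code, not part of the original filter:

```java
import java.util.LinkedList;

// Hypothetical stand-in for the filter's buffering pattern: phrases is an
// instance field, so it retains state across incrementToken() calls.
class PhraseBuffer {
    private final LinkedList<String> phrases = new LinkedList<>();
    private String current;

    void buffer(String phrase) { phrases.add(phrase); }

    // Mimics incrementToken(): emit one buffered phrase per call,
    // return false when the buffer is exhausted.
    boolean incrementToken() {
        if (phrases.isEmpty()) return false;
        current = phrases.removeFirst();
        return true;
    }

    String term() { return current; }

    // The fix from the comment above: clear the buffers between field
    // values, so leftover state does not bleed into the next value of a
    // multi-valued field.
    void reset() { phrases.clear(); }
}

public class Demo {
    public static void main(String[] args) {
        PhraseBuffer buf = new PhraseBuffer();
        buf.buffer("quick fox");
        buf.buffer("fast fox");
        // The consumer drives the loop, as the Analyzer does:
        while (buf.incrementToken()) {
            System.out.println(buf.term());
        }
        buf.reset(); // safe to reuse for the next field value
        System.out.println(buf.incrementToken()); // prints false
    }
}
```

In the real TokenFilter the clearing would go in reset(), which in the usual Lucene TokenStream workflow is called before each value is consumed.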
Anonymous (2012-07-03):
Hi Sujit,
I came across your blog when trying to learn how to write my own custom filter. I feel that I have a good grasp on how to start now, but I have a quick question regarding this piece of code you posted:

    while (phrases.size() > 0) {
      String phrase = phrases.removeFirst();
      restoreState(current);
      clearAttributes();
      …

Sujit Pal (2011-12-04):
Thanks Prashant, and yes, that would work as well. The idea here though is to accumulate all possible combinations of the terms so far, along with synonyms added to some of the terms in the phrase. So assume that our phrase is "quick fox" and our dictionary has {quick => fast, faster}. So the synonym tokenizer in the analyzer chain will produce the following terms: {quick, fast, …
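The "all possible combinations" idea in the reply above can be sketched in plain Java: at each position of the phrase, substitute either the original term or one of its synonyms, and keep every resulting phrase. The expand() helper is hypothetical illustration code, not code from the post:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Combos {
    // Expand a phrase into every combination of original terms and synonyms.
    static List<String> expand(String[] words, Map<String, List<String>> synonyms) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (String word : words) {
            // Alternatives at this position: the word itself plus its synonyms.
            List<String> alternatives = new ArrayList<>();
            alternatives.add(word);
            alternatives.addAll(synonyms.getOrDefault(word, List.of()));
            // Cross every phrase accumulated so far with every alternative.
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String alt : alternatives) {
                    next.add(prefix.isEmpty() ? alt : prefix + " " + alt);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = Map.of("quick", List.of("fast", "faster"));
        System.out.println(expand(new String[]{"quick", "fox"}, dict));
        // prints [quick fox, fast fox, faster fox]
    }
}
```

For the "quick fox" example with {quick => fast, faster}, this yields the original phrase plus one variant per synonym, matching the accumulation the reply describes.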
shoonya (2011-11-27):
Hi Sujit,
I have been following you for a long time.
I have a question: just as we can keep multiple tokens for a single word in one field (e.g. the field MappedFSN:Quick with tokens Faster, Fast, Fasting and so on), can we also have numeric tokens in one field, like CATEGORYID:1234 with tokens 324, 34345, 5467, 34329, etc.?
Regards,
Prashant