Comments on Salmon Run: Entity Discovery using Mahout CollocDriver

2013-10-29T10:15:51.592-07:00

I see, thank you for the detailed explanation. We do something similar (but a cruder version of what you are doing) during our concept mapping process - we restrict analysis to noun phrases (although that part is now disabled because of concerns about throwing the baby out with the bathwater) and we post-process the concepts so that only the longest match is considered. Thanks to your explanation, I think
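
As a rough illustration of the longest-match post-processing mentioned in this comment, here is a minimal Python sketch. The function name `keep_longest_matches` and the `(start, end, concept)` span format are assumptions made for the example; the commenter's actual concept mapper is not shown in this thread and may work differently.

```python
def keep_longest_matches(matches):
    """Keep the longest concept matches and drop any match that overlaps them.

    matches: list of (start, end, concept) token spans, with end exclusive.
    This span format is hypothetical, chosen only for this sketch.
    """
    # Consider longer spans first; break ties by leftmost start.
    ordered = sorted(matches, key=lambda m: (-(m[1] - m[0]), m[0]))
    kept = []
    for start, end, concept in ordered:
        # Accept a span only if it does not overlap a span already kept.
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, concept))
    # Return the surviving spans in document order.
    return sorted(kept)


# Made-up matches over the tokens: "computer security and hacking"
matches = [
    (0, 2, "computer security"),
    (0, 1, "computer"),
    (1, 2, "security"),
    (3, 4, "hacking"),
]
result = keep_longest_matches(matches)
```

Here `result` keeps "computer security" and "hacking", while the shorter "computer" and "security" matches are dropped because they fall inside the longer span.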

2013-10-28T11:29:01.967-07:00

Not really synonym normalization. This is how I did it. For example, if I take the document http://www.washingtonpost.com/blogs/the-fix/wp/2013/10/24/john-boehners-next-big-test-immigration-reform/

Just taking the (NN || NNS) and (NNP || NNPS) tags would give the following candidates (note that there are also some editorial mistakes in them :-) which will get discarded):

[
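
The candidate list above is cut off in the feed. As a hedged sketch of the kind of tag filter described, the Python below collects runs of common-noun tags (NN, NNS) and runs of proper-noun tags (NNP, NNPS) from text that has already been POS-tagged. Grouping consecutive tokens into one candidate, the function name `noun_candidates`, and the hand-tagged sample sentence are all assumptions for illustration; they are not the commenter's code or output.

```python
COMMON_NOUN_TAGS = {"NN", "NNS"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def noun_candidates(tagged_tokens):
    """Group consecutive common-noun or proper-noun tokens into candidates.

    tagged_tokens: list of (word, penn_treebank_tag) pairs from any POS tagger.
    """
    candidates = []
    run, run_kind = [], None
    for word, tag in tagged_tokens:
        if tag in COMMON_NOUN_TAGS:
            kind = "common"
        elif tag in PROPER_NOUN_TAGS:
            kind = "proper"
        else:
            kind = None
        # Close the current run when the tag group changes or a non-noun appears.
        if kind is None or kind != run_kind:
            if run:
                candidates.append(" ".join(run))
            run = []
        if kind is not None:
            run.append(word)
        run_kind = kind
    # Flush a run that reaches the end of the input.
    if run:
        candidates.append(" ".join(run))
    return candidates


# Hand-tagged, made-up sentence (not taken from the article itself).
sample = [
    ("John", "NNP"), ("Boehner", "NNP"), ("faces", "VBZ"),
    ("immigration", "NN"), ("reform", "NN"), ("in", "IN"),
    ("Congress", "NNP"),
]
candidates = noun_candidates(sample)
```

With this sample, `candidates` holds "John Boehner", "immigration reform", and "Congress". A real pipeline would feed in tagger output and then apply whatever discard rules the author used, which the truncated comment does not show.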

2013-10-26T10:59:29.393-07:00

My goal is more modest: it's to help human taxonomy editors discover new concepts in a corpus of text. Do you normalize synonyms automatically? If you do, I would appreciate some pointers.

2013-10-24T18:08:13.259-07:00

I used it to normalize concepts extracted via POS tagging, since I did not want extraneous words. For example, if a doc contains several variations of a concept (computer security, computer vulnerability, hacking, computer system vulnerability, etc.), I use this to normalize them and find the base concept

2013-10-24T10:08:35.559-07:00

Hi Ravi, it looks interesting: similar to RAKE in its use of rules, but it uses n-grams (which almost every other approach uses). I was going to try implementing it and benchmarking it against some job description data I have (from the Adzuna challenge on Kaggle) to see how it compares with some of the other approaches.

2013-10-23T11:22:32.978-07:00

Let me know your thoughts, as everybody understands an algorithm differently. I just want to corroborate my understanding ;-)

2013-10-21T18:55:34.762-07:00

Thanks Ravi, this looks very interesting. Definitely something worth trying out.

2013-10-21T08:43:04.931-07:00

Hello Sujit,
I tried several algorithms (RAKE, PMI, n-grams, Maximum Entropy, etc.) for concept/theme extraction from document texts and found this decent paper from Stanford, which gave reasonably good results even though the algorithm itself is pretty basic.

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=85057D4ADAAD516A5F763D7EC94F5B66?doi=10.1.1.173.5881&rep=