Comments on Salmon Run: Finding Significant Phrases in Tweets with NLTK

Sujit Pal (2014-05-04T09:39:31-07:00)
Hi Surafel, since we construct each significant N-gram by joining an (N-1)-gram with a new word, we can post-process the output to do what you want. For each phrase, look for its [:-1] slice (the phrase without its last word) in the phrase list and mark that shorter phrase for deletion. Once you are done, filter for the phrases that are not marked for deletion. (Deleted the old comment that involved looking through ...

Anonymous (2014-05-04T07:54:10-07:00)
Hi Sujit Pal,
This may not be directly related to this post, but could you give me some idea of how to exclude n-grams that reappear in a higher-order list with one additional token at the end?

For example: the president of (trigram), the president of the (four-gram), the president of the states (five-gram). Here the five-gram should exclude all the lower-order n-grams. Thanks a lot :)

Sujit Pal (2013-08-21T11:32:25-07:00)
Thanks for the pointer, Ravi. No, I haven't; I'll take a look.

Anonymous (2013-08-21T08:09:47-07:00)
Have you tried TwitterNLP from Carnegie Mellon? I believe you can get better entities/collocations and can get rid of Twitterisms :-)

Ravi Kiran Bhaskar
Technical Architect
The Washington Post
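
The post-processing step Sujit Pal describes in his reply to Surafel (look up each phrase's [:-1] slice in the phrase list, mark it for deletion, then filter) could be sketched roughly as below. This is only an illustration under assumptions: the function name `drop_prefix_phrases` is hypothetical, and it assumes phrases are plain strings whose tokens are separated by whitespace, which may not match the data structures used in the original post.

```python
def drop_prefix_phrases(phrases):
    """Remove every phrase that equals another phrase minus its last token.

    Assumption: each phrase is a whitespace-separated string of tokens.
    """
    # Tokenize once so the [:-1] slice operates on words, not characters.
    token_lists = [tuple(phrase.split()) for phrase in phrases]

    # Mark the (N-1)-gram prefix of every N-gram for deletion.
    marked = set()
    for tokens in token_lists:
        if len(tokens) > 1:
            marked.add(tokens[:-1])

    # Keep only the phrases that were never marked, preserving input order.
    return [
        phrase
        for phrase, tokens in zip(phrases, token_lists)
        if tokens not in marked
    ]


if __name__ == "__main__":
    candidates = [
        "the president of",
        "the president of the",
        "the president of the states",
        "white house",
    ]
    print(drop_prefix_phrases(candidates))
    # prints ['the president of the states', 'white house']
```

Collecting the marked prefixes in a set keeps the whole pass roughly linear in the number of phrases. Because each longer phrase marks its own prefix, a chain such as trigram, four-gram, five-gram collapses to the longest member, which matches the behavior Surafel asks for in the example.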