Comments on Salmon Run: Implementing Concept Subsumption with Bitsets
Sujit Pal (http://www.blogger.com/profile/06835223352394332155)
Updated 2024-03-17T13:30:18.387-07:00

Sujit Pal, 2011-09-04T11:22:30.180-07:00
Yes, sorry, and thanks for the correction. I guess I remembered seeing a set of messages from the carrot devs on the lucene/solr lists and extrapolated from there. Best of luck with your theme extraction project.

Anonymous, 2011-08-29T08:27:04.751-07:00
Hi Sujit,
Thank you for your continued support. BTW, carrot has been in SOLR since 1.4. I haven't had time to play with it, but will do so in a while. I believe the carrot integration into SOLR was nascent in 1.4; however, beyond 3.1 it has been integrated fully. Still, only 3 algorithms are supported:

org.carrot2.clustering.lingo.LingoClusteringAlgorithm

org.carrot2.

Sujit Pal, 2011-08-27T19:58:48.493-07:00
Ah, I see, I guess I was right about me getting too comfortable in my space then :-). Now that I understand what themes are, I think you are on the right track - definitely worth exploring at least - longest noun phrases would tend to reflect the theme(s) a document can represent, and searching by these strings would return "similar" documents along that theme. You may want to take a

Anonymous, 2011-08-25T23:03:03.126-07:00
Hello Sujit,
That is a neat idea. However, the idea of convergence towards a centroid assumes that the distribution of terms is within certain boundaries, i.e. medical terminologies/products etc. can be depicted in a finite set (not many variations). I believe that for news articles it's hard to converge, as you can refer to something without even mentioning it (blogs or commentaries etc.).

Sujit Pal, 2011-08-24T19:12:46.703-07:00
Thanks for the kind words, Ravi. Hopefully your junior developers are not led astray by my crazy talk :-).

To answer your question, I have been thinking of doing something similar, but to improve the quality of TGNI's concept mapping - basically pass the sentences through another filter which only passes through noun-phrases and noun tokens in non-noun phrases.

By theme

Anonymous, 2011-08-24T07:53:24.725-07:00
Hello Sujit,
Brilliant piece of work. I religiously read your blog: lovely explanation and code, keep it up.

I often show your blog to junior developers to explain to them how code should be written. Hope you don't mind my using your blog for my personal gain :-)

I had a small question: if I were to grab all contiguous nouns in a text document and apply the concept

Sujit Pal, 2011-08-22T12:09:39.065-07:00
Sorry, it should have been java.util.Arrays (Eclipse must have put this in based on classpath availability; apparently the asList() method behaves similarly for both) - I will update the code.

Anonymous, 2011-08-22T07:45:34.046-07:00
Why is scala.actors.threadpool.Arrays used?