Comments on Salmon Run: Learning Mahout : Classification
Sujit Pal
Feed updated 2024-03-05T03:17:02.289-08:00 (16 comments)

Sujit Pal (2016-03-31T13:00:31.549-07:00):
Hi Chiru, I believe Mahout still ships with the Naive Bayes Classifier - here is a List of supported algorithms (https://mahout.apache.org/users/basics/algorithms.html) in Mahout 0.10 according to the Mahout website.

chiru (2016-03-30T07:58:02.334-07:00):
Please, can anyone send the classification algorithm that is already implemented in Mahout (naive)?

Sujit Pal (2015-06-03T09:37:26.628-07:00):
Hi Rachana, the confusion matrix is the output of the classification evaluation, and tells how well the classifier performed across different classes. Some classifiers take the classification flags made by upstream (less accurate) classifiers; perhaps in that case we could also use the probability of a correct answer for a class from the confusion matrix as a feature? But in the case described, ...

rachana (2015-06-03T06:17:01.663-07:00):
Hello, is there any way to use the confusion matrix in a program... can I get the source code?

Sujit Pal (2015-02-12T09:31:41.210-08:00):
Hi Rajarshi, thanks for the kind words, glad you find it helpful. To do incremental clustering, assuming a k-means like environment, you could maintain the centroids of your clusters (basically an averaged term vector of all documents in the cluster; I believe Mahout k-means dumps this information out already) and then assign an incoming document based on which centroid it is closest to. You could...

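The nearest-centroid assignment described in the reply to Rajarshi above could look roughly like the Python sketch below. It is an illustration only: the dict-based term vectors, the choice of cosine distance, the function names and the sample centroids are all assumptions made here, not Mahout code or Mahout output.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse term vectors stored as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty vector has no direction; treat it as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign_to_cluster(doc, centroids):
    """Return the id of the centroid that is closest to the incoming document."""
    return min(centroids, key=lambda cid: cosine_distance(doc, centroids[cid]))


# Hypothetical centroids (averaged term vectors), one per existing cluster.
centroids = {
    "sports": {"game": 0.8, "team": 0.6},
    "tech": {"code": 0.7, "java": 0.7},
}

# Hypothetical incoming document, vectorized the same way as the centroids.
incoming = {"java": 1.0, "code": 0.5}

print(assign_to_cluster(incoming, centroids))  # prints "tech"
```

With these made-up vectors the incoming document shares no terms with the "sports" centroid, so it is assigned to "tech". In practice the centroids would come from whatever the clustering run wrote out, and the distance measure should match the one used to build the clusters.
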
Rajarshi Roy (2015-02-11T21:19:26.179-08:00):
Hi Sujit,
I have been following your blog for the last month and I find it very informative. I am rather new to Data Mining. I wanted to ask you this question regarding clustering of documents.
Mahout or Carrot2 does an excellent job with clustering my documents. Now, I want to make this process incremental. Let's say I have a set of documents S1 and by the time I finish processing ...

Sujit Pal (2014-12-04T15:04:19.463-08:00):
Hi VRK, you can find it on GitHub here: https://github.com/sujitpal/mia-scala-examples/tree/master/src/main/scala/com/mycompany/mia/classify

VRK (2014-12-03T22:49:36.111-08:00):
Hi, Sujit, can you send me the complete code for classifying the 20newsgroup dataset using SGD and NB? I need everything from preprocessing to classification of files. It's urgent.
my mail: vrk_nitw@yahoo.com

Sujit Pal (2013-09-11T18:29:05.522-07:00):
Hi Anitha, I suspect that /user/as7784/Pharma is a local (non-HDFS) directory but your mahout script expects to see it in HDFS (or vice versa). Take a look at the MAHOUT_LOCAL environment variable (see the comments inside the bin/mahout script): if you set it to some value, it forces mahout to work off the local file system.

Anonymous (2013-09-11T05:50:00.494-07:00):
Hi Sujit, I am trying to run logistic regression in mahout. I am getting the following error message though I have the /user/as7784/Pharma folder in Hadoop. Appreciate your help.

$ mahout trainlogistic --input /user/as7784/Pharma --output ./model --target MOM_POPS --categories 2 --predictors Devaluation Seasonality_YE P_Debt --types numeric
Warning: $HADOOP_HOME is deprecated.
...

Sujit Pal (2013-01-06T12:43:41.503-08:00):
Hi, it's difficult to do without more context. I looked at the line in BayesUtils and the problem is that the code depends on a certain format for the label and is not finding it, but that does not tie it back to why it's failing for you. Perhaps look at the code pointed to by the /entire/ stack trace (not just the last 2 lines); that may give you a better answer.

Mahout Newbie (2013-01-03T22:16:58.251-08:00):
Hi Sujit,

Can you please help to solve this issue:

http://stackoverflow.com/questions/14151877/error-while-creating-mahout-model

Sujit Pal (2012-11-04T08:27:22.597-08:00):
@Priyadarshan: sorry about the delay in responding, looks like I missed your comment. Once you train the model, you would write some code that would load up the model and use it to predict classes for unseen cases, similar to the SGD20NewsgroupsClassifier.test() method.

Sujit Pal (2012-10-15T11:15:31.357-07:00):
Hi Rajesh, yes you are right, the model is useless, it's not predicting anything, it's like a broken clock that's right twice a day :-). It classifies everything as category 1, 27 of which are right and 13 of which are wrong, which gives it the AUC of 0.57. Based on a thread on the Mahout ML that I read a couple of weeks ago, I believe this is because the model's feature vectors haven't been ...

Rajesh Nikam (2012-10-15T06:08:19.714-07:00):
I am also trying to build a model with mahout sgd. My results are similar to yours:

AUC = 0.57
confusion: [[27.0, 13.0], [0.0, 0.0]]
entropy: [[-0.4, -0.3], [-1.2, -0.7]]

Please look closely at the confusion matrix.
I think it means all instances are classified as category 1: 27 + 13.

Is this kind of model useful?

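The reading of the confusion matrix in the two comments above can be checked with a few lines of Python. This is a hedged sketch, not Mahout's evaluation code: the helper names are made up here, and treating rows as predicted classes follows the commenters' interpretation rather than any documented Mahout convention. The accuracy figure itself does not depend on that orientation, since it only uses the diagonal and the grand total.

```python
def accuracy(confusion):
    """Fraction of instances on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def predicted_counts(confusion):
    """Instances assigned to each class.

    ASSUMPTION: rows index the predicted class, as the commenters read it.
    """
    return [sum(row) for row in confusion]


def predicts_single_class(confusion):
    """True when every instance was assigned to the same class."""
    return sum(1 for count in predicted_counts(confusion) if count > 0) == 1


# The matrix quoted in the comment above.
confusion = [[27.0, 13.0], [0.0, 0.0]]

print(accuracy(confusion))               # 0.675
print(predicted_counts(confusion))       # [40.0, 0.0]
print(predicts_single_class(confusion))  # True
```

All 40 instances fall in the first row, so the classifier never predicts the second category; 27 of 40 correct is simply the share of category 1 in the test set.
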
Priyadarshan raj (2012-10-08T04:48:14.267-07:00):
hi sujit,

can you please tell me how to use the model created from mahout (0.7) after this command:
 bin/mahout trainnb \
   -i 20news-train-vectors -el -o model -li labelindex -ow