Comments on Salmon Run: Nutch/GORA - Delta Indexing

Sujit Pal (2012-03-24):
Hi Lina, if you are using Nutch, you could write a custom parsing plugin that parses the contents of the page to produce the meta tags and writes out the (URL=>metatag) mappings to a MySQL database (in addition to writing out the parse segments). This runs inline with your normal Nutch crawl.

If you are using Nutch/GORA, you can also do this offline by running a MapReduce (MR) job against ...

Lina (2012-03-19):
Hi Sujit,
I am working on Nutch and your blog posts are really helpful, but I have a different problem. I want to crawl a particular site, fetch the meta tags and other tags from the crawled HTML documents, and then store the URL->metadata mappings in a MySQL database.
Can you please tell me how I should approach this problem? I am new to this, so I don't know much yet. And I have ...
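
For anyone wanting to try the idea from the reply above outside of a crawl first, here is a minimal standalone sketch of the two steps it describes: pull the meta tags out of a page, then store the (URL=>metatag) mappings in a database. This is not a Nutch parsing plugin and does not use any Nutch or GORA API. It also assumes SQLite (from the Python standard library) as a stand-in for MySQL so that it runs without a server. The table name `url_meta` and the function names `extract_meta_tags` and `store_meta_tags` are made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {k: v for k, v in attrs if v is not None}
        # Accept both name="..." and property="..." (e.g. Open Graph tags).
        key = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content


def extract_meta_tags(html):
    """Return a dict mapping meta tag name -> content for one page."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def store_meta_tags(conn, url, meta):
    """Write the URL=>metatag mappings for one page (hypothetical schema)."""
    # Assumption: SQLite stands in for MySQL. A MySQL schema would need
    # VARCHAR columns (with lengths) in the primary key instead of TEXT.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_meta ("
        " url TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " content TEXT,"
        " PRIMARY KEY (url, name))"
    )
    # REPLACE INTO overwrites an existing (url, name) row, so re-crawling
    # a page does not create duplicates.
    conn.executemany(
        "REPLACE INTO url_meta (url, name, content) VALUES (?, ?, ?)",
        [(url, name, content) for name, content in meta.items()],
    )
    conn.commit()
```

To use it, call `extract_meta_tags(page_html)` on the fetched page and pass the result to `store_meta_tags(conn, page_url, meta)` with an open connection. With a MySQL driver the same flow should apply, but the parameter placeholder is typically `%s` rather than `?`, and inside Nutch the extraction step would live in the custom parsing plugin rather than in a standalone function.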