
What is Nutch all about? [closed]

Developer https://www.devze.com 2023-01-31 12:42 Source: Internet
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I'm going to build my own search engine.

While reading about search engines, crawlers, and so on, I got confused about Nutch.

I don't understand what Nutch is. Is it for internal use like Lucene (correct me if I'm wrong), or is it a framework for creating a search engine like Google, Bing, or Yahoo?


Nutch is a full-featured search engine: it can crawl external websites, and it understands and respects robots.txt.
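To illustrate what "respecting robots.txt" means for a crawler, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not how Nutch does it internally (Nutch is written in Java); the robots.txt content, the `MyCrawler` user agent, and the example.com URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so that
# the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A polite crawler runs this check before every fetch.
for url in (
    "https://example.com/index.html",
    "https://example.com/private/data.html",
):
    verdict = "fetch" if is_allowed(parser, "MyCrawler", url) else "skip"
    print(verdict, url)
```

A real crawler would download `/robots.txt` from each host first (for instance with `RobotFileParser.set_url` and `read`) and cache the result per host.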

http://nutch.apache.org/about.html

Overview: Nutch is open source web-search software. It builds on Lucene and Solr, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster.

The system can be enhanced (e.g. other document formats can be parsed) using a plugin mechanism.

For more information about Nutch, please see the Nutch wiki.
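To give a rough idea of the "web-specifics" mentioned in the overview (an HTML parser feeding a link graph), here is a small Python sketch built only on the standard library. It is a toy illustration, not Nutch's code or data model; the `LinkExtractor` class, the `extract_links` helper, and the in-memory `PAGES` dictionary are hypothetical names invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> list[str]:
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    return extractor.links


# Hypothetical in-memory "web", standing in for fetched pages.
PAGES = {
    "https://example.com/": '<a href="/about.html">About</a> <a href="docs/">Docs</a>',
    "https://example.com/about.html": '<a href="/">Home</a>',
}

# Link graph: page URL -> list of outgoing link targets.
link_graph = {url: extract_links(url, html) for url, html in PAGES.items()}

for source, targets in link_graph.items():
    print(source, "->", targets)
```

A crawler would add newly discovered targets to its fetch queue, and a search engine could use the graph for link-based ranking.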


Nutch is a ready-made, configurable web crawler with a Java Servlet for performing searches. If you want to build a search engine as a project, Nutch probably does too much, since all that's left is creating the pages for entering searches and displaying results.
