nutch
Web Crawler Algorithm: depth?
I'm working on a crawler and need to understand exactly what is meant by "link depth". Take Nutch for example: http://wiki.apache.org/nutch/NutchTutorial [Details]
2023-01-28 17:50 Category: Q&A
Running a web crawler for selected sites on Google App Engine?
I need to write a crawler to extract some info from a few pre-selected websites only. I know this is a straightforward job, but I am thinking of using Google App Engine to get this done. [Details]
2023-01-28 08:19 Category: Q&A
Nutch API advice
I'm working on a project where I need a mature crawler to do some work, and I'm evaluating Nutch for this purpose. [Details]
2023-01-28 07:04 Category: Q&A
Nutch problem: java.lang.NoClassDefFoundError
I'm trying to run Nutch on my Windows machine. I have Nutch, Java, Tomcat, and Cygwin installed. When I try to run the crawl command in Cygwin, I get the following error: [Details]
2023-01-27 23:20 Category: Q&A
Hadoop to create an index and Add() it to distributed Solr... is this possible? Should I use Nutch? Cloudera?
Can I use a MapReduce framework to create an index and somehow add it to a distributed Solr? I have a burst of information (log files and documents) that will be transported over the internet and stor [Details]
2023-01-25 22:25 Category: Q&A
Drupal + Nutch + Solr
We're about to start a project consisting of a search engine website. We need to implement a site that has social functionality on top of its core search engine solution. Obviously, we need to choose [Details]
2023-01-25 05:35 Category: Q&A
Why is Nutch parsing application/x-javascript files?
I configured Nutch with the following in my conf/nutch-site.xml <property> [Details]
2023-01-25 02:51 Category: Q&A
Can't access Hadoop web UI for job tracker [closed]
This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applic [Details]
2023-01-24 05:47 Category: Q&A
Profiling Lucene in Nutch
I'm trying to profile Nutch using VisualVM. Lucene is the part of the Nutch core responsible for generating URL indexes and for searching these indexes in response to a query. I'm r [Details]
2023-01-23 16:55 Category: Q&A
Nutch and sitemap.xml
Does apache-nutch support sitemaps? Or how can I implement it myself? How can I use the priority field: should it be multiplied into the boost field? Not that I'm aware of. [Details]
2023-01-21 03:56 Category: Q&A