nutch
Do Ivy dependency revisions have anything to do with SVN's?
With no background on Ivy dependencies, I'm trying to build Nutch with Solr 4.0, but I'm not sure how to change the Nutch Ivy dependency on Solr in the ivy.xml: [...]
2023-03-04 13:36 Category: Q&A

Why do I get "security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000"?
$ hdfs dfs -rmr crawl
11/04/16 08:49:33 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000 [...]
2023-02-26 05:13 Category: Q&A

Wrote code in Java for Nutch
Hello: I'm writing code in Java for Nutch (an open-source search engine) to remove the movements (diacritical marks) from Arabic words in the indexer. [...]
2023-02-23 02:28 Category: Q&A

How to omit JavaScript and comments using nutch crawl?
I am a newbie at this, trying to use Nutch 1.2 to fetch a site. I'm using only a Linux console to work with Nutch as I don't need anything else. My command looks like this [...]
2023-02-21 02:29 Category: Q&A

Nutch: Invoke in Java, not command line?
Am I being thick or is there really no way to invoke Apache Nutch programmatically from Java code? Where is the documentation (or a guide or tutorial) on how to do this? Google has failed me. [...]
2023-02-19 17:50 Category: Q&A

Nutch web spider, index entire web
Alright, I've been messing around with Nutch and need to know which parameter inside the crawl-urlfilter.txt file I should edit so the spider has no boundaries. In other words I want it to roam around the we [...]
2023-02-16 21:43 Category: Q&A

Will Nutch, the spider, index webpages it already has in its index?
Does Nutch index pages again if they're already in the index? If so, how do I change this? Yes and no. By default Nutch will reindex pages only after a certain period, 1 month (from memo [...]
2023-02-16 21:38 Category: Q&A

Run Nutch on existing Hadoop cluster
We have a Hadoop cluster (Hadoop 0.20) and I want to use Nutch 1.2 to import some files over HTTP into HDFS, but I couldn't get Nutch running on the cluster. [...]
2023-02-16 14:48 Category: Q&A

Changing the URL domain in the Nutch index programmatically
I'm currently building a search engine for a website's content (only for searching within that website). However, I'm thinking of building the index on the staging server. It's something like this: [...]
2023-02-15 23:54 Category: Q&A

Should I deploy on GAE or AWS?
GAE: +1 Servlet container ready (+ JVM6) +2 OpenID out-of-the-box support/API -1 JPA 2.0 restrictions (inc. no Criteria API) [...]
2023-02-15 12:33 Category: Q&A