Hadoop
Hadoop Code - Git and SVN
All the Apache Hadoop code is hosted in SVN. How does Git help in the Hadoop development process? It's not clear from the article below.
2023-04-01 20:53 Category: Q&A
Setup Nutch 1.3 and Hadoop
I am a newbie to Nutch and Hadoop and am trying to follow the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial.
2023-04-01 20:39 Category: Q&A
Large scale data processing HBase vs Cassandra [closed]
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely soli...
2023-04-01 10:31 Category: Q&A
How to Get Pig to Work with lzo Files?
So, I've seen a couple of tutorials for this online, but each seems to say to do something different. Also, none of them seems to specify whether you're trying to get things to work on a rem...
2023-04-01 05:05 Category: Q&A
Why is the right number of reduces in Hadoop 0.95 or 1.75?
The Hadoop documentation states: The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).
2023-04-01 00:44 Category: Q&A
Process entire files in Hadoop using Python code (preferably in Dumbo)
This seems like a very common use case, yet it is hard to do in Hadoop (it is possible with the WholeFileRecordReader class).
2023-04-01 00:39 Category: Q&A
HDFS path changing when trying to update files in HDFS
I am new to Hadoop and HDFS, so maybe it is something I am doing wrong when I copy from local (Ubuntu 10.04) to HDFS on a single node on localhost. The initial copy works fine, but when I modify my loc...
2023-03-31 22:57 Category: Q&A
setCompressOutput in Hadoop
When should I use, and when should I not use, FileOutputFormat.setCompressOutput(conf, true);? I heard that it compresses mapper output. Is there any way to compress reducer-side output?
2023-03-31 20:45 Category: Q&A
Nutch on EMR problem reading from S3
Hi, I am trying to run Apache Nutch 1.2 on Amazon's EMR. To do this I specify an input directory from S3. I get the following error:
2023-03-31 20:16 Category: Q&A
HBase performance
I am using Spring + DataNucleus JDO + HBase. HBase is running in fully distributed mode with two nodes. I am facing serious performance issues here.
2023-03-31 14:08 Category: Q&A