Hadoop
Hadoop: what should be mapped and what should be reduced?
This is my first time using map/reduce. I want to write a program that processes a large log file. For example, if I was processing a log file that had records consisting of {Student, College, and GPA...
2023-04-09 06:33 Category: Q&A
Write to different files using Hadoop streaming
I'm currently processing about 300 GB of log files on a 10-server Hadoop cluster. My data is being saved in folders named YYMMDD so each day can be accessed quickly.
2023-04-09 04:16 Category: Q&A
Is it better to use the mapred or the mapreduce package to create a Hadoop Job?
To create MapReduce jobs you can either use the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers and Reducers, Jobs ... The first one had been ma...
2023-04-09 02:37 Category: Q&A
Hadoop Mac OS installation woes
So I'm trying to install Hadoop on Mac OS X Leopard following the steps in this note: Running Hadoop on a OS X Single Node Cluster.
2023-04-09 02:30 Category: Q&A
How to tell Hadoop how much memory to allocate to a single mapper job?
I've created an Elastic MapReduce job, and I'm trying to optimize its performance. At the moment I'm trying to increase the number of mappers per instance. I am doing this via mapre...
2023-04-08 17:47 Category: Q&A
Hadoop Java mapper -copyFromLocal heap size error
As part of my Java mapper I have a command that executes some code on the local node and copies a local output file to the Hadoop fs. Unfortunately, I'm getting the following output:
2023-04-08 14:56 Category: Q&A
How to set setMaxMapTaskFailuresPercent in Hadoop's new API?
Previously, you could set the maximum failure percentage with JobConf.setMaxMapTaskFailuresPercent(int), but that is now obsolete.
2023-04-08 13:16 Category: Q&A
How to load data into Hive from HDFS without removing the source file?
When loading data from HDFS into Hive using LOAD DATA INPATH 'hdfs_file' INTO TABLE tablename; ...
2023-04-08 11:56 Category: Q&A
Exploring Nutch over Hadoop
What can I do with Hadoop and Nutch used as a search engine? I know that Nutch is used to build a web crawler, but I'm not getting the full picture. Can I use MapReduce with Nutch and...
2023-04-08 02:50 Category: Q&A
Hadoop: High CPU load on client side after committing jobs
I couldn't find an answer to my issue while sifting through some Hadoop guides: I am committing various Hadoop jobs (up to 200) in one go via a shell script on a client computer. Each job is started...
2023-04-08 01:38 Category: Q&A