amazon-emr
How to tell hadoop how much memory to allocate to a single mapper job?
I've created an Elastic MapReduce job, and I'm trying to optimize its performance. At the moment I'm trying to increase the number of mappers per instance. I am doing this via mapre[...]
2023-04-08 17:47 Category: Q&A

Hadoop streaming: ensuring one key per reducer
I have a mapper that, while processing data, classifies output into 3 different types (type is the output key). My goal is to create 3 different CSV files via the reducers, each with all of the data f[...]
2023-04-04 22:00 Category: Q&A

java.lang.RuntimeException: java.lang.ClassNotFoundException when trying to run a JAR job on Elastic MapReduce
What should I change to fix the following error? I'm trying to start a job on Elastic MapReduce, and it crashes every time with the message: [...]
2023-04-04 04:18 Category: Q&A

Has anybody created a job with multiple inputs using the Ruby client for Amazon's Elastic MapReduce?
Through the UI, Amazon's framework allows me to create jobs with multiple inputs by specifying multiple --input lines, e.g.: [...]
2023-04-02 02:19 Category: Q&A

Multiple files as input on Amazon Elastic MapReduce
I'm trying to run a job on Elastic MapReduce (EMR) with a custom jar. I'm trying to process about 1000 files in a single directory. When I submit my job with the parameter s3n://bucketname/compres[...]
2023-03-21 23:00 Category: Q&A

Amazon Elastic MapReduce: does input fragment size matter?
Suppose I need to process 20 GB of input using 10 instances. Does it make a difference whether I have 10 input files of 2 GB each or 4 input files of 5 GB each?
2023-03-17 17:42 Category: Q&A

Elastic MapReduce external jars
So, it is easy enough to handle external jars when using Hadoop directly: the -libjars option does this for you. The question is how to do this with EMR. There must be an easy way[...]
2023-03-12 10:59 Category: Q&A

Using S3 as fs.default.name or HDFS?
I'm setting up a Hadoop cluster on EC2 and I'm wondering how to do the DFS. All my data is currently in S3 and all map/reduce applications use S3 file paths to access the data. Now I[...]
2023-03-11 21:16 Category: Q&A

Why is the elephantbird Pig JsonLoader only processing part of my file?
I'm using Pig on Amazon's Elastic Map-Reduce to do batch analytics. My input files are on S3 and contain events that are represented by one JSON dictionary per line. I use the elephantbird JsonLoader[...]
2023-03-01 15:28 Category: Q&A

Getting large datasets onto Amazon Elastic MapReduce
There are some large datasets (25 GB+, downloadable on the Internet) that I want to play around with using Amazon EMR. Instead of downloading the datasets onto my own computer, and then re-uploading th[...]
2023-02-28 19:11 Category: Q&A