hdfs
Which is the easiest way to combine small HDFS blocks?
I'm collecting logs with Flume to HDFS. For the test case I have small files (~300 kB) because the log-collecting process was scaled for real usage. …
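On HDFS itself the usual answers are `hadoop fs -getmerge`, Hadoop archives (HAR), or an identity MapReduce pass; the core idea is plain concatenation, which this minimal local sketch illustrates (the class name and temp paths are illustrative, not from the question):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: concatenate many small log files into one larger file --
// the same idea `hadoop fs -getmerge` applies to HDFS paths.
public class MergeSmallFiles {
    public static void merge(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // append each small file's bytes
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("flume-logs");
        Path a = Files.write(dir.resolve("part-0"), "line1\n".getBytes());
        Path b = Files.write(dir.resolve("part-1"), "line2\n".getBytes());
        Path merged = dir.resolve("merged.log");
        merge(List.of(a, b), merged);
        System.out.print(Files.readString(merged));
    }
}
```

For a Flume pipeline specifically, raising `hdfs.rollSize`/`hdfs.rollInterval` on the HDFS sink so files roll less often is usually the first thing to try.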
2023-01-30 12:54 · Category: Q&A

What is the best component stack for building a distributed log aggregator (like Splunk)?
I'm trying to find the best components I could use to build something similar to Splunk, in order to aggregate logs from a large number of servers in a computing grid. It should also be distributed because…
2023-01-04 14:38 · Category: Q&A

Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3
I have set up a Hadoop cluster containing 5 nodes on Amazon EC2. Now, when I log in to the master node and submit the following command…
2023-01-03 09:45 · Category: Q&A

FileInputStream for a generic file system
I have a file that contains Java serialized objects like "Vector". I have stored this file on the Hadoop Distributed File System (HDFS). Now I intend to read this file (using method readOb…
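The key point is that `ObjectInputStream` wraps any `InputStream`, so the same `readObject()` call works whether the underlying stream is a local `FileInputStream` or (on HDFS) the `FSDataInputStream` returned by `FileSystem.open()`. A local round-trip sketch, with the file name being just an illustration:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Vector;

// Sketch: serialize a Vector to a file, then read it back through the
// generic InputStream interface. On HDFS you would replace the
// FileInputStream with FileSystem.open(path) -- the rest is identical.
public class ReadSerialized {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("vectors", ".ser");

        // write a serialized Vector, as the question's file would contain
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            Vector<String> v = new Vector<>();
            v.add("hello");
            out.writeObject(v);
        }

        // read it back via readObject() over a plain InputStream
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            @SuppressWarnings("unchecked")
            Vector<String> v = (Vector<String>) in.readObject();
            System.out.println(v.get(0));
        }
    }
}
```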
2022-12-30 22:43 · Category: Q&A

How does Hadoop perform input splits?
This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines, and, for the sake of simplicity, each line is of the form <k,v>, where…
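Conceptually, `FileInputFormat` computes splits from sizes alone: the split size is clamped between the configured min/max and the HDFS block size, and the file is divided into that many ranges. A simplified arithmetic sketch (real Hadoop also applies a 1.1 "slop" factor on the last split and lets record readers run past a split boundary to the next newline, so no line is lost):

```java
// Simplified sketch of FileInputFormat-style split computation.
public class InputSplits {
    // splitSize = max(minSize, min(maxSize, blockSize))
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // number of splits is roughly ceil(fileSize / splitSize)
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024;   // assumed 128 MB block size
        long file = 1_000_000_000L;        // ~1 GB file, for illustration
        long split = splitSize(block, 1, Long.MAX_VALUE);
        System.out.println(numSplits(file, split)); // 8 splits of <= 128 MB
    }
}
```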
2022-12-30 10:04 · Category: Q&A

Hadoop safemode recovery taking a lot of time
We are running our cluster on Amazon EC2 and using Cloudera scripts to set up Hadoop. On the master node, we start the services below…
2022-12-29 19:37 · Category: Q&A

Hadoop pseudo-distributed mode error
I have set up Hadoop on an openSUSE 11.2 VM using VirtualBox and made the prerequisite configs. I ran this example in standalone mode successfully…
2022-12-26 21:19 · Category: Q&A

What should hadoop.tmp.dir be?
Hadoop has the configuration parameter hadoop.tmp.dir which, per the documentation, is "A base for other temporary directories." I presume this path refers to the local file system…
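The presumption is right: hadoop.tmp.dir is a local-filesystem path, and many other defaults (for example the datanode and namenode storage directories in older releases) are derived from it, so it should point at a persistent disk rather than `/tmp`. A minimal core-site.xml fragment, with the path itself being only an example:

```xml
<!-- core-site.xml sketch: hadoop.tmp.dir is a LOCAL path; several
     storage directories default to subdirectories of it -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop/tmp</value>
</property>
```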
2022-12-21 08:15 · Category: Q&A

Writing data to Hadoop
I need to write data into Hadoop (HDFS) from external sources like a Windows box. Right now I have been copying the data onto the namenode and using HDFS's put command to ingest it into the cluster…
2022-12-08 02:53 · Category: Q&A