apache-pig
Using Pig and Python
Apologies if this question is poorly worded: I am embarking on a large-scale machine learning project and I don't like programming in Java. I love writing programs in Python. I have heard good things[Details]
2023-03-18 00:46 Category: Q&A
POST Hadoop Pig output to a URL as JSON data?
I have a Pig job which analyzes log files and writes summary output to S3. Instead of writing the output to S3, I want to convert it to a JSON payload and POST it to a URL.[Details]
2023-03-16 02:12 Category: Q&A
Is it possible to define our own comparator class for using order by in Pig?
I want to order the tuples using my own comparator class. If I run a query like, say, "B = ORDER A by $0,$1"[Details]
2023-03-12 00:33 Category: Q&A
Are there any useful tutorials on Pig UDFs other than the apache.org tutorial?
I've spent a few hours getting acclimated, but I want to find some other ways to practice. The book Programming Pig is available online, and has a great chapter on writing UDFs:[Details]
2023-03-10 22:17 Category: Q&A
Understanding SQL joins within WHERE clause
I have a query in SQL that I'm trying to translate into Pig Latin (for use on a Hadoop cluster). Most of the time I have no problem moving the queries over to Pig, but I've encountered something I ca[Details]
2023-03-05 08:07 Category: Q&A
Equivalent of Linux 'diff' in Apache Pig
I want to be able to do a standard diff on two large files. I've got something that will work but it's not nearly as quick as diff on the command line.[Details]
2023-03-03 14:31 Category: Q&A
Why is the elephantbird Pig JsonLoader only processing part of my file?
I'm using Pig on Amazon's Elastic Map-Reduce to do batch analytics. My input files are on S3 and contain events that are represented by one JSON dictionary per line. I use the elephantbird JsonLoader[Details]
2023-03-01 15:28 Category: Q&A
How does Pig use Hadoop Globs in a 'load' statement?
As I've noted previously, Pig doesn't cope well with empty (0-byte) files. Unfortunately, there are lots of ways that these files can be created (even within Hadoop utilities).[Details]
2023-02-28 04:11 Category: Q&A
Running Pig query over data stored in Hive
I would like to know how to run Pig queries over data stored in Hive format. I have configured Hive to store compressed data (using this tutorial http://wiki.apache.org/hadoop/Hive/CompressedStorage).[Details]
2023-02-28 00:31 Category: Q&A
strsplit issue - Pig
I have the following tuple H1 and I want to strsplit its $0 into a tuple. However, I always get an error message:[Details]
2023-02-25 03:24 Category: Q&A