I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with column names in a separate Pig script, and reuse this script from other Pig scripts. How do I do this?
Say this 100-column Pig script is 100col.pig. How do I call it from anotherone.pig?
Look into the exec command (for batch processing) or the run command (for interactive scripts). Also, if you need Hadoop file system shell commands from within Pig, check the fs command. Here's a good reference:
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
You should try using macros, which are available as of Pig 0.9.
http://pig.apache.org/docs/r0.9.1/cont.html#macros
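A minimal sketch of how a macro could wrap the load statement for the question's scenario, assuming Pig 0.9 or later. The file name load_wide.pig, the macro name load_wide, the path /data/wide_table, and the three columns (standing in for the full 100) are all hypothetical:

```pig
-- load_wide.pig (hypothetical file holding the shared schema)
-- Only three columns are shown; the real script would list all 100.
DEFINE load_wide(input_path) RETURNS wide {
    $wide = LOAD '$input_path'
            AS (col1:chararray, col2:int, col3:double);
};
```

```pig
-- anotherone.pig
IMPORT 'load_wide.pig';

-- The macro expands to the LOAD statement with the named schema.
wide   = load_wide('/data/wide_table');
subset = FOREACH wide GENERATE col1, col2;
DUMP subset;
```

Keeping the schema inside a macro means every script that imports the file gets the same column names, and the input path stays a parameter rather than being hard-coded.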
It's a little late for this answer, but I was recently working on this requirement and found almost nothing helpful until I came across the following. I hope it helps someone in need.
(This excerpt is taken from the book Programming Pig.)
For a long time in Pig Latin, the entire script needed to be in one file. This produced some rather unpleasant multithousand-line Pig Latin scripts. Starting in 0.9, the preprocessor can be used to include one Pig Latin script in another. Taken together with the macros, it is now possible to write modular Pig Latin that is easier to debug and reuse: import is used to include one Pig Latin script in another:
--main.pig
import '../examples/ch6/dividend_analysis.pig';
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
date:chararray, open:float, high:float, low:float, close:float,
volume:int, adj_close:float);
results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');
import writes the imported file directly into your Pig Latin script in place of the import statement. In the preceding example, the contents of dividend_analysis.pig will be placed immediately before the load statement. Note that a file cannot be imported twice. If you wish to use the same functionality multiple times, you should write it as a macro and import the file with that macro.
There are two options here, as mentioned above: Pig provides the run and exec commands for this requirement.
The exec command runs a Pig script as an independent, standalone job, so aliases defined in that script are not visible to the caller afterwards. The run command runs a Pig script and preserves its variables and aliases.
I suppose you need to check out the run command to achieve your requirements. http://pig.apache.org/docs/r0.9.1/cmds.html#run
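A short sketch of what this could look like in the Grunt shell, following the pattern used in the Pig documentation for run. The contents of 100col.pig, the path /data/wide_table, and the column names are hypothetical stand-ins for the real 100-column schema:

```pig
grunt> cat 100col.pig
wide = LOAD '/data/wide_table' AS (col1:chararray, col2:int, col3:double);

grunt> run 100col.pig
grunt> subset = FOREACH wide GENERATE col1, col2;
grunt> DUMP subset;
```

Because run executes the script in the current context, the alias wide is still available afterwards. If exec were used in its place, the FOREACH statement would fail, since aliases defined by an exec'd script are not visible to the shell.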