
How to call a Pig script from within another Pig script

Developer https://www.devze.com 2023-04-07 23:07 Source: Internet
I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with column names in a separate Pig script, and reuse this script from other Pig scripts. How do I do this?

Say the script that loads these 100 columns is 100col.pig. How do I call it from anotherone.pig?


Look into the exec command (for batch processing) or the run command (for interactive scripts). Also, if you need to run Hadoop file system shell commands from Pig, see the fs command. Here's a good reference:

http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html


Try using macros, which are available as of Pig 0.9.

http://pig.apache.org/docs/r0.9.1/cont.html#macros
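As a rough illustration of what a macro looks like, here is a minimal sketch. The macro name keep_above, the input path, and the three-column schema are all made up for this example (the real file would have 100 columns), and it assumes Pig 0.9 or later.

```pig
-- Hypothetical macro: keep the rows of $rel whose $field exceeds $threshold.
DEFINE keep_above(rel, field, threshold) RETURNS kept {
    $kept = FILTER $rel BY $field > $threshold;
};

-- Placeholder path and a three-column stand-in for the real 100-column schema.
data = LOAD '/path/to/input' AS (col1:chararray, col2:int, col3:float);

-- The field name is passed as a quoted string and substituted into the macro body.
positive = keep_above(data, 'col2', 0);
DUMP positive;
```

Once defined, the macro can be called as many times as needed, with different relations or fields.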


It's a little late for this answer, but I was recently working on this requirement and found almost nothing helpful until I found this. I hope it helps someone in need:

** This excerpt is taken from the book Programming Pig.

For a long time in Pig Latin, the entire script needed to be in one file. This produced some rather unpleasant multithousand-line Pig Latin scripts. Starting in 0.9, the preprocessor can be used to include one Pig Latin script in another. Taken together with the macros, it is now possible to write modular Pig Latin that is easier to debug and reuse.

import is used to include one Pig Latin script in another:

--main.pig
import '../examples/ch6/dividend_analysis.pig';
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
            date:chararray, open:float, high:float, low:float, close:float,
            volume:int, adj_close:float);
results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');

import writes the imported file directly into your Pig Latin script in place of the import statement. In the preceding example, the contents of dividend_analysis.pig will be placed immediately before the load statement. Note that a file cannot be imported twice. If you wish to use the same functionality multiple times, you should write it as a macro and import the file with that macro.
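Applied to the original question, this could look roughly like the sketch below. It shows two separate files in one listing. The macro name load_100col, the path, and the three stand-in columns are assumptions for illustration only; the real 100col.pig would list all 100 columns.

```pig
-- File 1: 100col.pig (hypothetical contents)
-- A macro that loads the file with named columns; only three are shown here.
DEFINE load_100col(input_path) RETURNS loaded {
    $loaded = LOAD '$input_path' AS (col1:chararray, col2:int, col3:float);
};

-- File 2: anotherone.pig
-- Import the macro file, then call the macro. The path is a placeholder.
IMPORT '100col.pig';
data = load_100col('/path/to/input');
by_col1 = GROUP data BY col1;
counts = FOREACH by_col1 GENERATE group, COUNT(data);
DUMP counts;
```

Because the schema lives in one macro, every script that imports 100col.pig gets the same column names, and the macro can be invoked more than once with different paths.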


There are two options, as mentioned above: Pig provides the run and exec commands for this.

The exec command calls a Pig script as an independent, standalone run. The run command runs a Pig script and preserves its variables and aliases.

For your requirement, the run command is probably the one to check out: http://pig.apache.org/docs/r0.9.1/cmds.html#run
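A minimal sketch of the difference, assuming 100col.pig contains a LOAD statement that defines an alias named data with columns col1 and col2 (both names are made up for this example):

```pig
-- anotherone.pig (sketch)
-- 'run' executes 100col.pig in the current context, so the aliases it
-- defines (assumed here to include 'data') remain visible afterwards.
run 100col.pig
subset = FOREACH data GENERATE col1, col2;
DUMP subset;

-- 'exec' would instead run the script in isolation; 'data' would not be
-- defined in this script afterwards.
-- exec 100col.pig
```

The same commands can be typed at the grunt prompt, which is where the documentation linked above demonstrates them.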
