
Hadoop Pig Latin unable to stream through a Python script

Developer https://www.devze.com 2023-02-24 03:48 Source: Internet

I have a simple Python script (moo.py) that I am trying to stream through:

import sys, os

# Emit a constant 1 for every input line received from Pig on stdin.
for line in sys.stdin:
    print(1)

and I try to run this Pig script:

DEFINE CMD `python moo.py` ship('moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data through CMD;
dump res;

When I run this Pig script locally (pig -x local) everything is fine, but when I run it without -x local, it prints this error:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.

[Log file]

Caused by: java.io.FileNotFoundException: File moo.py does not exist.

Any ideas?


It's most likely a relative path issue.

Try:

DEFINE CMD `python moo.py` ship('/local/path/to/moo.py');

It could also be a read/write/execute permission issue.


The problem was that I used ship() instead of cache(). ship() works fine for passing local files from the master to the slaves, whereas cache() is used by the slaves to obtain files from a location they can all access, such as S3 on Amazon.
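
As a rough sketch of what that change might look like, assuming moo.py has already been uploaded somewhere every node can read: the S3 location of the script below is a placeholder and was not given in the original post. The part after # is the name the file gets in each task's working directory, which is why the command can still refer to plain moo.py.

```pig
-- Assumption: moo.py was uploaded beforehand; this S3 path is a placeholder.
-- The "#moo.py" suffix names the symlink created in each task's working directory.
DEFINE CMD `python moo.py` cache('s3://path/to/moo.py#moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data THROUGH CMD;
DUMP res;
```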

Hope that helps someone :]
