I have a simple Python script (moo.py) that I am trying to stream through:
import sys, os
for line in sys.stdin:
    print(1)
and I try to run this Pig script:
DEFINE CMD `python moo.py` ship('moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data through CMD;
dump res;
When I run this Pig script locally (pig -x local) everything is fine, but when I run it without -x local, it prints this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
[Log file]
Caused by: java.io.FileNotFoundException: File moo.py does not exist.
Any ideas?
It's most likely a relative-path issue. Try:
DEFINE CMD `python moo.py` ship('/local/path/to/moo.py');
It could also be a read/write/execute permission issue.
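To rule out both causes before launching Pig, you can check the path and permissions from the directory where you start pig. This is only a rough sketch: check_shippable is a hypothetical helper name, not part of Pig, and it only inspects the local machine.

```python
import os


def check_shippable(path):
    """Resolve `path` to an absolute path and list obvious problems.

    Hypothetical helper for illustration: it only checks the local
    filesystem on the machine where pig is launched.
    """
    abs_path = os.path.abspath(path)
    problems = []
    if not os.path.isfile(abs_path):
        problems.append("file does not exist")
    elif not os.access(abs_path, os.R_OK):
        problems.append("file is not readable")
    return abs_path, problems
```

Calling check_shippable("moo.py") returns the absolute path you could pass to ship() together with a list of problems, which is empty if the file exists and is readable.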
The problem was that I used ship() instead of cache(). ship() works fine for passing local files from the master to the slaves, whereas cache() is used by the slaves to fetch files from a location they can all access, such as S3 on Amazon.
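For reference, the DEFINE line with cache() would look roughly like the sketch below. It assumes moo.py has already been uploaded to S3; the path is a placeholder, not my real one. The part after # is the name under which the file shows up in the task's working directory.

```
-- Placeholder S3 path; replace with wherever moo.py was uploaded.
DEFINE CMD `python moo.py` cache('s3://path/to/moo.py#moo.py');
```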
hope that helps anyone :]