Is it possible to cat a gzip file stored on Amazon S3? Maybe using some streaming client?
We are looking for an operation similar to zcat s3://bucket_name/your_file | grep "log_id"
Found this thread today and liked Keith's answer. Fast forward to today's AWS CLI, and it's done with:
aws s3 cp s3://some-bucket/some-file.bz2 - | bzcat -c | mysql -uroot some_db
Might save someone else a tiny bit of time.
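For the gzip case in the original question, the same pattern should work, swapping bzcat for zcat (a sketch, assuming the bucket and file names from the question):
# "-" tells aws s3 cp to write the object to stdout; zcat decompresses the stream
aws s3 cp s3://bucket_name/your_file - | zcat | grep "log_id"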
From S3 REST API » Operations on Objects » GET Object:
To use GET, you must have READ access to the object. If you grant READ access to the anonymous user, you can return the object without using an authorization header.
If that's the case, you can use:
$ curl <url-of-your-object> | zcat | grep "log_id"
or
$ wget -O- <url-of-your-object> | zcat | grep "log_id"
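Here <url-of-your-object> is the object's HTTPS endpoint; for a publicly readable object that is typically the virtual-hosted-style URL. A sketch, reusing the bucket and key names from the question:
# Fetch the public object over HTTPS, decompress, and filter
$ curl https://bucket_name.s3.amazonaws.com/your_file | zcat | grep "log_id"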
However, if you haven't granted anonymous READ access on the object, you need to create and send the authorization header as part of the GET request, and that becomes somewhat tedious to do with curl/wget. Lucky for you, someone has already done it: the Perl aws script by Tim Kay, as recommended by Hari. Note that you don't have to put Tim Kay's script on your path or otherwise install it (except making it executable), as long as you use the command versions which start with aws, e.g.
$ ./aws cat BUCKET/OBJECT | zcat | grep "log_id"
You could also use s3cat, part of Tim Kay's command-line toolkit for AWS:
http://timkay.com/aws/
To get the equivalent of zcat FILENAME | grep "log_id", you'd do:
> s3cat BUCKET/OBJECT | zcat - | grep "log_id"
Not exactly a zcat, but a way to use Hadoop to download large files from S3 in parallel could be distcp: http://hadoop.apache.org/common/docs/current/distcp.html
hadoop distcp s3://YOUR_BUCKET/your_file /tmp/your_file
or
hadoop distcp s3://YOUR_BUCKET/your_file hdfs://master:8020/your_file
Maybe from this point you can pipe a zcat...
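For example, once the copy has finished, something along these lines should work (a sketch, reusing the paths from the distcp commands above):
# Decompress the local copy
zcat /tmp/your_file | grep "log_id"
# Or stream it straight out of HDFS without a second local copy
hadoop fs -cat hdfs://master:8020/your_file | zcat | grep "log_id"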
To add your credentials, you have to edit the core-site.xml file with:
<configuration>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_KEY</value>
  </property>
</configuration>
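Alternatively, since distcp accepts Hadoop's generic -D options, you may be able to pass the credentials on the command line instead of editing core-site.xml (an untested sketch, reusing the placeholders above):
# Credentials supplied per-invocation via -D, for the s3n:// filesystem
hadoop distcp -Dfs.s3n.awsAccessKeyId=YOUR_KEY -Dfs.s3n.awsSecretAccessKey=YOUR_KEY s3n://YOUR_BUCKET/your_file /tmp/your_file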
If your OS supports it (likely), you can use /dev/fd/1 as the target for aws s3 cp:
aws s3 cp s3://bucket_name/your_file /dev/fd/1 | zcat | grep log_id
There seem to be some trailing bytes after EOF, but zcat and bzcat conveniently just write a warning to STDERR.
I just confirmed that this works by loading some DB dumps straight from S3 like this:
aws s3 cp s3://some_bucket/some_file.sql.bz2 /dev/fd/1 | bzcat -c | mysql -uroot some_db
All this with nothing but the stuff already on your computer and the official AWS CLI tools. Win.
You should try s3streamcat; it supports bzip2, gzip and xz compressed files.
Install with
sudo pip install s3streamcat
Usage:
s3streamcat s3://bucketname/dir/file_path
s3streamcat s3://bucketname/dir/file_path | more
s3streamcat s3://bucketname/dir/file_path | grep something