
GUNZIP / Extract file "portion by portion"


I'm on a shared server with restricted disk space and I've got a gz file that expands into a HUGE file, more than I've got room for. How can I extract it "portion" by "portion" (let's say 10 MB at a time), and process each portion, without extracting the whole thing even temporarily?

No, this is just ONE super huge compressed file, not a set of files, please...
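
For reference, here is a minimal sketch of the streaming idea in Python, assuming Python is available on the server; huge.gz, the 10 MB chunk size, and process_chunk are placeholders for the real file name and whatever per-portion work is needed:

import gzip

CHUNK = 10 * 1024 * 1024  # 10 MB per portion

def process_chunk(data):
    # hypothetical placeholder for the real per-portion processing
    print(len(data))

# gzip.open streams the compressed file: only one portion is ever
# held in memory, and nothing is written to disk.
with gzip.open("huge.gz", "rb") as f:
    while True:
        data = f.read(CHUNK)
        if not data:
            break
        process_chunk(data)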


Hi David, your solution looks quite elegant, but if I'm reading it right, it seems like every time gunzip extracts from the beginning of the file (and the output of that is thrown away). I'm sure that'll be causing a huge strain on the shared server I'm on (I don't think it's "reading ahead" at all) - do you have any insights on how I can make gunzip "skip" the necessary number of blocks?


If you're doing this with (Unix/Linux) shell tools, you can use gunzip -c to uncompress to stdout, then use dd with the skip and count options to copy only one chunk.

For example:

gunzip -c input.gz | dd bs=10485760 iflag=fullblock skip=0 count=1 >output

then skip=1, skip=2, etc. (With GNU dd, iflag=fullblock matters here: dd counts reads rather than bytes, and a pipe can deliver short reads, which would otherwise throw off skip and count.)
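
Since the gzip format has no block index, each of those dd invocations forces gunzip to decompress the stream from the start up to the chunk you want. If every chunk needs processing anyway, a single pass avoids the repeated work; here is a rough sketch in Python (input.gz and the output.N file names are carried over from the example above, not prescribed):

import subprocess

CHUNK = 10 * 1024 * 1024  # same 10 MB slice size as the dd example

# Start gunzip once; each chunk is read off the pipe exactly once,
# so nothing is decompressed twice.
proc = subprocess.Popen(["gunzip", "-c", "input.gz"], stdout=subprocess.PIPE)
n = 0
while True:
    data = proc.stdout.read(CHUNK)  # buffered read: full CHUNK unless EOF
    if not data:
        break
    with open("output.%d" % n, "wb") as out:
        out.write(data)
    n += 1
proc.stdout.close()
proc.wait()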


Unfortunately I don't know of an existing Unix command that does exactly what you need. You could do it easily with a little program in any language; for example, in Python, cutter.py:

import sys

try:
    size = int(sys.argv[1])
    N = int(sys.argv[2])
except (IndexError, ValueError):
    print("Use: %s size N" % sys.argv[0], file=sys.stderr)
    sys.exit(2)

# stdin is a pipe, so it cannot be seek()ed; skip the first N-1
# chunks by reading and discarding them, then copy the Nth one.
for _ in range(N - 1):
    if not sys.stdin.buffer.read(size):
        sys.exit(0)  # stream ended before the requested chunk
sys.stdout.buffer.write(sys.stdin.buffer.read(size))

Now gunzip <huge.gz | python cutter.py 1000000 5 >fifthone will put exactly a million bytes into the file fifthone, skipping the first 4 million bytes of the uncompressed stream.
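
As a variation (not part of the original answer), Python's gzip module can read the compressed file directly, and a forward seek() on a GzipFile is implemented internally by decompressing and discarding, so the gunzip step and cutter.py can be merged into one script. A sketch, where cutgz.py is just an illustrative name and huge.gz stands in for the real file:

import gzip
import sys

size = int(sys.argv[1])
N = int(sys.argv[2])

with gzip.open("huge.gz", "rb") as f:
    # a forward seek on a GzipFile decompresses and discards the
    # skipped bytes internally; nothing is written to disk
    f.seek((N - 1) * size)
    sys.stdout.buffer.write(f.read(size))

Running python cutgz.py 1000000 5 >fifthone then behaves like the pipeline above, in a single process.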

