I'm on a shared server with restricted disk space, and I've got a .gz file that expands into a HUGE file, bigger than the space I've got. How can I extract it "portion" by "portion" (let's say 10 MB at a time), and process each portion, without extracting the whole thing even temporarily!
No, this is just ONE super huge compressed file, not a set of files, please...
Hi David, your solution looks quite elegant, but if I'm reading it right, it seems like every time gunzip extracts from the beginning of the file (and the output of that is thrown away). I'm sure that'll put a huge strain on the shared server I'm on (I don't think it's "reading ahead" at all) - do you have any insights on how I can make gunzip "skip" the necessary number of blocks?
If you're doing this with (Unix/Linux) shell tools, you can use gunzip -c to uncompress to stdout, then use dd with the skip and count options to copy only one chunk. For example:
gunzip -c input.gz | dd bs=10485760 skip=0 count=1 >output
then skip=1, skip=2, etc.
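If it helps, here is a minimal sketch of looping over the stream with that dd command, 10 MB at a time; it is not from the answer above. Each pass re-runs gunzip from the start of the file, so it is simple but not cheap. handle_chunk is a hypothetical placeholder for whatever you do with each piece, chunk is just a scratch file, and iflag=fullblock is a GNU dd extension that guards against short reads from the pipe (without it, skip and count can treat a partial read as a whole block).

bs=10485760          # 10 MB per piece
skip=0
while :; do
    # Decompress again and copy just the skip-th 10 MB block of the stream.
    gunzip -c input.gz | dd bs="$bs" skip="$skip" count=1 iflag=fullblock > chunk 2>/dev/null
    [ -s chunk ] || break          # nothing copied: past the end of the stream
    handle_chunk chunk "$skip"     # hypothetical: process this piece
    skip=$((skip + 1))
done
rm -f chunk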
Unfortunately I don't know of an existing Unix command that does exactly what you need. You could do it easily with a little program in any language, e.g. in Python, cutter.py (any other language would do just as well, of course):
import sys

try:
    size = int(sys.argv[1])
    N = int(sys.argv[2])
except (IndexError, ValueError):
    print("Use: %s size N" % sys.argv[0], file=sys.stderr)
    sys.exit(2)

# A pipe can't be seek()ed: read and discard the first (N-1)*size bytes instead.
to_skip = (N - 1) * size
while to_skip > 0:
    data = sys.stdin.buffer.read(min(to_skip, 1 << 20))
    if not data:
        break
    to_skip -= len(data)
sys.stdout.buffer.write(sys.stdin.buffer.read(size))
Now

gunzip <huge.gz | python3 cutter.py 1000000 5 > fifthone

will put exactly one million bytes into the file fifthone, skipping the first 4 million bytes of the uncompressed stream.
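A rough single-pass sketch, assuming GNU dd (for iflag=fullblock): keep one gunzip running and carve its output into 10 MB pieces as they arrive, so nothing is ever re-decompressed from the beginning and only one piece sits on disk at a time. process_chunk and chunk.bin are hypothetical names for your own handler and a scratch file.

chunk_size=10485760
n=0
gunzip -c huge.gz | while :; do
    # Pull exactly one full 10 MB block (or whatever is left at the end).
    dd bs="$chunk_size" count=1 iflag=fullblock of=chunk.bin 2>/dev/null
    [ -s chunk.bin ] || break                  # empty: end of the stream
    process_chunk chunk.bin "$n" </dev/null    # hypothetical handler; /dev/null keeps it off the pipe
    n=$((n + 1))
done
rm -f chunk.bin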