How do I tell in Perl what the size of a file inside a gzip archive is without unpacking the whole file?

Developer https://www.devze.com 2023-02-09 06:41 Source: web
I have a bunch of ridiculously big files (multiple gigabytes in size) that have a really high compression ratio (1:200 or better). I have to process those and would like to at least show some kind of progress estimate. For that reason I'd like to know the size of the file inside the .gz, so I can compare it with what I have pulled out already.

However, since unpacking the whole file in advance each time is prohibitive and a waste of time, I'd like to figure the size out without doing that.

I know it is possible. I can just open gzip files with Total Commander and the viewer plugin will show me the right size. (I know it's not unpacking because it shows me the size immediately, which wouldn't really be possible with a 10GB file inside the gzip.)

There are probably some header fields that contain that information.

However, looking through the docs of various CPAN modules I couldn't find anything that fits the bill. IO::Uncompress::Gunzip lets me get at a header, but it doesn't contain any file size information.

Any suggestions?


Just so there's a proper answer for this:

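sub get_gz_size {
    my ( $gz_file ) = @_;
    # List-form pipe open keeps the file name away from the shell.
    open( my $gzip, '-|', 'gzip', '--list', '--', $gz_file )
        or die "Failed to run gzip --list on $gz_file: $!";
    my @raw = <$gzip>;
    close $gzip;
    # $raw[0] is the header line; $raw[1] is "compressed uncompressed ratio name".
    my $size = ( split " ", $raw[1] )[1];
    return $size;
}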


As described in the comments above, the last 4 bytes of a gzip file contain the isize field: the uncompressed size modulo 2^32, stored as a little-endian 32-bit integer.

Here's some code I wrote to calculate the uncompressed bytes given a file path:

use Fcntl qw( SEEK_END );

sub get_isize
{
   my ($file) = @_;

   my $isize_len = 4;

   # open in raw mode so no I/O layer translates the bytes
   open( my $FH, '<:raw', $file )
      or die "Failed to open $file: $!";

   # seek back from EOF to the start of the isize field
   seek( $FH, -$isize_len, SEEK_END )
      or die "Failed to seek $isize_len bytes back from EOF: $!";

   # read from here into mod32_isize; a short read is an error too
   my $mod32_isize;
   my $bytes_read = read( $FH, $mod32_isize, $isize_len );
   unless( defined $bytes_read && $bytes_read == $isize_len )
   {
      my $reason = defined $bytes_read ? "read $bytes_read bytes instead" : $!;
      die "Failed to read $isize_len bytes from $file: $reason";
   }
   close( $FH );

   # isize is a little-endian unsigned 32-bit integer, hence 'V'
   my $dec_isize = unpack( 'V', $mod32_isize );

   return $dec_isize;
}
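
As a quick sanity check of the trailer approach, here is a minimal sketch that compresses a known amount of data with the core module IO::Compress::Gzip and reads the last 4 bytes back. The helper name read_isize and the temporary file are assumptions made for this illustration only; it is not the code from the answer above, just a compact stand-in for it.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_END );
use File::Temp qw( tempfile );
use IO::Compress::Gzip qw( gzip $GzipError );

# Hypothetical helper: return the isize trailer field of a gzip file.
sub read_isize {
    my ($file) = @_;
    open( my $fh, '<:raw', $file ) or die "Failed to open $file: $!";
    seek( $fh, -4, SEEK_END ) or die "Failed to seek in $file: $!";
    my $got = read( $fh, my $buf, 4 );
    die "Failed to read isize from $file" unless defined $got && $got == 4;
    close $fh;
    return unpack( 'V', $buf );    # little-endian unsigned 32-bit
}

# Compress 100,000 bytes of highly repetitive data into a temporary file.
my $data = 'a' x 100_000;
my ( $tmp_fh, $tmp_name ) = tempfile( SUFFIX => '.gz', UNLINK => 1 );
close $tmp_fh;
gzip( \$data => $tmp_name ) or die "gzip failed: $GzipError";

my $isize  = read_isize($tmp_name);
my $packed = -s $tmp_name;
print "compressed: $packed bytes, isize: $isize bytes\n";
```

Because the input here is far below 4 GiB, the isize printed should match the number of bytes that went in, while the compressed size on disk is much smaller.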

Since isize only holds the uncompressed size modulo 2^32, for uncompressed files larger than 4 GiB you'll need to guess whether to add 4 GiB (4294967296 bytes) to the isize retrieved, based upon the expected minimum compression factor.

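use constant MIN_COMPRESS_FACTOR => 200;
my $outer_bytes = ( -s $path );       # compressed size on disk
my $inner_bytes = get_isize( $path ); # uncompressed size modulo 2^32
# isize smaller than the minimum expected size means it wrapped at least once
$inner_bytes += 4294967296 if( $inner_bytes < $outer_bytes * MIN_COMPRESS_FACTOR );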

If your uncompressed file is larger than 4294967296 * 2 bytes, you'll have to guess how many multiples of 4294967296 to add instead (I've never tested this). For that to work, you need an accurate idea of the expected compression ratio:

my $estimated_multiplier = int( ($outer_bytes * MIN_COMPRESS_FACTOR) / 4294967296 );
$inner_bytes += ( 4294967296 * $estimated_multiplier ) if( $estimated_multiplier );
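
The guessing step can also be written as one function. The following is only a sketch under the same assumption as above, namely that the file compresses by at least some known factor. It picks the smallest size that agrees with isize modulo 2^32 and is not below the compressed size times that factor. The name estimate_inner_bytes and its parameters are made up for this example, and the result is still a guess: if the real ratio is far better than the assumed minimum, the estimate will come out too low.

```perl
use strict;
use warnings;
use POSIX qw( ceil );

use constant WRAP => 4294967296;    # 2**32, the modulus of isize

# Hypothetical helper: smallest size that matches isize modulo 2**32 and is
# at least $outer_bytes * $min_factor.
sub estimate_inner_bytes {
    my ( $isize, $outer_bytes, $min_factor ) = @_;
    my $lower_bound = $outer_bytes * $min_factor;

    # isize already satisfies the bound: assume it never wrapped
    return $isize if $isize >= $lower_bound;

    # otherwise add just enough multiples of 2**32 to reach the bound
    my $wraps = ceil( ( $lower_bound - $isize ) / WRAP );
    return $isize + $wraps * WRAP;
}

# A 10 GiB file has isize 2147483648; assume it compressed to 42949673 bytes.
print estimate_inner_bytes( 2147483648, 42_949_673, 200 ), "\n";
```

With these numbers the lower bound is 42949673 * 200 = 8589934600 bytes, so two wraps are added and the estimate lands on 10737418240 bytes, i.e. 10 GiB.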