I have a C / C++ program which needs to read in a file that may or may not be gzip compressed. I know we can use gzread() from zlib to read in both compressed and uncompressed files - however, I want to use the zlib functions ONLY if the file is gzip compressed (for performance reasons).
So is there any way to programmatically detect or check if a certain file is gzipped from C / C++?
There is a magic number at the beginning of the file. Just read the first two bytes and check whether they are 0x1f and 0x8b, in that order.
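The two-byte check described above could look roughly like this in C. This is only a sketch: the function names `has_gzip_magic` and `file_has_gzip_magic` are made up for illustration, and the bytes are compared one at a time so the result does not depend on the machine's byte order.

```c
#include <stdio.h>
#include <stddef.h>

/* Returns 1 if buf starts with the gzip magic bytes 0x1f 0x8b, else 0.
 * (Hypothetical helper name, not part of zlib.) */
int has_gzip_magic(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* Returns 1 if the file starts with the gzip magic, 0 if it does not
 * (including files shorter than two bytes), -1 if it cannot be opened. */
int file_has_gzip_magic(const char *path)
{
    unsigned char head[2];
    size_t n;
    FILE *fp = fopen(path, "rb");   /* binary mode matters on Windows */

    if (fp == NULL)
        return -1;
    n = fread(head, 1, sizeof head, fp);
    fclose(fp);
    return has_gzip_magic(head, n);
}
```

A caller would then choose between the zlib path and plain `fread()` depending on whether `file_has_gzip_magic(path)` returns 1.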
Do you prefer false positives, false negatives, or no false results at all (there goes performance down the drain...)?
RFC 1952 (GZIP file format specification version 4.3) states that the first 2 bytes of each member, and therefore of the file, are '\x1F' and '\x8B'. Use that for a first check, which can still result in false positives.
What is the difference in performance between reading compressed and uncompressed files using gzread()?
Anyway, in order to detect if a file is gzipped, you can read the magic number at the beginning of the file, which is 1f 8b according to the link.
You can test for the signatures described in RFCs 1951 and 1952 to get an idea. For GZIP files the second one is the relevant one, and it is definitive. Other formats can produce false positives, so you should check as much of the header as you can for plausible values.
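As a sketch of what checking more of the header might mean: per RFC 1952 the fixed part of a member header is 10 bytes, the third byte (CM) is 8 for deflate, and bits 5 to 7 of the fourth byte (FLG) are reserved and must be zero. The function name `looks_like_gzip_header` is an assumption for illustration, and the check still cannot rule out every false positive.

```c
#include <stddef.h>

/* Plausibility check of the fixed 10-byte gzip member header (RFC 1952).
 * Returns 1 if the header looks plausible, 0 otherwise.
 * (Hypothetical helper name, not part of zlib.) */
int looks_like_gzip_header(const unsigned char *buf, size_t len)
{
    if (len < 10)                           /* fixed header is 10 bytes */
        return 0;
    if (buf[0] != 0x1f || buf[1] != 0x8b)   /* ID1, ID2: magic number */
        return 0;
    if (buf[2] != 8)                        /* CM: 8 = deflate */
        return 0;
    if (buf[3] & 0xe0)                      /* FLG: bits 5-7 reserved, must be 0 */
        return 0;
    return 1;
}
```

Only a full decompression with a CRC check would confirm that the file really is valid gzip, which is exactly the cost the question is trying to avoid.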
For just zlib streams it's somewhat harder, because they are even more prone to false positives. But you would rarely encounter those in the wild on their own.
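For comparison, a zlib stream (RFC 1950) has only a two-byte header with no fixed magic number, which is why it is more prone to false positives. A rough sketch of the checks that header does allow, with the hypothetical name `looks_like_zlib_header`:

```c
#include <stddef.h>

/* Plausibility check of the 2-byte zlib stream header (RFC 1950).
 * Returns 1 if plausible, 0 otherwise.
 * (Hypothetical helper name, not part of zlib.) */
int looks_like_zlib_header(const unsigned char *buf, size_t len)
{
    unsigned cmf, flg;

    if (len < 2)
        return 0;
    cmf = buf[0];
    flg = buf[1];
    if ((cmf & 0x0f) != 8)              /* CM: 8 = deflate */
        return 0;
    if ((cmf >> 4) > 7)                 /* CINFO: window size up to 32K */
        return 0;
    if (((cmf << 8) | flg) % 31 != 0)   /* FCHECK: CMF*256 + FLG is a multiple of 31 */
        return 0;
    return 1;
}
```

The common header `78 9c` passes this test, but so will a small fraction of arbitrary byte pairs, so it is best treated as a hint rather than proof.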