I'm writing a script that needs to determine whether a page is compressed. I've done a bit of research but cannot figure out how to tell. I'd assume that a compressed page would have something in the headers to say that it is compressed, like Content-Type or something.
Any help is appreciated.
It's actually Content-Encoding. Depending on the type of compression, its value may be gzip (or x-gzip), deflate or compress when the data is compressed.
To cite Wikipedia:
The “Content-Encoding”/"Accept-Encoding" and "Transfer-Encoding"/"TE" headers in HTTP/1.1 allow clients to optionally receive compressed HTTP responses and (less commonly) to send compressed requests. The specification for HTTP/1.1 (RFC 2616) specifies three compression methods: “gzip” (RFC 1952; the content wrapped in a gzip stream), “deflate” (RFC 1950; the content wrapped in a zlib-formatted stream), and "compress" (explained in RFC 2616 section 3.5 as 'The encoding format produced by the common UNIX file compression program "compress". This format is an adaptive Lempel-Ziv-Welch coding (LZW).'). Many client libraries, browsers, and server platforms (including Apache and Microsoft IIS) support gzip.
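As a rough illustration of the values mentioned in this answer, here is a minimal Python sketch that picks the compression codings out of a Content-Encoding header value. The function name compression_codings and the set KNOWN_COMPRESSION_CODINGS are made up for this example, and the set only covers the codings named above; a real server may send others.

```python
# Codings named in the answer above. This set is an assumption for the
# sketch; servers may use other codings that are not listed here.
KNOWN_COMPRESSION_CODINGS = {"gzip", "x-gzip", "deflate", "compress"}


def compression_codings(content_encoding):
    """Return the known compression codings found in a Content-Encoding value.

    `content_encoding` is the raw header value (or None if the header is
    absent). The header may list several codings separated by commas.
    """
    if not content_encoding:
        return []
    codings = [part.strip().lower() for part in content_encoding.split(",")]
    return [coding for coding in codings if coding in KNOWN_COMPRESSION_CODINGS]
```

An empty list means no known compression coding was announced, either because the header is missing or because it holds a value such as identity.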
A compressed page will have a Content-Encoding header naming the compression algorithm.
For example:
Content-Encoding: gzip
It is the web browser that sees whether the page is compressed or not. On the server side, Apache looks for Accept-Encoding: gzip,deflate in the HTTP request header. If it is present, Apache compresses the PHP script's HTML response accordingly.
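The server-side check described here can be sketched in Python as follows. This is not how Apache implements it; client_accepts is a hypothetical helper that only shows the idea of looking for a coding in the Accept-Encoding request header, and it ignores the "*" wildcard for brevity.

```python
def client_accepts(accept_encoding, coding="gzip"):
    """Return True if the Accept-Encoding value lists `coding` as acceptable.

    Simplified sketch: handles comma-separated codings and an optional
    ";q=" weight (q=0 means "not acceptable"), but not the "*" wildcard.
    """
    if not accept_encoding:
        return False
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() != coding:
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                # Malformed weight: treat the coding as not accepted.
                return False
        return True
    return False
```

A server would compress the response only when a check like this succeeds, and would then announce the choice in the Content-Encoding response header.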
Ref: http://www.websiteoptimization.com/speed/tweak/compress/
Make an HTTP request that accepts gzip, then analyse the received headers and look for Content-Encoding: gzip
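A minimal sketch of that approach using Python's standard urllib.request module. The function names build_request and fetch_content_encoding are invented for this example, and the URL in the usage comment is only a placeholder. urllib does not decompress the body or alter the header on its own, so the value returned is what the server sent.

```python
import urllib.request


def build_request(url):
    """Build a request that advertises support for gzip and deflate."""
    # Without Accept-Encoding, many servers send the page uncompressed.
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def fetch_content_encoding(url, timeout=10):
    """Return the Content-Encoding response header, or None if it is absent."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.headers.get("Content-Encoding")


# Example usage (placeholder URL, requires network access):
#   encoding = fetch_content_encoding("http://example.com/")
#   print("compressed with", encoding if encoding else "nothing")
```

If the result is None, the server did not announce any content coding for that response, even though the request said compression was acceptable.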