I'm using curl to fetch a webpage, and I need to detect whether the response is gzip-compressed or not.
This works perfectly fine if Content-Encoding is specified in the response headers, but some servers instead return "Transfer-Encoding": "chunked" and no Content-Encoding header.
Is there any way to detect gzip or get the raw (encoded) server response?
I tried looking at curl_getinfo but the content_encoding isn't specified either.
Thanks.
You can check whether the response starts with the gzip magic number, specifically the two bytes 1f 8b.
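A minimal sketch of that check, assuming the raw body is already in a string (for example from curl_exec() with CURLOPT_RETURNTRANSFER enabled and no automatic decoding). The function name looksLikeGzip is made up for illustration:

```php
<?php
// Hypothetical helper: true when $body begins with the gzip magic bytes 1f 8b.
function looksLikeGzip(string $body): bool
{
    // strncmp compares only the first two bytes; bodies shorter than two
    // bytes never match.
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Possible usage (requires the zlib extension for gzdecode()):
// $data = looksLikeGzip($body) ? gzdecode($body) : $body;
```

The magic number is only a strong hint, not proof, so a failed gzdecode() should still be handled.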
Is there any way to detect gzip
Yes. You can use cURL's header functions. For example, you can define a function that handles the response headers and register it with curl_setopt() using the CURLOPT_HEADERFUNCTION option. Alternatively, write the headers to a file (which you have opened with fopen()) using the CURLOPT_WRITEHEADER option.
There may be more options you could use; see the curl_setopt() manual for the possibilities. The header you are looking for is named Content-Encoding.
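As a rough sketch of the CURLOPT_HEADERFUNCTION approach: the callback receives one raw header line at a time and must return the number of bytes it handled. The names parseHeaderLine and fetchWithHeaders are hypothetical, and lower-casing the header names is my own choice, not something cURL does:

```php
<?php
// Hypothetical helper: split one raw header line into [name, value].
// Returns null for lines without a colon (status line, terminating blank line).
function parseHeaderLine(string $line): ?array
{
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2) {
        return null;
    }
    return [strtolower(trim($parts[0])), trim($parts[1])];
}

// Hypothetical wrapper: fetch $url and return [headers, body].
function fetchWithHeaders(string $url): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, string $line) use (&$headers): int {
        $parsed = parseHeaderLine($line);
        if ($parsed !== null) {
            $headers[$parsed[0]] = $parsed[1];
        }
        return strlen($line); // cURL expects the number of bytes handled
    });
    $body = curl_exec($ch);
    return [$headers, $body];
}
```

After the call, isset($headers['content-encoding']) tells you whether the server declared an encoding at all.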
If you have the output in a file, you could also use PHP's finfo with some of its predefined constants, or mime_content_type() (DEPRECATED!) if finfo is not available to you.
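A possible sketch of the finfo route, assuming the fileinfo extension is loaded. The function name isGzipFile is made up, and the two MIME strings are an assumption on my part: which one is reported depends on the bundled libmagic version.

```php
<?php
// Hypothetical helper: ask finfo for the MIME type of a saved response.
function isGzipFile(string $path): bool
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    // Assumption: gzip data is reported under one of these two names.
    return in_array($mime, ['application/gzip', 'application/x-gzip'], true);
}
```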
[...] or get the raw (encoded) server response?
Yes. You can specify the Accept-Encoding header. The value you are looking for is identity, so you can send:
Accept-Encoding: identity
Have a look at the HTTP/1.1 RFC for details.
This asks the server for unencoded/uncompressed output (for example, to write it directly into a file).
Use CURLOPT_ENCODING for this purpose. You can also set it with curl_setopt().
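For illustration, the option set might look like this. The function name identityRequestOptions is hypothetical; CURLOPT_ENCODING with the value 'identity' makes cURL send the Accept-Encoding: identity header shown above:

```php
<?php
// Hypothetical helper: options that ask the server not to compress the body.
function identityRequestOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_ENCODING       => 'identity', // sends "Accept-Encoding: identity"
    ];
}

// Possible usage:
// $ch = curl_init('http://example.com/');
// curl_setopt_array($ch, identityRequestOptions());
// $body = curl_exec($ch);
```

A server is free to ignore the request header, so checking the body is still worthwhile.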
You can either issue a separate HEAD request:
CURLOPT_HEADER => true
CURLOPT_NOBODY => true
Or request that the headers be prepended to the response body of your original request:
CURLOPT_HEADER => true
But, if you just want to get the (decoded) HTML, you can use:
CURLOPT_ENCODING => ''
And cURL will automatically negotiate the encoding with the server and decode the response for you.
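Put together, the two variants above could be sketched as option arrays for curl_setopt_array(). The function names are hypothetical, and CURLOPT_RETURNTRANSFER is my addition so that curl_exec() returns the data as a string:

```php
<?php
// Hypothetical helper: options for a separate HEAD request (headers only).
function headRequestOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true, // include headers in the returned string
        CURLOPT_NOBODY         => true, // issue HEAD instead of GET
    ];
}

// Hypothetical helper: options that let cURL negotiate and decode for you.
function decodedBodyOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '', // empty string: offer every encoding cURL supports
    ];
}

// Possible usage:
// $ch = curl_init('http://example.com/');
// curl_setopt_array($ch, decodedBodyOptions());
// $html = curl_exec($ch); // already decoded when the server declared the encoding
```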