Using PHP, how can I accurately test whether a remote website supports the "If-Modified-Since" HTTP header?
From what I have read, if the remote file you GET has been modified since the date specified in the header request - it should return a 200 OK status. If it hasn't been modified, it should return a 304 Not Modified.
Therefore my question is, what if the server doesn't support "If-Modified-Since" but still returns a 200 OK?
There are a few tools out there that check if your website supports "If-Modified-Since" so I guess I'm asking how they work.
Edit:
I have performed some testing using Curl, sending the following;
$ch = curl_init();
// 60*60*60*60 seconds = 150 days, so this date is in the future
curl_setopt($ch, CURLOPT_HTTPHEADER, array("If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',time()+60*60*60*60)));
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
i.e. a date in the future, google.com returns;
HTTP/1.0 304 Not Modified
Date: Fri, 05 Feb 2010 16:11:54 GMT
Server: gws
X-XSS-Protection: 0
X-Cache: MISS from .
Via: 1.0 .:80 (squid)
Connection: close
and if I send;
$ch = curl_init();
// 60*60*60*60 seconds = 150 days, so this date is in the past
curl_setopt($ch, CURLOPT_HTTPHEADER, array("If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',time()-60*60*60*60)));
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
i.e. a date in the past, google.com returns;
HTTP/1.0 200 OK
Date: Fri, 05 Feb 2010 16:09:12 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
X-XSS-Protection: 0
X-Cache: MISS from .
Via: 1.0 .:80 (squid)
Connection: close
If I then send both to bbc.co.uk (which doesn't support it);
The future one returns;
HTTP/1.1 200 OK
Date: Fri, 05 Feb 2010 16:12:51 GMT
Server: Apache
Set-Cookie: BBC-UID=84bb66bc648318e367bdca3ad1d48cf627005b54f090f211a2182074b4ed92c40ForbSoft%20Web%20Diagnostics%20%28URL%20Validator%29; expires=Tue, 04-Feb-14 16:12:51 GMT; path=/; domain=bbc.co.uk;
Accept-Ranges: bytes
Cache-Control: max-age=0
Expires: Fri, 05 Feb 2010 16:12:51 GMT
Pragma: no-cache
Content-Length: 111677
Content-Type: text/html
The date in the past returns;
HTTP/1.1 200 OK
Date: Fri, 05 Feb 2010 16:14:01 GMT
Server: Apache
Set-Cookie: BBC-UID=841b66ec44232cd91e81e88a014a3c5e50ed4e20c0e07174c4ff59675cd2fa210ForbSoft%20Web%20Diagnostics%20%28URL%20Validator%29; expires=Tue, 04-Feb-14 16:14:01 GMT; path=/; domain=bbc.co.uk;
Accept-Ranges: bytes
Cache-Control: max-age=0
Expires: Fri, 05 Feb 2010 16:14:01 GMT
Pragma: no-cache
Content-Length: 111672
Content-Type: text/html
So my question still stands.
I have performed some testing on this and it appears to work as follows;
If you send an If-Modified-Since header with a date that is in the past (5 mins previous to the current time should do it) then sites such as google.com, w3.org, mattcutts.com will return a "HTTP/1.1 304 Not Modified" header. Sites such as yahoo.com, bbc.co.uk and stackoverflow.com always return a "HTTP/1.1 200 OK".
The "Last-Modified" header has nothing to do with "If-Modified-Since" because the whole point of sending back a "HTTP/1.1 304 Not Modified" header is that you don't have to send the body with it (thus saving bandwidth - which is the whole point behind this).
Therefore, the answer to my question is that if a site doesn't return a "HTTP/1.1 304 Not Modified" header when you send an "If-Modified-Since 5 mins ago" header, the site doesn't support the "If-Modified-Since" request properly.
If I am incorrect, please say so and provide testing to show.
Edit: I forgot to add that a good test is to make a normal HEAD request to the domain (e.g. w3.org), grab the "Last-Modified" date and then make another request with "If-Modified-Since:" set to that date. This tests that both the "Last-Modified" value and the "If-Modified-Since" request are supported. Please note: just because the server sends back a "Last-Modified" date doesn't mean it supports "If-Modified-Since".
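The two-request test described above could be sketched roughly as follows. This is only a minimal sketch, assuming PHP's curl extension is available; the function names `extractLastModified`, `fetchHeaders` and `supportsIfModifiedSince` are made up for illustration, and the curl options simply mirror the ones used in the question.

```php
<?php
// Hypothetical helper: pull the Last-Modified value out of a raw header dump.
// When redirects are followed, curl returns one header block per hop, so the
// last occurrence is the one that belongs to the final response.
function extractLastModified(string $rawHeaders): ?string
{
    if (preg_match_all('/^Last-Modified:[ \t]*(.+?)[ \t\r]*$/mi', $rawHeaders, $m)) {
        return end($m[1]);
    }
    return null;
}

// Hypothetical helper: send a HEAD request, return [status code, raw headers].
function fetchHeaders(string $url, array $requestHeaders = []): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPHEADER     => $requestHeaders,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_HEADER         => true,
        CURLOPT_NOBODY         => true,
        CURLOPT_CONNECTTIMEOUT => 4,
        CURLOPT_TIMEOUT        => 4,
    ]);
    $raw = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $raw === false ? '' : $raw];
}

// Returns true if the second request came back as 304, false if it did not,
// and null when the first response had no Last-Modified header to echo back
// (an assumption of this sketch: that case is treated as inconclusive).
function supportsIfModifiedSince(string $url): ?bool
{
    [, $headers] = fetchHeaders($url);
    $lastModified = extractLastModified($headers);
    if ($lastModified === null) {
        return null;
    }
    [$status] = fetchHeaders($url, ['If-Modified-Since: ' . $lastModified]);
    return $status === 304;
}
```

Calling `supportsIfModifiedSince('http://www.w3.org/')` would then run both requests. The result only describes the one URL that was tested.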
If the entity returns a "Last-Modified" header, then it supports it. Makes sense really.
More info: http://httpd.apache.org/docs/2.2/caching.html (A Brief Guide to Conditional Requests)
Obviously only static pages/files will have that header. With dynamic content (asp, php, etc) there is no way to know from the headers (unless the site handles caching manually, e.g. like this), and in my experience the entity may or may not support If-Modified-Since.
Maybe you can just do two requests, one after the other, sending an If-Modified-Since header with the second, and then check whether the second response is a 304 or a 200.
EDIT: hurikhan77 points out an important note: testing, for example, the root of the site for this capability does not guarantee that the rest of the site does or doesn't support it too.
Regarding the first answer above, I'd like to note that conditional requests make as much sense on dynamic content as they do on static content. If the code that generates the dynamic content knows that the backend entity (e.g. a database item) has not changed, it should send a 304 in response to a conditional request.
Jan
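For the dynamic-content case described above, a script could handle the conditional request itself. The following is a minimal sketch under stated assumptions, not a definitive implementation: `isNotModified` and `respond` are hypothetical names, and `$lastChanged` stands in for whatever timestamp the backend (e.g. a database row) provides.

```php
<?php
// Hypothetical helper: decide whether a conditional request can be answered
// with 304. $ifModifiedSince is the raw request header value (or null), and
// $lastChanged is the Unix timestamp at which the backend entity last changed.
function isNotModified(?string $ifModifiedSince, int $lastChanged): bool
{
    if ($ifModifiedSince === null) {
        return false;
    }
    $since = strtotime($ifModifiedSince);
    // An unparseable date is ignored, which means sending the full response.
    return $since !== false && $lastChanged <= $since;
}

// Hypothetical usage inside a page script.
function respond(int $lastChanged): void
{
    $header = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s \G\M\T', $lastChanged));
    if (isNotModified($header, $lastChanged)) {
        http_response_code(304); // headers only, no body
        return;
    }
    echo 'full page content'; // placeholder for the real output
}
```

Keeping the comparison in a function of its own means the 304 decision can be checked without a web server.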