How to bypass WP Super Cache using Python?

Devze https://www.devze.com 2023-01-14 23:01 Source: Internet
I'm trying to collect data from a frequently updated blog, so I simply use a while loop that calls urllib2.urlopen("http://example.com") to refresh the page every 5 minutes and collect the data I want.

But I noticed that I'm not getting the most recent content this way: it differs from what I see in a browser such as Firefox. After comparing the page source shown in Firefox with the same page fetched from Python, I found that WP Super Cache is preventing me from getting the most recent result.

I still get the same cached page even if I spoof the headers in my Python code. So I wonder: is there a way to bypass WP Super Cache? And why isn't Firefox served the cached page at all?


Have you tried adding some harmless data to the URL? Something like this:

import time
import urllib2
# Append the current Unix timestamp so every request uses a unique URL.
urllib2.urlopen("http://example.com/?time=%s" % int(time.time()))

This actually requests a URL like http://example.com/?time=1283872559. Most caching systems bypass the cache when the URL carries a query string or anything else they don't expect.
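If the blog URL might already contain a query string, naive string formatting would produce a malformed URL. The following is a minimal sketch of the same timestamp trick, assuming Python 3's standard library (urllib.parse and urllib.request) rather than urllib2. The names add_cache_buster and fetch_fresh are hypothetical helpers, not part of any library, and whether a given WP Super Cache setup actually skips the cache for such URLs depends on how the site is configured.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def add_cache_buster(url, param="time", now=None):
    """Return url with a timestamp query parameter added (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier value of our own parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    stamp = int(time.time() if now is None else now)
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_fresh(url, timeout=30):
    """Fetch url through a unique, timestamped address and return the raw bytes."""
    with urlopen(add_cache_buster(url), timeout=timeout) as response:
        return response.read()
```

Calling fetch_fresh("http://example.com/") inside the existing while loop, followed by time.sleep(300), would request a different URL on every five-minute pass.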
