My goal is to crawl a given site and log statistics for the total payload of each page on the site. By payload I mean the number of bytes downloaded once the original document, CSS, JS, images, etc. are fetched. I'm attempting to put together a graph which will show the "heaviest" pages on my site so that those can be dealt with first.
Does anyone know of any tools or techniques to do this? My preference is something that would integrate well with a web app, in PHP or Python.
I've seen plenty of questions on SO about Mechanize; they usually seem to get a lot done with only a little bit of code.
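In case it helps frame the problem, here is a minimal sketch of the idea in plain Python (standard library only, no Mechanize): fetch the page, collect the `src`/`href` attributes of `img`, `script`, and `link` tags, then sum the byte counts of the document and each asset. The class and function names are my own, and a real crawler would also need to handle `<a>` links for site-wide crawling, deduplicate shared assets, and respect robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AssetParser(HTMLParser):
    """Collect absolute URLs of sub-resources referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # img/script use src; stylesheets are <link href=...>
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.assets.append(urljoin(self.base_url, attrs["href"]))


def page_payload(url):
    """Return total bytes for the document plus all referenced assets."""
    html = urlopen(url).read()
    parser = AssetParser(url)
    parser.feed(html.decode("utf-8", errors="replace"))
    total = len(html)
    for asset in parser.assets:
        try:
            total += len(urlopen(asset).read())
        except OSError:
            pass  # skip assets that fail to download
    return total
```

Running `page_payload` over every URL discovered on the site gives the per-page numbers to graph; sorting that list descending surfaces the heaviest pages first.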