When I use wget to create a static copy of my site, several elements require external assets that are pulled in via JavaScript. The pattern of the script is fairly constant, and no URLs are created dynamically. The URLs I need to extract look like this:
onclick="return ns.homepage.load({e:this, src:'https://mysub.mydomain.tld/somedir/content/123456789.html'})"
I'd like to output the list of these urls to a local file so I can wget them as well.
Use Perl with HTML::TreeBuilder to load your site's HTML and parse it.
You may still have to do some regex work, i.e. the module may only get you as far as extracting the value of the onclick attribute, but pulling the URL out of that string shouldn't be too hard.
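The same two-step idea (parse the HTML to get each onclick value, then use a regex to pull out the src URL) can be sketched in Python with only the standard library. This is a swap-in for illustration, not the Perl/HTML::TreeBuilder approach suggested above. The names `extract_urls`, `write_url_list`, and `urls.txt` are hypothetical, and the regex assumes the URL always appears as a quoted `src:'...'` argument, as in the example from the question.

```python
import re
from html.parser import HTMLParser

# Assumption: the URL is always passed as src:'...' (or src:"...")
# inside the onclick handler, as in the example from the question.
SRC_RE = re.compile(r"""src\s*:\s*(['"])(.*?)\1""")


class OnclickCollector(HTMLParser):
    """Collects src URLs found in onclick attributes, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "onclick" and value:
                for _quote, url in SRC_RE.findall(value):
                    if url not in self.urls:  # skip duplicates
                        self.urls.append(url)


def extract_urls(html_text):
    """Return the unique src URLs from all onclick attributes in html_text."""
    parser = OnclickCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


def write_url_list(html_paths, out_path):
    """Scan the given HTML files and write one URL per line to out_path."""
    seen = []
    for path in html_paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for url in extract_urls(fh.read()):
                if url not in seen:
                    seen.append(url)
    with open(out_path, "w", encoding="utf-8") as out:
        for url in seen:
            out.write(url + "\n")
    return seen
```

Calling `write_url_list` on the pages wget already saved would produce a file such as `urls.txt`, which can then be passed back to wget with `wget -i urls.txt`.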