Using grep to capture JavaScript links

开发者 (Developer) https://www.devze.com 2023-01-18 23:44 Source: web
I'm using wget to create static copies of my site. However, several elements require external assets that are pulled in via JavaScript. The pattern of the script should be fairly constant, and no URLs are dynamically created. The URLs I need to extract look like this:

onclick="return ns.homepage.load({e:this, src:'https://mysub.mydomain.tld/somedir/content/123456789.html'})"

I'd like to write the list of these URLs to a local file so I can wget them as well.


Use Perl with HTML::TreeBuilder to pull in your site's code and then parse it.

You may have to do some regex work, i.e. this module may only get you as far as slurping the 'onclick()' event, but it shouldn't be too hard to get the rest from there.
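
For illustration, here is a rough sketch of that two-step idea (parse the HTML, then regex the onclick value). It is written in Python with the standard library's html.parser rather than Perl's HTML::TreeBuilder, so treat it as a stand-in for the approach, not the module's API. The names OnclickCollector, extract_urls and save_urls are made up for this example, and the regex assumes the URL always appears as src:'...' exactly as in the question.

```python
import re
from html.parser import HTMLParser

# Assumption: the URL always appears as src:'...' (or src:"...") inside the
# onclick handler, as in the example from the question.
SRC_RE = re.compile(r"""src\s*:\s*(['"])(https?://[^'"]+)\1""")


class OnclickCollector(HTMLParser):
    """Hypothetical helper: gathers src URLs found in onclick attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "onclick" and value:
                self.urls.extend(m.group(2) for m in SRC_RE.finditer(value))


def extract_urls(html_text):
    """Return the unique src URLs in html_text, in first-seen order."""
    parser = OnclickCollector()
    parser.feed(html_text)
    parser.close()
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(parser.urls))


def save_urls(urls, out_path):
    """Write one URL per line so the file can be handed to wget."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

Calling save_urls(extract_urls(page_html), "urls.txt") on each saved page and then running wget -i urls.txt should fetch the extra files. The file name urls.txt is just a placeholder.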
