How to download a complete web page (with all its contents) in Java?

Developer https://www.devze.com 2022-12-17 00:20 Source: web

Using Java and the HttpClient library, I need to save a complete web page with all its contents (images, CSS, JavaScript, etc.), the same way a browser's "Save As -> Web Page, Complete" option does. How can I do this?


You can try the Java binding for libcurl: http://curl.haxx.se/libcurl/java/

You can also refer to this discussion: curl-equivalent-in-java


You have to write an application that fetches the HTML file, parses it to extract all the references (images, stylesheets, scripts), and then fetches every file found by parsing.
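A minimal sketch of the parse-and-extract step, using only the JDK. It assumes a regular expression is good enough for the pages in question (a real HTML parser would be more robust) and only looks at `src`/`href` attributes on `img`, `script` and `link` tags; `<base href>`, `srcset` and CSS `url(...)` references are ignored. The class and method names are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageResources {

    // Rough approximation: src/href attributes on <img>, <script> and <link> tags.
    // It does not see srcset, inline style url(...), <base href> or anything built by JavaScript.
    private static final Pattern REF = Pattern.compile(
            "(?i)<(?:img|script|link)\\b[^>]*?\\b(?:src|href)\\s*=\\s*[\"']([^\"']+)[\"']");

    /** Returns the absolute http(s) URLs referenced by the page, in document order, without duplicates. */
    public static List<URI> extractReferences(String html, URI pageUri) {
        Set<URI> found = new LinkedHashSet<>();
        Matcher m = REF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative paths such as "css/site.css" against the page URL.
                URI resolved = pageUri.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                // Keep only downloadable URLs; data:, javascript:, mailto: etc. are dropped.
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    found.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Skip attribute values that are not valid URIs.
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<link rel=\"stylesheet\" href=\"css/site.css\">"
                + "<script src=\"/js/app.js\"></script>"
                + "</head><body>"
                + "<img src=\"https://cdn.example.com/logo.png\" alt=\"logo\">"
                + "<img src=\"data:image/gif;base64,R0lGOD\">"
                + "</body></html>";
        URI page = URI.create("https://example.com/docs/index.html");
        for (URI u : extractReferences(html, page)) {
            System.out.println(u);
        }
    }
}
```

For the sample page, this prints `https://example.com/docs/css/site.css`, `https://example.com/js/app.js` and `https://cdn.example.com/logo.png`; the `data:` image is skipped because there is nothing to download.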


It's not that easy, because the paths of some CSS/JS/image files might be "hidden", for example generated by JavaScript at runtime. Consider the following example:

<script type="...">
   document.write("<script" + " type='...' src='" + blahBlah() + "'>" + "<\/script>");
</script>
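Because such a URL only exists after the script has run, a static parser cannot recover it. The most it can do is flag inline scripts that look like they inject resources, so you know the saved copy may be incomplete. A rough sketch, assuming that calls to `document.write` or `createElement` are a useful signal (a hypothetical heuristic that will produce both false positives and misses); the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicScriptDetector {

    // Inline <script> blocks and their bodies; DOTALL lets a body span several lines.
    private static final Pattern SCRIPT = Pattern.compile("(?is)<script\\b([^>]*)>(.*?)</script\\s*>");

    // A src attribute means the script is an external file, which the extractor already sees.
    private static final Pattern SRC_ATTR = Pattern.compile("(?i)\\bsrc\\s*=");

    // Heuristic: calls that commonly add resource URLs to the page at runtime.
    private static final Pattern DYNAMIC_CALL = Pattern.compile(
            "document\\.write(?:ln)?\\s*\\(|createElement\\s*\\(");

    /** Returns the bodies of inline scripts that may load resources a static parser cannot see. */
    public static List<String> findDynamicScripts(String html) {
        List<String> suspicious = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            boolean external = SRC_ATTR.matcher(m.group(1)).find();
            String body = m.group(2);
            if (!external && DYNAMIC_CALL.matcher(body).find()) {
                suspicious.add(body.trim());
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        String html = "<script src=\"/js/app.js\"></script>"
                + "<script type=\"text/javascript\">"
                + "document.write(\"<script\" + \" type='text/javascript' src='\" + blahBlah() + \"'>\" + \"<\\/script>\");"
                + "</script>";
        for (String body : findDynamicScripts(html)) {
            System.out.println("Possibly hidden resource in: " + body);
        }
    }
}
```

Catching these reliably means actually executing the JavaScript, which is what a browser does and what a plain HTTP client cannot.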

However, fetching the page source, parsing it for URLs, and downloading the URLs you find is probably all you'll need in practice.
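The fetch-and-save step could look like the sketch below. It swaps in the JDK's `java.net.http.HttpClient` (Java 11+) rather than the Apache HttpClient library mentioned in the question, mirrors each URL into a `host/path` directory layout, and does not rewrite the links inside the saved HTML to point at the local copies. Those choices, and all class and method names, are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class ResourceDownloader {

    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Maps https://host/a/b.css to root/host/a/b.css; an empty or trailing-slash path becomes index.html. */
    public static Path localPathFor(URI uri, Path root) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in " + uri);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.isEmpty() || path.endsWith("/")) {
            path = path + "index.html";
        }
        // Note: query strings are ignored, so URLs differing only by "?v=2" share one file.
        Path target = root.resolve(host).resolve(path.startsWith("/") ? path.substring(1) : path).normalize();
        // Refuse paths such as "/../../etc/passwd" that would escape the output directory.
        if (!target.startsWith(root.normalize())) {
            throw new IllegalArgumentException("Unsafe path in " + uri);
        }
        return target;
    }

    /** Downloads each URL below root, reporting failures instead of aborting the whole run. */
    public void downloadAll(List<URI> urls, Path root) throws InterruptedException {
        for (URI uri : urls) {
            try {
                Path target = localPathFor(uri, root);
                Files.createDirectories(target.getParent());
                HttpRequest request = HttpRequest.newBuilder(uri)
                        .timeout(Duration.ofSeconds(30))
                        .GET()
                        .build();
                HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(
                        target, StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));
                if (response.statusCode() >= 400) {
                    Files.deleteIfExists(target); // do not keep error pages as resources
                    System.err.println("HTTP " + response.statusCode() + " for " + uri);
                } else {
                    System.out.println("Saved " + uri + " -> " + target);
                }
            } catch (IOException | IllegalArgumentException e) {
                System.err.println("Failed " + uri + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Pass the page URL and the resource URLs found by parsing as arguments.
        List<URI> urls = new ArrayList<>();
        for (String arg : args) {
            urls.add(URI.create(arg));
        }
        new ResourceDownloader().downloadAll(urls, Paths.get("saved-page"));
    }
}
```

Mirroring the URL structure keeps relative references between the saved files mostly intact; references to other hosts or absolute URLs would still need rewriting for a fully offline copy.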
