Java web page reader

Developer https://www.devze.com 2023-01-30 05:45 Source: web
I want to retrieve all the links in a web page, but the page uses JavaScript, and each page contains a number of links.

How can I go to the next page and read its contents in a Java program?


Getting this information from a JavaScript-driven page can be hard. Your program must interpret the whole page and understand what the JS is doing. Not all web spiders do this.

Most modern JS libraries (jQuery, etc.) mainly manipulate the CSS and attributes of HTML elements. So first you have to generate the "flat" HTML from the HTML source and the JS, and then you can run a classical web spider over that flat HTML.

(For example, the Firefox Web Developer plugin lets you see both the original source code of a page and the generated code once all the JS has run.)
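
As a rough sketch of the second step: once the flat HTML is available as a string, the links can be pulled out with plain Java. The class name `LinkExtractor`, the method `extractLinks`, and the example.com URLs are made up for illustration, and the regular expression is a simplification, not a real HTML parser. Producing the flat HTML itself (actually running the JS) is not shown here and needs a browser engine or something similar.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the href targets of <a> tags from already-rendered ("flat") HTML.
public class LinkExtractor {

    // Rough pattern for href="..." or href='...' inside an <a ...> tag.
    // Assumption: attribute values are quoted; unquoted hrefs are ignored.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<String> extractLinks(String flatHtml, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(flatHtml);
        while (m.find()) {
            String href = m.group(2).trim();
            // Skip in-page anchors and javascript: pseudo-links.
            if (href.isEmpty() || href.startsWith("#")
                    || href.toLowerCase().startsWith("javascript:")) {
                continue;
            }
            try {
                // Turn relative links into absolute ones.
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Malformed href: ignore it in this sketch.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/page2\">next</a> <a class='x' href='item?id=3'>item</a>";
        for (String link : extractLinks(html, "http://example.com/list/page1")) {
            System.out.println(link);
        }
    }
}
```

For anything beyond a quick experiment, a proper HTML parser is likely to be more robust than a regular expression.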


What you are looking for is called a web spider engine. There are plenty of open source web spider engines available. Check http://j-spider.sourceforge.net/ for example.
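
To give an idea of what such an engine does when it "goes to the next page", here is a minimal sketch of the crawl loop. It is not J-Spider's API: `MiniSpider`, `crawl`, and the `linksOf` function are hypothetical names, and `linksOf` stands in for whatever fetches a page and returns the links found on it. The sketch visits pages breadth-first, remembers which URLs it has already seen, and stops after `maxPages` pages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical minimal spider: breadth-first traversal over pages and their links.
public class MiniSpider {

    // linksOf is an assumed callback: given a URL, it returns the links on that page.
    public static List<String> crawl(String startUrl, int maxPages,
                                     Function<String, List<String>> linksOf) {
        Set<String> seen = new LinkedHashSet<>(); // every URL discovered, in discovery order
        Deque<String> queue = new ArrayDeque<>(); // URLs still waiting to be visited
        seen.add(startUrl);
        queue.add(startUrl);
        int visited = 0;
        while (!queue.isEmpty() && visited < maxPages) {
            String url = queue.poll();
            visited++;
            for (String link : linksOf.apply(url)) {
                // Set.add returns false for URLs seen before, which prevents loops.
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // In-memory stand-in for a paginated site; a real linksOf would download each page.
        Map<String, List<String>> site = Map.of(
                "page1", List.of("page2", "itemA"),
                "page2", List.of("page3", "itemA"));
        List<String> all = crawl("page1", 10, u -> site.getOrDefault(u, List.of()));
        System.out.println(all);
    }
}
```

Keeping the fetching behind a function makes it possible to try the loop on in-memory data first and plug in a real page loader later.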
