I have some ideas for building a more intelligent web spider, one that interacts with a web page and extracts information in a manner closer to how we humans do.
To do this I need a representation of a web page that is similar or identical to the one we see in our browsers.
In other words, I need access to data about the location, colour and style of all the elements on the page, possibly at a pixel level.
But I don't want just a rendered bitmap; I want to be able to extract text, click links, push buttons and so on.
I get the feeling the DOM may be a starting point, but more concrete advice would be appreciated.
To clarify: I want programmatic access to web pages in a form similar to what a browser presents, so that I can, for example, check the colour or text at a specific pixel location or region.
You might want to check out Selenium (or other ways of scripting your browser, such as Greasemonkey). Since how a web page is displayed depends quite a bit on the particular browser, scripting one is the most precise way of getting at what the user actually sees.
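For instance, a minimal sketch using Selenium's Python bindings (this assumes a current Selenium install plus Pillow for pixel inspection; the URL, link text and pixel coordinates are placeholders):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from PIL import Image

driver = webdriver.Firefox()  # any supported browser driver works
driver.get("https://example.com")  # placeholder URL

# Geometry of a rendered element, in CSS pixels
link = driver.find_element(By.LINK_TEXT, "More information...")  # placeholder link text
print(link.location)  # e.g. {'x': 100, 'y': 240}
print(link.size)      # e.g. {'width': 120, 'height': 18}

# Computed style of the element, e.g. its colour
print(link.value_of_css_property("color"))

# For true pixel-level checks, screenshot the page and inspect pixels
driver.save_screenshot("page.png")
print(Image.open("page.png").getpixel((150, 250)))  # (r, g, b) tuple

# Interact with the page like a user would
link.click()

driver.quit()
```

This gives you both worlds: the DOM for structure, text and interaction, and a rendered screenshot for pixel-level questions.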