I have a small script that fetches the HTML of a URL. This works fine and it brings back the HTML. The problem is, this url is rendering some DIVs after the page has loaded, so I can see those DIVs when I open it in a browser but not when I use curl or file_get_contents. What would be the solution for this?
No. You have no reliable way to run JavaScript through PHP. However, you can sniff the AJAX requests in your browser's debugger, take their URLs, and fetch them too. You then just need to write your own parser for the responses.
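As a rough sketch of that idea, the snippet below requests an AJAX URL directly with file_get_contents, sending the headers a browser XHR typically sends. The function names xhrContextOptions and fetchAjax, and the particular header set, are assumptions for illustration; the real URL and any extra headers or cookies have to be copied from what the debugger's network tab shows for the page in question.

```php
<?php
// Sketch only: function names and the header set are illustrative assumptions.

/**
 * Build HTTP stream-context options that resemble a browser XHR request.
 */
function xhrContextOptions(string $referer): array
{
    return [
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", [
                'X-Requested-With: XMLHttpRequest', // many sites check this header
                'Accept: application/json, text/html',
                'Referer: ' . $referer,
            ]),
            'timeout' => 10, // seconds
        ],
    ];
}

/**
 * Fetch the raw response body of an AJAX endpoint, or null on failure.
 */
function fetchAjax(string $url, string $referer): ?string
{
    $context = stream_context_create(xhrContextOptions($referer));
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Calling fetchAjax() with the URL found in the debugger and the original page as the referer returns the raw response. If it is JSON, json_decode($body, true) gives an array to pick the content out of; if it is an HTML fragment, it can be parsed like any other HTML.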
this url is rendering some DIVs
That doesn't make any semantic sense. A url is an address of some data - which may include code and references to other URLs. The URL doesn't "render" anything.
If you mean that the page referenced by the URL renders divs - that makes a bit more sense.
It may be that the server is supplying different content based on the request headers (e.g. user-agent or cookies). Or it may be that javascript invoked from the page is rendering additional content into the HTML.
To find out which, just disable javascript in your browser. If the divs are still rendered then the server is delivering different content based on the user agent - you just need to fake the user-agent in your request.
OTOH if the content is added via javascript, then it will be a big task to implement this using PHP.
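For the first case (the server varying its output by request headers), a minimal sketch of faking the user-agent with file_get_contents could look like this. The function names browserLikeContext and fetchAsBrowser are made up for this example, and the user-agent string and cookie value are placeholders to be replaced with whatever a real browser sends to the site.

```php
<?php
// Sketch only: function names are illustrative assumptions.

/**
 * Create a stream context that sends a browser-like User-Agent
 * (and optionally a Cookie header) with the request.
 */
function browserLikeContext(string $userAgent, string $cookie = '')
{
    $headers = [
        'User-Agent: ' . $userAgent,
        'Accept: text/html',
    ];
    if ($cookie !== '') {
        $headers[] = 'Cookie: ' . $cookie;
    }

    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", $headers),
            'timeout' => 10, // seconds
        ],
    ]);
}

/**
 * Fetch a page while presenting the given User-Agent; null on failure.
 */
function fetchAsBrowser(string $url, string $userAgent, string $cookie = ''): ?string
{
    $html = @file_get_contents($url, false, browserLikeContext($userAgent, $cookie));

    return $html === false ? null : $html;
}
```

If the DIVs show up in the HTML returned by fetchAsBrowser() but not in a plain file_get_contents() call, the server is varying its output by request headers. With curl, the equivalent is setting CURLOPT_USERAGENT on the handle.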
You may be able to do this by running a JavaScript interpreter on the downloaded page. It's possible to use the Rhino shell on the command line (and you can execute it from PHP via system() or shell_exec()).
It seems, though, that you may have a hard time parsing the HTML and feeding Rhino just the JavaScript in that page (I don't see any option to parse an HTML file), but surely there are other JS interpreters and maybe one suits your needs: see the Wikipedia page on JS engines.
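To make the "feed the interpreter just the JavaScript" step concrete, here is a hedged sketch that pulls the inline script bodies out of the downloaded HTML with DOMDocument and hands one of them to an external interpreter through shell_exec(). The function names extractInlineScripts and runWithInterpreter are invented for this example, and $interpreter is a placeholder for whatever command starts the JS shell on your system. Note that scripts which touch document or window will most likely fail in a bare interpreter, since it provides no DOM.

```php
<?php
// Sketch only: function names and the $interpreter command are assumptions.

/**
 * Return the bodies of all inline <script> elements (those without src).
 *
 * @return string[]
 */
function extractInlineScripts(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $node) {
        if ($node->hasAttribute('src')) {
            continue; // external script: would need a separate download
        }
        $code = trim($node->textContent);
        if ($code !== '') {
            $scripts[] = $code;
        }
    }

    return $scripts;
}

/**
 * Write the JavaScript to a temporary file and run it with an external
 * interpreter. Returns whatever the interpreter printed, or null.
 */
function runWithInterpreter(string $js, string $interpreter): ?string
{
    $file = tempnam(sys_get_temp_dir(), 'js_');
    file_put_contents($file, $js);

    $output = shell_exec(escapeshellcmd($interpreter) . ' ' . escapeshellarg($file));
    unlink($file);

    return is_string($output) ? $output : null;
}
```

Each string returned by extractInlineScripts() can then be passed to runWithInterpreter() along with the interpreter command. Whether the output is useful depends entirely on what the page's scripts expect from their environment.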