I'm trying to grab content from a website, parse it, extract what I want, and put it in a database.
I'm using PHP.
I've built a script using cURL that goes through the pages I need and grabs the HTML content. Now, from what I understand, I need a tool or library that will allow me to take that string full of the page's HTML and parse it.
Any tips on how (best) to do that with PHP?
If you need to follow the HTML structure, use the DOM extension. Load the data with DOMDocument::loadHTML, and then work with it either as a DOMDocument or as a SimpleXML document (use simplexml_import_dom to convert).
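A minimal sketch of that approach, assuming the fetched page is already in a string. The `$html` value and the element names in it are made up for illustration; real pages will need their own traversal.

```php
<?php
// Hypothetical HTML standing in for the string returned by cURL.
$html = '<html><body><h1>Products</h1>'
      . '<ul><li class="item"><a href="/a">Alpha</a></li>'
      . '<li class="item"><a href="/b">Beta</a></li></ul></body></html>';

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Option 1: walk the tree with the DOM API.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}

// Option 2: convert to SimpleXML and navigate by property access.
// simplexml_import_dom returns the root element, here <html>.
$xml = simplexml_import_dom($doc);
$heading = (string) $xml->body->h1;
$firstItem = (string) $xml->body->ul->li[0]->a;
```

Here `$links` maps each href to its link text, which is the kind of array you could then insert into a database row by row.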
If you just need to extract a few values and don't want to bother with understanding the document structure, use regular expressions.
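A rough sketch of the regular-expression route, using PHP's `preg_match_all`. The markup and the `price` class are hypothetical; the pattern is tied to this exact markup and would stop matching if attributes were added or reordered.

```php
<?php
// Hypothetical HTML standing in for the string returned by cURL.
$html = '<p>Price: <span class="price">19.99</span></p>'
      . '<p>Price: <span class="price">5.00</span></p>';

// Capture the text inside every <span class="price">...</span>.
// $matches[1] holds the contents of the first capture group for each match.
preg_match_all('~<span class="price">([^<]+)</span>~', $html, $matches);

// Convert the captured strings to floats.
$prices = array_map('floatval', $matches[1]);
```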