I need to do some screen scraping, and for that I need to read some XML from Python. I want to get a proper DOM tree out of it. How can I do that?
Check out the xml.dom.minidom module in the standard library; its documentation also has examples.
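As a minimal sketch of what that looks like, the snippet below parses an XML string into a DOM tree and walks it. The sample document and its element names (catalog, book, title) are made up for illustration; to read from a file instead, minidom.parse() takes a filename or file object.

```python
from xml.dom import minidom

# Hypothetical sample document; replace with the XML you actually scraped.
xml_text = """<catalog>
  <book id="b1"><title>Dive Into Python</title></book>
  <book id="b2"><title>Learning XML</title></book>
</catalog>"""

# parseString() builds a full DOM tree from a string.
dom = minidom.parseString(xml_text)
root = dom.documentElement

# Collect each book's id attribute and the text of its <title> child.
titles = {}
for book in root.getElementsByTagName("book"):
    title_node = book.getElementsByTagName("title")[0]
    titles[book.getAttribute("id")] = title_node.firstChild.data

print(root.tagName)
print(titles)
```

Note that minidom expects well-formed XML and raises an exception on input it cannot parse.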
BTW, if what you are scraping is HTML, don't use an XML parser for it: real-world HTML is often not well-formed XML, and there are dedicated tools for that. (See: Question about screen scraping; Question about Python HTML screen scraping.)
The lxml library works well for scraping HTML. Here are some links to get you started:
- Parsing XML and HTML with lxml
- lxml: an underappreciated web scraping library
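For a rough idea of how this looks in practice, here is a small sketch that parses a slightly sloppy HTML snippet and pulls out its links. It assumes lxml is installed (it is a third-party package, e.g. pip install lxml), and the sample markup is invented for illustration.

```python
from lxml import html

# Hypothetical page content; note the <p> is never closed.
page = """<html><body>
<p>Unclosed paragraph
<a href="/first">First</a>
<a href="/second">Second</a>
</body></html>"""

# html.fromstring() uses lxml's forgiving HTML parser rather than a strict XML one.
doc = html.fromstring(page)

# Walk every <a> element and collect its text and href attribute.
links = [(a.text_content(), a.get("href")) for a in doc.iter("a")]
print(links)
```

The resulting tree also supports XPath queries via doc.xpath(), which is often convenient for picking out specific elements.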