How can I extract the title and header of a web page directly from the internet?
You could do this using a combination of regular expressions and the WebRequest / WebResponse classes. For any web scraping needs, though, I'd strongly recommend looking into Simon Mourier's Html Agility Pack, which is much more tolerant of 'bad' HTML and also lets you traverse the DOM as a proper XML tree.
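To illustrate why a tolerant parser is attractive, here is a minimal sketch in Python. It is not the Html Agility Pack API; it uses the standard library's `html.parser` as a stand-in, and the names `TitleParser` and `parse_title` are made up for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TITLE> is handled too.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect all of them.
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace into single spaces.
            self.title = " ".join("".join(self._parts).split())


def parse_title(html):
    """Return the page title, or None if the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

The parser does not care that other tags on the page are left unclosed, which is the sort of 'bad' HTML that tends to trip up hand-written patterns.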
Step 1 - use a WebRequest to obtain a WebResponse from the web page you want to extract information from.
Step 2 - you will end up with what is essentially a string containing the page's HTML or XHTML, so you then need to pull out the bits you want.
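As a rough sketch of Step 1, here is the same idea in Python rather than .NET, with `urllib.request` playing the role of WebRequest / WebResponse. The function name `fetch_html`, the 10 second timeout, and the UTF-8 fallback are assumptions made for this example.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its body as a string (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header if there is one;
        # falling back to UTF-8 is an assumption, not a rule.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

The string it returns is what Step 2 then has to pick apart.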
If you have any problems with either of these steps, make sure your question includes plenty of detail about the problem.
I would use a regex to search the page's HTML for <title>.*?</title>.
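A minimal Python sketch of that regex approach follows. The pattern is loosened slightly from the one above so that it also copes with attributes on the tag, upper-case tags, and titles that span several lines; `extract_title` is a name invented for this example.

```python
import re
from html import unescape

# Non-greedy match of everything between the opening and closing title tags.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse whitespace, then decode entities such as &amp;.
    return unescape(" ".join(match.group(1).split()))
```

This is fine for a one-off, but a regex knows nothing about comments or scripts, so a page that contains the text `<title>` inside one of those can fool it.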
I'm not sure how you would get the "header" though. You would need some sort of rule as to what the header looks like.
If it is just the <head> tag, you can use the same approach as for the title to get it.
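Assuming "header" does mean the head element, the same trick might look like this in Python. The name `extract_head` is hypothetical, and the pattern is written so that it does not accidentally match an HTML5 `<header>` tag.

```python
import re

# "<head" must be followed by ">" or by whitespace and attributes,
# which keeps the pattern from matching "<header ...>".
HEAD_RE = re.compile(r"<head(?:\s[^>]*)?>(.*?)</head\s*>", re.IGNORECASE | re.DOTALL)


def extract_head(html):
    """Return the raw markup inside <head>...</head>, or None if absent."""
    match = HEAD_RE.search(html)
    return match.group(1) if match else None
```

The result is still raw markup, so anything you want from inside it (meta tags, links and so on) needs another pass.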