I have the following HTML:
<div class="headNormal">
<h1><a href="/questions/76/specify-a-mirror-when-configuring-a-gdi-environment">
Specify a Mirror when configuring a GDI environment</a></h1></div>
I'd like to capture the text "Specify a Mirror when configuring a GDI environment", but I'm not sure which regex I should use for this.
So far I have: <div class="headNormal">(.*)</div>
but it doesn't match anything.
Any help?
Based on the exact snippet you've provided, you'd want something like this:
<a .+?>(.*?)</a>
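As a rough illustration only (the question does not name a language, so Python's `re` module is an assumption here), the sketch below runs both patterns against the snippet. It assumes the line breaks shown in the snippet are really present in the input. In that case `.` does not match a newline by default, which is a likely reason the original pattern finds nothing, and the anchor pattern needs a "dot matches newline" option (`re.DOTALL` in Python) to reach the text on the next line.

```python
import re

# The snippet from the question, including its line breaks.
html = '''<div class="headNormal">
<h1><a href="/questions/76/specify-a-mirror-when-configuring-a-gdi-environment">
Specify a Mirror when configuring a GDI environment</a></h1></div>'''

# The question's pattern: "." stops at the newline right after the opening
# <div> tag, so "</div>" is never reached and there is no match.
original = re.search(r'<div class="headNormal">(.*)</div>', html)

# The anchor pattern without any flags fails for the same reason:
# the link text starts on the line after the opening <a ...> tag.
no_flags = re.search(r'<a .+?>(.*?)</a>', html)

# With DOTALL, "." also matches newlines, so the lazy group can span lines.
match = re.search(r'<a .+?>(.*?)</a>', html, re.DOTALL)
title = match.group(1).strip() if match else None

print(original)  # None
print(no_flags)  # None
print(title)     # Specify a Mirror when configuring a GDI environment
```

The `strip()` call removes the leading newline that the group captures along with the link text.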
However, you're opening yourself up to a whole world of hurt if you have to parse large HTML documents and extract the text from anchors (a case in point is Konrad Rudolph's comment on this question). You'd be much better off with a parser.
You're not specific about the language you're using, but if it's .NET have a look at the HTML Agility Pack.
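To give a feel for the parser approach, here is a minimal sketch using Python's standard-library `html.parser` module. This is not the HTML Agility Pack, which is a .NET library; the choice of Python and the `AnchorTextParser` class name are assumptions made purely for illustration.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Hypothetical helper: collects the text content of every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False  # True while we are between <a ...> and </a>
        self.parts = []         # text fragments of the anchor being read
        self.texts = []         # finished anchor texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.parts = []

    def handle_data(self, data):
        if self.in_anchor:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.texts.append("".join(self.parts).strip())
            self.in_anchor = False


html = '''<div class="headNormal">
<h1><a href="/questions/76/specify-a-mirror-when-configuring-a-gdi-environment">
Specify a Mirror when configuring a GDI environment</a></h1></div>'''

parser = AnchorTextParser()
parser.feed(html)
parser.close()

print(parser.texts)  # ['Specify a Mirror when configuring a GDI environment']
```

Unlike the regex, this does not care whether the link text sits on the same line as the tag, and it keeps working when the anchor has extra attributes or the document contains other markup around it.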