I have the following regular expression:
(?:<(?<tag>\w*)>(?<text>.*)</\k<tag>>)
I want it to grab the text within the first HTML element.
e.g.
<p>This should capture</p>This shouldn't
Works, but ...
<p>This should capture</p><p>This shouldn't</p>
Doesn't work. As you'd expect, it returns:
This should capture</p><p>This shouldn't
I'm racking my brains here. How can I just have it select the FIRST inner text?
(I'm trying to be tag-agnostic, so <strong>This should match</strong>
is equally appropriate, etc.)
You should use the HTML Agility Pack.
For example:
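// Assumes doc is an HtmlAgilityPack.HtmlDocument already loaded with the markup, and using System.Linq;
doc.DocumentNode.Descendants("p").First().InnerText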
Stop. Just stop. If you are parsing HTML, use an HTML parser (or XML if you're dealing with valid XHTML). See this answer for more info.
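As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser`, chosen only to keep the example self-contained. The names `FirstElementText` and `first_inner_text` are hypothetical. It collects the text of the first element it meets, whatever the tag, and ignores everything after that element closes. It assumes every start tag has a matching end tag, so void elements such as `<br>` would need extra handling.

```python
from html.parser import HTMLParser


class FirstElementText(HTMLParser):
    """Collect the text inside the first element encountered (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the first element
        self.done = False   # set once the first element has closed
        self.parts = []     # text chunks found inside it

    def handle_starttag(self, tag, attrs):
        if not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if not self.done and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Only keep text that sits inside the first element.
        if not self.done and self.depth > 0:
            self.parts.append(data)


def first_inner_text(html):
    parser = FirstElementText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(first_inner_text("<p>This should capture</p><p>This shouldn't</p>"))
```

Because the parser tracks nesting depth, an input like `<div><div>a</div>b</div>c` yields the text of the whole outer element, which a lazy regex cannot do.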
To make the * match non-greedily, add a ? after it, so that .* becomes .*? and stops at the first closing tag:
(?:<(?<tag>\w*)>(?<text>.*?)</\k<tag>>)
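A quick way to see the difference is to run both versions of the pattern over the second example from the question. The sketch below uses Python's `re` module purely for illustration; it spells named groups as `(?P<tag>...)` and back-references as `(?P=tag)` rather than the .NET `(?<tag>...)` and `\k<tag>`, and the variable names are made up for the example.

```python
import re

html = "<p>This should capture</p><p>This shouldn't</p>"

# Greedy: .* runs to the last closing tag that still lets the pattern match.
greedy = re.compile(r"<(?P<tag>\w*)>(?P<text>.*)</(?P=tag)>")
# Lazy: .*? stops at the first closing tag that matches the opening one.
lazy = re.compile(r"<(?P<tag>\w*)>(?P<text>.*?)</(?P=tag)>")

greedy_text = greedy.search(html).group("text")
lazy_text = lazy.search(html).group("text")

print(greedy_text)
print(lazy_text)
```

Note that the lazy version still stops at the first matching closing tag even when elements of the same name are nested, so `<div><div>a</div>b</div>` would capture only `<div>a`.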