I have this html:
<a href="http://www.site.com/">This is the content.</a>
I just need to get rid of the anchor tag html around the content text, so that all I end up with is "This is the content".
Can I do this using Regex.Replace?
Your regex: <a[^>]+?>(.*?)</a>
Run this pattern through the Regex class and iterate over the resulting match collection to get the inner text.
String text = "<a href=\"link.php\">test</a>";
Regex rx = new Regex("<a[^>]+?>(.*?)</a>");
// Find matches.
MatchCollection matches = rx.Matches(text);
// Report the number of matches found.
Console.WriteLine("{0} matches found. \n", matches.Count);
// Report on each match.
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
    Console.WriteLine("Groups:");
    foreach (var g in match.Groups)
    {
        Console.WriteLine(g.ToString());
    }
}
Console.ReadLine();
Output:
1 matches found.
<a href="link.php">test</a>
Groups:
<a href="link.php">test</a>
test
The subexpression captured by the parentheses is stored in the second item of the match's Groups collection (the first item, index 0, is the whole match itself). Each parenthesized subexpression adds its own entry to the Groups collection. See the MSDN documentation for further information.
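To illustrate the indexing described above, here is a minimal sketch that reads only Groups[1] from each match. The class name GroupDemo, the method name InnerTexts, and the two-link sample string are made up for the example and are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the text captured by the first parenthesized group of each anchor match.
    public static string[] InnerTexts(string html)
    {
        MatchCollection matches = Regex.Matches(html, "<a[^>]+?>(.*?)</a>");
        string[] texts = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Groups[0] is the whole match; Groups[1] is the first (...) capture.
            texts[i] = matches[i].Groups[1].Value;
        }
        return texts;
    }

    public static void Main()
    {
        // Hypothetical sample input with two anchors.
        string html = "<a href=\"a.php\">first</a> and <a href=\"b.php\">second</a>";
        foreach (string text in InnerTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because the quantifier inside the group is lazy (.*?), each match stops at the nearest closing </a>, so the two links are reported separately.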
If you have to use Replace, this works when the tag contains only plain text, since it strips every tag in the input:
Regex r = new Regex("<[^>]+>");
string result = r.Replace(@"<a href=""http://www.site.com/"">This is the content.</a>", "");
Console.WriteLine("Result = \"{0}\"", result);
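If other markup in the string has to survive, one possible variation is to match only the anchor and put its captured content back with the $1 substitution. This is a sketch, not a general HTML parser: the class and method names are invented for the example, and the pattern assumes anchors are not nested.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorStripper
{
    // Replaces each <a ...>content</a> with just its content; other tags are left alone.
    public static string StripAnchors(string html)
    {
        // \b keeps the pattern from matching tags such as <abbr>.
        // Singleline lets the captured content span line breaks.
        return Regex.Replace(
            html,
            @"<a\b[^>]*>(.*?)</a>",
            "$1",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        string html = "<p><a href=\"http://www.site.com/\">This is the content.</a></p>";
        Console.WriteLine(StripAnchors(html)); // prints <p>This is the content.</p>
    }
}
```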
Good luck
You could also use groups in Regex.
For example, the following gives you the content of the anchor tag (the commented-out variant matches any kind of tag).
Regex r = new Regex(@"<a\b[^>]*>(.*?)</a>"); // lazy (.*?) stops at the first </a>
// Regex r = new Regex(@"<[^>]+>(.*?)</[^>]+>"); or any kind of tag
var m = r.Match(@"<a href=""http://www.site.com/"">This is the content.</a>");
string content = m.Groups[1].Value;
You create groups in a regex with parentheses. Group 0 is always the whole match, so the first parenthesized group is Groups[1].
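If counting group indexes feels error-prone, .NET also supports named groups. Here is a small sketch; the group name "text" and the class and method names are arbitrary choices for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the content of the first anchor, or null when there is no match.
    public static string FirstAnchorText(string html)
    {
        // (?<text>...) names the capture so it can be read by name instead of by index.
        Match m = Regex.Match(html, @"<a\b[^>]*>(?<text>.*?)</a>");
        return m.Success ? m.Groups["text"].Value : null;
    }

    public static void Main()
    {
        string html = "<a href=\"http://www.site.com/\">This is the content.</a>";
        Console.WriteLine(FirstAnchorText(html)); // prints This is the content.
    }
}
```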