Need some quick C# regex help

devze.com (https://www.devze.com), 2022-12-30 21:27. Source: web

I have this html:

<a href="http://www.site.com/">This is the content.</a>

I just need to get rid of the anchor tag html around the content text, so that all I end up with is "This is the content".

Can I do this using Regex.Replace?

Your regex: <a[^>]+?>(.*?)</a>

Run this pattern with the Regex class and iterate over the resulting match collection; the inner text is in the first capture group.

using System;
using System.Text.RegularExpressions;

String text = "<a href=\"link.php\">test</a>";

Regex rx = new Regex("<a[^>]+?>(.*?)</a>");
// Find matches.
MatchCollection matches = rx.Matches(text);

// Report the number of matches found.
Console.WriteLine("{0} matches found. \n", matches.Count);

// Report on each match.
foreach (Match match in matches)
{
    // The whole matched text.
    Console.WriteLine(match.Value);

    Console.WriteLine("Groups:");
    // Groups[0] is the whole match; Groups[1] is the first parenthesized group.
    foreach (Group g in match.Groups)
    {
        Console.WriteLine(g.Value);
    }
}

// Keep the console window open.
Console.ReadLine();

Output:

  1 matches found. 

  <a href="link.php">test</a>
  Groups:
  <a href="link.php">test</a>
  test

The text matched by the expression in () is stored in the second item of the match's Groups collection, match.Groups[1]; the first item, match.Groups[0], is the whole match itself. Each expression in () adds one entry to the Groups collection. See the MSDN documentation for further information.
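
To answer the original question directly: Regex.Replace can also keep the captured text if the replacement string refers to the group as $1. Below is a minimal sketch, written as C# 9 top-level statements. StripAnchors is a hypothetical helper name used only for illustration, and like the pattern above it assumes simple, non-nested anchor tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): replaces each
// <a ...>...</a> element with the text captured by group 1.
static string StripAnchors(string html) =>
    Regex.Replace(html, "<a[^>]+?>(.*?)</a>", "$1");

string input = "<a href=\"http://www.site.com/\">This is the content.</a>";

// Prints: This is the content.
Console.WriteLine(StripAnchors(input));
```

Text outside the anchors is left untouched, so this also works when the link sits in the middle of a longer string.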

If you have to use Replace, this works when the content inside the tag is plain text (note that it strips every tag in the input, not just anchors):

Regex r = new Regex("<[^>]+>");
string result = r.Replace(@"<a href=""http://www.site.com/"">This is the content.</a>", "");
Console.WriteLine("Result = \"{0}\"", result);

Good luck

You could also use groups in Regex.

For example, the following gives you the content of an anchor tag (the commented-out pattern does the same for any kind of tag).

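Regex r = new Regex(@"<a.*>(.*)</a>");
// Regex r = new Regex(@"<.*>(.*)</.*>"); or any kind of tag

// Note: .* is greedy, so this assumes the input contains a single tag pair.
var m = r.Match(@"<a href=""http://www.site.com/"">This is the content.</a>");

// Groups[1] holds the text captured by the parentheses.
string content = m.Groups[1].Value;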

You create groups in a regex with parentheses. Note that group 0 is the whole match, so the first parenthesized group is group 1.
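
One caveat with .* is that it is greedy, which matters once the input holds more than one anchor. The sketch below (C# 9 top-level statements; the two-link input string is made up for illustration) compares the greedy pattern with a lazy variant.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with two anchors on one line.
string html = "<a href=\"a.php\">first</a> <a href=\"b.php\">second</a>";

// Greedy: "<a.*>" runs as far right as it can, so the single match spans
// both anchors and group 1 only holds the text of the last one.
string greedy = Regex.Match(html, @"<a.*>(.*)</a>").Groups[1].Value;

// Lazy: "[^>]*" stops at the first '>' and "(.*?)" stops at the first "</a>".
string lazy = Regex.Match(html, @"<a[^>]*>(.*?)</a>").Groups[1].Value;

Console.WriteLine(greedy); // second
Console.WriteLine(lazy);   // first

// With the lazy pattern, Matches returns one match per anchor.
foreach (Match m in Regex.Matches(html, @"<a[^>]*>(.*?)</a>"))
{
    Console.WriteLine(m.Groups[1].Value);
}
```

For anything beyond simple fragments like these, an HTML parser is usually a safer choice than a regex.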