A regular expression for anchor html tag in C#?

开发者 https://www.devze.com 2023-03-16 22:27 Source: web
I need a regular expression in C# that matches an anchor tag in HTML source, as general as possible. Consider this HTML:

<a id="[constant]"
      href="[specific]"
    >GlobalPlatform Card Specification 2.2
    March, 2006</a>

By [constant] I mean the value is a constant string, so it poses no problem. By [specific] I mean the address is a simple, specific string, so its regular expression is simple. The main problem is that I cannot handle the newline character in the middle of the anchor's title. I previously wrote this regular expression, which works well except when the title contains a newline:

<a[\\s\\n\\r]+id=\"[constant]"[\\s\\n\\r]+href=\"[specific]"[\\s\\n\\r]*>[\\s\\n\\r]*[^\\n\\r]+[\\s\\n\\r]*</a>

Please help me


You should stay away from regular expressions when it comes to parsing HTML and use an HTML parser such as the HTML Agility Pack instead.

To help you get started, here is how simple it is to parse that single anchor tag:

// Requires the HtmlAgilityPack NuGet package.
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();

doc.LoadHtml(@"<a id=""[constant]""
      href=""[specific]""
    >GlobalPlatform Card Specification 2.2
    March, 2006</a>
");

var anchor = doc.DocumentNode.Element("a");

Console.WriteLine(anchor.Id);
Console.WriteLine(anchor.Attributes["href"].Value);
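
Since the question is mainly about the title that spans two lines, here is a minimal sketch of reading it with the same parser. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (AnchorTitleSketch, GetTitle) and the id/href values are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is installed

public static class AnchorTitleSketch
{
    // Returns the anchor's text with line breaks and indentation collapsed
    // to single spaces, or an empty string if no top-level <a> is found.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same lookup as above: first <a> that is a direct child of the document node.
        HtmlNode anchor = doc.DocumentNode.Element("a");
        if (anchor == null) return string.Empty;

        // InnerText keeps the newline that sits inside the title; normalize it.
        return Regex.Replace(anchor.InnerText, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Hypothetical id/href values standing in for [constant] and [specific].
        string html = "<a id=\"spec-link\"\n      href=\"/specs/card.pdf\"\n    >GlobalPlatform Card Specification 2.2\n    March, 2006</a>";
        Console.WriteLine(GetTitle(html));
    }
}
```

The parser does not care where the line breaks fall, so the only extra work is the whitespace cleanup on the extracted text.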

Beats regular expressions, don't you think? :)


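If you are using C#, you can pass options when creating the Regex. Note that RegexOptions.Multiline only changes where ^ and $ match; it is RegexOptions.Singleline that lets . match newline characters, so a title spanning several lines can be matched with .*? under that option:

Regex r = new Regex(pattern, RegexOptions.Singleline);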
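
If a regular expression is still preferred, the title can also be allowed to span lines without any option: the part of the question's pattern that rejects newlines is [^\n\r]+, and [^<]+ accepts them. The following is only a sketch under assumptions: the Id and Href constants are hypothetical stand-ins for [constant] and [specific], the class and method names are illustrative, and it covers only this one attribute order, not HTML in general.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorRegexSketch
{
    // Hypothetical stand-ins for the [constant] and [specific] placeholders.
    private const string Id = "spec-link";
    private const string Href = "/specs/card-2.2.pdf";

    // \s already includes \n and \r, so [\s\n\r] is shortened to \s.
    // [^<]+? matches any characters up to the closing tag, newlines included.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\s+id=""" + Regex.Escape(Id) +
        @"""\s+href=""" + Regex.Escape(Href) +
        @"""\s*>\s*(?<title>[^<]+?)\s*</a>",
        RegexOptions.IgnoreCase);

    // Returns true and the whitespace-normalized title when the anchor is found.
    public static bool TryExtractTitle(string html, out string title)
    {
        Match m = AnchorPattern.Match(html);
        if (!m.Success)
        {
            title = string.Empty;
            return false;
        }
        // Collapse the line break and indentation inside the title to single spaces.
        title = Regex.Replace(m.Groups["title"].Value, @"\s+", " ");
        return true;
    }

    public static void Main()
    {
        string html = "<a id=\"spec-link\"\n      href=\"/specs/card-2.2.pdf\"\n    >GlobalPlatform Card Specification 2.2\n    March, 2006</a>";
        string title;
        if (TryExtractTitle(html, out title))
        {
            Console.WriteLine(title);
        }
    }
}
```

Using a negated character class keeps the match from running past the first closing tag, which is why no RegexOptions value is needed for the newline here.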