The following code does not work. I am trying to retrieve the TR strings from an HTML table. Is there an issue with this code, or is there another solution available?
public static List<string> GetTR(string Tr)
{
    List<string> trContents = new List<string>();
    string regexTR = @"<(tr|TR)[^<]+>((\s*?.*?)*?)<\/(tr|TR)>";
    MatchCollection tr_Matches = Regex.Matches(Tr, regexTR, RegexOptions.Singleline);
    foreach (Match match in tr_Matches)
    {
        trContents.Add(match.Value);
    }
    return trContents;
}
Sample input string is given below:
"<TR><TD noWrap align=left>abcd</TD><TD noWrap align=left>SPORT</TD><TD align=left>5AT</TD></TR>"
Parsing HTML with regular expressions is asking for trouble.
Do the job properly using something like HTML Agility Pack.
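For reference, here is a minimal sketch of what that could look like. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (TrParser, GetRows) are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class TrParser
{
    // Hypothetical helper: returns the outer HTML of every <tr> element found.
    public static List<string> GetRows(string html)
    {
        var rows = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The parser normalizes element names, so "//tr" also finds <TR>.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tr");
        if (nodes == null)
        {
            return rows;
        }

        foreach (HtmlNode node in nodes)
        {
            rows.Add(node.OuterHtml);
        }
        return rows;
    }
}
```

Note that "//tr" also returns rows of nested tables; a narrower XPath would be needed if only the top-level rows are wanted.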
I think this regular expression would be more appropriate:
<(tr|TR)[^>]*>.*<\/\1>
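One caveat with that pattern: .* is greedy, so an input with several rows on one line would come back as a single match running from the first <tr> to the last </tr>. If you do stay with a regex, a variant with a lazy quantifier and RegexOptions.IgnoreCase is probably closer to what you want. This is only a sketch; TrExtractor and GetRows are made-up names, and it still breaks on nested tables.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrExtractor
{
    // Hypothetical helper: returns each <tr>...</tr> block as a separate string.
    public static List<string> GetRows(string html)
    {
        var rows = new List<string>();

        // <tr\b[^>]*>  opening tag with optional attributes (\b avoids matching e.g. <track>)
        // .*?          lazy, so the match stops at the first closing tag
        // </tr\s*>     closing tag
        const string pattern = @"<tr\b[^>]*>.*?</tr\s*>";

        // Singleline lets '.' cross line breaks; IgnoreCase covers <TR>, <tr>, <Tr>.
        RegexOptions options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        foreach (Match match in Regex.Matches(html, pattern, options))
        {
            rows.Add(match.Value);
        }
        return rows;
    }
}
```

Calling TrExtractor.GetRows on a string containing two rows back to back should give two entries, one per row.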
This regex matches your input string:
<(tr|TR)+>((\s*?.*?)*?)<\/(tr|TR)>
I removed the "[^<]" part, since I am not sure why you need it. Also, try adding a non-greedy match.
However, it is better to go with something like HTML Agility Pack (if you want to keep your sanity) :)
(<(tr|TR)[^<]*>)(.+)(<\/(tr|TR)[^<]*>)