C# Regex parsing HTML [duplicate]

Developer https://www.devze.com 2022-12-21 23:57 Source: Internet


This question already has answers here: How do I remove all HTML tags from a string without knowing which tags are in it? (5 answers) Closed 2 years ago.

<tr><TD><FONT size="2">My Value 1</FONT></TD></tr>

I need to extract "My Value 1". Please help me. I'm using C#.

As HTML code is very "unpredictable", I would recommend using an HTML parsing library. Which programming language do you use? In .NET I have used HTML Agility Pack with great success. In Java, HTML Parser might be handy (though I have not worked with it yet).

You cannot properly parse HTML with regular expressions because regexps can't handle the nesting allowed by HTML. To do it properly, you need a real parser. For the one line you show, a regexp will work, but you can't count on that line staying identical, so in general you should use SAX/DOM for the task.
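
To illustrate the nesting problem, here is a small JavaScript sketch. The `innerOfFirstDiv` helper and the sample markup are made up for illustration and are not from the question. A non-greedy pattern stops at the first closing tag it sees, so it cuts nested content short:

```javascript
// Hypothetical helper: tries to grab the contents of the first <div> with a regex.
function innerOfFirstDiv(markup) {
  const match = /<div>(.*?)<\/div>/i.exec(markup);
  return match ? match[1] : null;
}

const flat = '<div>My Value 1</div>';
const nested = '<div>outer <div>inner</div> tail</div>';

console.log(innerOfFirstDiv(flat));   // My Value 1
console.log(innerOfFirstDiv(nested)); // outer <div>inner
```

A greedy `.*` has the opposite problem: it runs to the last `</div>` and swallows sibling elements. Neither version can pair up opening and closing tags, which is why a parser is the safer choice once the markup is not under your control.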

I think parsing HTML using regexes is not a wise idea, as highlighted by spa. A classic previous answer to a similar question is "RegEx match open tags except XHTML self-contained tags".

In C#:

 using System.Text.RegularExpressions;

 string input = "<tr><TD><FONT size=\"2\">My Value 1</FONT></TD></tr>";
 string pattern = @"<[^>]*?>";  // matches any single tag
 string output = Regex.Replace(input, pattern, ""); // output is "My Value 1"

This simply removes all HTML tags.

function stripTags(markup){
  return markup.replace(/\s*<[^>]*?>\s*/gim,'');
}

This assumes all you really want is the inner text represented by "My Value 1" above.
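
One thing to watch for: the `\s*` on both sides of the tag also eats the whitespace around inline tags, so words can run together. A possible variant (a sketch only; `stripTagsKeepSpaces` is a name I made up) replaces each tag with a space and then collapses runs of whitespace:

```javascript
// Original function from above, repeated so the snippet runs on its own.
function stripTags(markup) {
  return markup.replace(/\s*<[^>]*?>\s*/gim, '');
}

// Hypothetical variant: swap each tag for a space, then normalise whitespace.
function stripTagsKeepSpaces(markup) {
  return markup.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

const sample = '<p>Hello <b>big</b> world</p>';
console.log(stripTags(sample));           // Hellobigworld
console.log(stripTagsKeepSpaces(sample)); // Hello big world
```

The trade-off is that the variant inserts a space wherever a tag sits inside a word, so something like `H<sub>2</sub>O` comes out as `H 2 O`. For the single line in the question both versions return "My Value 1".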

Try:

/<tr>\s*<td>\s*<font.*?>(.*?)<\/font>\s*<\/td>\s*<\/tr>/i

Used in PHP:

<?php
if (preg_match('/<tr>\s*<td>\s*<font.*?>(.*?)<\/font>\s*<\/td>\s*<\/tr>/i',
               '<tr><TD><FONT size="2">My Value 1</FONT></TD></tr>', $matches)) {
    echo $matches[1]; // prints My Value 1
}
?>

If you are using PHP, you can split on </FONT>:

$string = '<tr><TD><FONT size="2">My Value 1</FONT></TD></tr>';
$s = explode('</FONT>', $string);
foreach ($s as $v) {
    // Only chunks that contain an opening <FONT tag carry a value.
    if (strpos($v, "<FONT") !== FALSE) {
        $t = explode(">", $v);
        print end($t) . "\n"; // the text after the last ">"
    }
}

Output:

$ php test.php
My Value 1

In Perl I would use

my $string='<tr><TD><FONT size="2">My Value 1</FONT></TD></tr>';
$string =~ m/(<.*?>)*([^<]*)(<.*?>)*/;
print $2;

to get the desired result. The last part is not strictly necessary;

(<.*?>)*([^<]*)

will work as well.

If you want to get the contents within the tags, I think the following regexp is enough:

^<.*>([^<>]+)<.*>$

It will only work if there really is some text between the tags somewhere; otherwise there is no match.
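
For anyone who wants to try this pattern quickly, here it is in JavaScript. The `textBetweenTags` wrapper is just for illustration. It covers both the line from the question and the no-match case:

```javascript
// Hypothetical wrapper around the pattern above; returns null when nothing matches.
function textBetweenTags(line) {
  const match = /^<.*>([^<>]+)<.*>$/.exec(line);
  return match ? match[1] : null;
}

console.log(textBetweenTags('<tr><TD><FONT size="2">My Value 1</FONT></TD></tr>')); // My Value 1
console.log(textBetweenTags('<tr><TD></TD></tr>')); // null
```

Because the first `.*` is greedy, a line with several text segments yields only the last one, so this is best kept for single-value lines like the one in the question.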
