How to tell if an HTML string will show as empty in the browser? (C# / regex?)

Developer (https://www.devze.com), 2022-12-18 01:34. Source: the web
I have a control that will return some HTML to me as a string. Before putting that on screen, I'd like to be able to tell if it'll just show as empty.

For example, the control might return <p><br /></p>. When I compare that against string.Empty in C# it is obviously not empty, but nothing gets displayed on screen.

Is there a regex to test whether the HTML will actually show any text on screen? Or, using C#, is there any function to test whether the string actually contains anything other than tags?

Cheers, I'm a little confused how to get around this without writing some custom parser, a road I don't want to have to go down!


As answered by @Ignacio, you should use something like the HTML Agility Pack. Here's a sample bit of code that seems to work for your situation.

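using HtmlAgilityPack;

// Markup with tags only, no text.
HtmlDocument docEmpty = new HtmlDocument();
docEmpty.LoadHtml("<p><br /></p>");

// Markup that contains visible text.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p>I am not empty...<br /></p>");

// InnerText is the text of all descendant nodes with the tags left out.
bool emptyDocIsEmpty = string.IsNullOrEmpty(docEmpty.DocumentNode.InnerText);
bool textDocIsEmpty = string.IsNullOrEmpty(doc.DocumentNode.InnerText);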

Note: This sample uses the HTML Agility Pack parser (http://html-agility-pack.net/?z=codeplex).


As suggested by others, you can use an HTML parser, which is a solid way to handle your need. But I think it would add a lot of overhead, since the parser has to do a lot of work to understand the HTML.

Maybe your idea to use regex is not so bad. It should be quicker too. I suggest you use Regex to replace every opening and closing tag with an empty string, then strip the whitespace. Anything that is left should be text that will appear in the browser ...

using System.Text.RegularExpressions;

string input = "<p> <br />  </p>";
// Remove every opening, closing, and self-closing tag.
string withoutTags = Regex.Replace(input, @"<[^<>]+>", "");
// Remove spaces, tabs, and newlines (\s covers all of them).
string resultFinal = Regex.Replace(withoutTags, @"\s+", "");
if (string.IsNullOrEmpty(resultFinal))
{
    // The HTML contains no visible text.
}
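
The tag-stripping idea above can be wrapped in a small helper. This is only a sketch: the class and method names (HtmlTextCheck, HasNoVisibleText) are made up for illustration, and decoding entities with WebUtility.HtmlDecode is an addition that the answer does not mention. It is there so that markup such as <p>&nbsp;</p> is not reported as containing text. Like any regex approach, it does not understand comments, scripts, or images.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextCheck
{
    // Matches anything that looks like a tag: "<", no angle brackets inside, then ">".
    private static readonly Regex TagPattern = new Regex(@"<[^<>]+>");

    // Hypothetical helper: true when only whitespace is left after
    // removing tags and decoding entities such as &nbsp;.
    public static bool HasNoVisibleText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return true;
        }

        string withoutTags = TagPattern.Replace(html, "");
        string decoded = WebUtility.HtmlDecode(withoutTags);
        return string.IsNullOrWhiteSpace(decoded);
    }
}
```

Calling HtmlTextCheck.HasNoVisibleText("<p> <br />  </p>") should return true, and HtmlTextCheck.HasNoVisibleText("<p>Hello<br /></p>") should return false.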


Don't write a custom parser, just use an existing parser and apply some search rules to it.


Not sure if it's relevant, but I ran this test and it seems to do what the OP wants without any external library (it does require .NET 3.5 or later, for LINQ to XML).

XElement docEmpty = XElement.Parse("<p><br /></p>");
Console.WriteLine(string.IsNullOrEmpty(docEmpty.Value)); // Outputs True.

XElement doc = XElement.Parse("<p>This is a test<br /></p>");
Console.WriteLine(string.IsNullOrEmpty(doc.Value)); // Outputs False.
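
One caveat with this approach: XElement.Parse expects well-formed XML, so HTML that browsers accept, such as an unclosed <br>, makes it throw an XmlException. A minimal sketch of guarding against that follows. The names XmlTextCheck and HasNoText are hypothetical, and returning null for "could not parse" is just one possible design choice.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XmlTextCheck
{
    // Hypothetical helper. Returns true or false when the fragment parses,
    // and null when it is not well-formed XML (for example "<p><br></p>").
    public static bool? HasNoText(string fragment)
    {
        try
        {
            XElement root = XElement.Parse(fragment);
            // Value concatenates the text of all descendant nodes.
            return string.IsNullOrWhiteSpace(root.Value);
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

A null result tells the caller to fall back to a real HTML parser rather than treat the markup as empty.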


The problem with @kane's answer is that there are times when InnerText is legitimately empty even though something is displayed, such as

<p><a href="http://somewhere.com"><img src="image/page" /></a></p>

If you rely on InnerText alone, the markup above would be flagged as empty even though it renders an image.

I love the HTML Agility Pack, but be sure to check InnerHtml as well...
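
A rough sketch of combining both checks with the HTML Agility Pack is below. The names RenderCheck, LooksEmpty, and VisibleElements are hypothetical, and the list of elements treated as visible without text is an assumption to adjust for your own content. Instead of inspecting InnerHtml as a string, this version looks for those elements among the parsed nodes.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class RenderCheck
{
    // Assumption: elements that display something even when they hold no text.
    private static readonly string[] VisibleElements = { "img", "input", "iframe", "video" };

    // Hypothetical helper: true when there is no text and no visible element.
    public static bool LooksEmpty(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? "");

        // Decode entities so that &nbsp; counts as whitespace.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        if (!string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        // No text: the markup still renders if it contains a visible element.
        return !doc.DocumentNode.Descendants()
                   .Any(node => VisibleElements.Contains(node.Name));
    }
}
```

With this, <p><br /></p> is reported as empty while the linked image above is not. CSS can still hide or reveal content, which no string-level check will see.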
