Regex to replace < or > with &lt; or &gt; inside an HTML tag

Developer https://www.devze.com 2023-01-23 21:00 Source: Internet
For example:

<html>
<head></head>
<body>
<div>
<h1>-----> hello! ----< </h1>
</div>
</body>

I want to replace the > and < inside the h1 tag with the corresponding &gt; and &lt;.

Which is the correct pattern?

Thanks in advance!


I agree with the commenter who asked "Why is this broken HTML being generated in the first place?" If you represent documents like this, you will run into exactly the problems you are having now. There are two valid situations:

  • You have some data (not HTML escaped) e.g. a bunch of strings in PHP
  • You have an HTML document, containing tags, and text which is HTML escaped

So when you generate the HTML document from your source data (strings, database), you need to do the escaping then (e.g. by using htmlspecialchars, as another answerer correctly pointed out).
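As a minimal sketch of escaping at generation time: the variable names here are hypothetical, and the heading text is assumed to come from unescaped source data such as a database row.

```php
<?php
// Hypothetical raw data, e.g. a string loaded from a database (not HTML escaped).
$heading = '-----> hello! ----<';

// Escape the data at the moment it is placed into the HTML document.
$escaped = htmlspecialchars($heading, ENT_QUOTES, 'UTF-8');

// Only the surrounding tags are real markup; the text inside is now safe.
$html = '<h1>' . $escaped . '</h1>';

echo $html, PHP_EOL;
// prints: <h1>-----&gt; hello! ----&lt;</h1>
```

Because the escaping happens before the text is mixed with tags, no regex is needed afterwards to guess which angle brackets were meant as markup.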

You need to avoid, at all costs, a situation where you have a string like you have, which has HTML tags and non-escaped text.

For example, suppose your text contained <b>text</b> and you literally wanted that to be displayed in the HTML document, i.e. you wanted the angle brackets to be seen rather than the text to appear in bold (e.g. you were writing a document about how to write HTML). Once the tags and the unescaped text are mixed in one string, you have no way to differentiate that text from actual HTML code.
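The ambiguity can be illustrated with a short sketch. The `paragraph` helper and the sample sentence are hypothetical, chosen only to show the difference between escaping the text first and concatenating it raw.

```php
<?php
// Hypothetical helper: wrap plain text in a paragraph, escaping it first.
function paragraph(string $text): string
{
    return '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</p>';
}

// The author wants the angle brackets to be visible to the reader.
$literal = 'Write <b>text</b> to make text bold.';

// Escaped: the brackets become entities and are displayed literally.
$escaped = paragraph($literal);

// Unescaped: the result is indistinguishable from real bold markup.
$unescaped = '<p>' . $literal . '</p>';

echo $escaped, PHP_EOL;
// prints: <p>Write &lt;b&gt;text&lt;/b&gt; to make text bold.</p>
echo $unescaped, PHP_EOL;
// prints: <p>Write <b>text</b> to make text bold.</p>
```

From the second string alone, nothing tells you whether `<b>` was meant as markup or as text, which is why the repair has to happen where the document is generated.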


You could throw it at tidy (see the docs) and see if it can fix the errors. That is a lot better than trying to do the "right thing" on your own with a regex.

$html = <<<EOT
<html>
<head></head>
<body>
<div>
<h1>-----> hello! ----< </h1>
</div>
</body>
EOT;

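// Tidy options: clean up presentational markup, keep HTML (not XHTML) output,
// emit the whole document, and disable line wrapping.
$config = array(
  'clean'                       => true,
  'drop-proprietary-attributes' => true,
  'output-xhtml'                => false,
  'show-body-only'              => false,
  'wrap'                        => 0
);

// Parse the broken markup, then let tidy repair it.
$tidy = new tidy();
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Print the repaired document.
echo tidy_get_output($tidy);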

You may need to enable the tidy extension in your PHP environment first.
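One way to check for the extension before relying on it is sketched below. The `tidyStatus` function is a hypothetical helper; how tidy is actually enabled (for example an `extension=tidy` line in php.ini, or a separate package) depends on your platform and PHP build.

```php
<?php
// Hypothetical helper: report whether the tidy extension is loaded.
function tidyStatus(): string
{
    return extension_loaded('tidy')
        ? 'tidy is available'
        : 'tidy is not loaded; enable or install the extension for your PHP build';
}

echo tidyStatus(), PHP_EOL;
```

Running this from the same environment that serves your pages (CLI and web server can use different php.ini files) shows whether `new tidy()` will work there.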


I would pass it through tidy.
