Close tags from a truncated HTML string

Developer https://www.devze.com 2023-01-21 22:47 Source: the web
I have inherited a site with a news section that displays a summary of the news article. For whatever reason the creators decided that displaying the first X characters of the article would be fine. Of course this very quickly led to the summary being something like:

<p>What a mighty fine <a href="blah">da
<p>What a mighty fine and warm <a href="htt
<p>His name was &quot;Emil&qu

Which quite obviously screws with the page, especially when the opening tags aren't even closed.

What I'm after is a way to close all open tags within the string being taken. I really, really don't want to use regex to do it. I'm sure there's a nice parser that can do it easily; I just can't seem to find it right now.


The best thing is probably to find a better algorithm for generating the excerpt, for example by running strip_tags before the truncation.

How will you otherwise handle hard-to-find-programmatically errors such as <p>What a mighty fine and warm <a href="htt or <p>His name was &quot;Emil&qu?
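
As a rough sketch of the strip_tags idea above: strip the markup and decode entities first, so the cut can never land inside a tag or an entity, then truncate and re-encode. The function name excerpt() and its parameters are made up for illustration, and the mbstring extension is assumed to be available.

```php
<?php

/**
 * Build a plain-text excerpt: remove markup first, truncate afterwards.
 * excerpt() and $maxChars are hypothetical names, not part of the original site.
 */
function excerpt(string $html, int $maxChars = 100): string
{
    // Drop the tags, then decode entities so that "&quot;" counts as one
    // character and cannot be cut in half.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = trim($text);

    // Truncate by characters (not bytes) and mark the cut.
    if (mb_strlen($text, 'UTF-8') > $maxChars) {
        $text = rtrim(mb_substr($text, 0, $maxChars, 'UTF-8')) . '...';
    }

    // Re-encode so the excerpt is safe to print inside an HTML page.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo excerpt('<p>His name was &quot;Emil&quot; and he was <a href="blah">fine</a>.</p>', 20), "\n";
```

The trade-off is that the excerpt loses its links and formatting, which is usually acceptable for a news summary.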


Have you taken a look at Tidy?

Example:

$options = array("show-body-only" => true); 
$tidy = tidy_parse_string("<B>Hello</I> How are <U> you?</B>", $options);
tidy_clean_repair($tidy);
echo $tidy;

Outputs:

<b>Hello</b> How are <u>you?</u> 


I would install the PHP bindings for Tidy. You can then use this to clean up an HTML fragment using the following code:

<?php

$fragment = '<p>What a mighty fine <a href="blah">da';

$tidy = new tidy();

$tidy->parseString($fragment,array('show-body-only'=>true),'utf8');
$tidy->cleanRepair();

echo $tidy;
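
If the site has to keep truncating raw HTML, the cut and the repair can be wrapped in one helper. This is only a sketch: truncateAndRepair() is a hypothetical name, it assumes the tidy and mbstring extensions are installed, and what Tidy does with a cut that lands in the middle of a tag or entity (such as <a href="htt or &qu) is not guaranteed, so check the output for your own content.

```php
<?php

/**
 * Cut raw HTML at $maxChars characters, then let Tidy close the open tags.
 * truncateAndRepair() is a hypothetical helper, not part of the Tidy API.
 */
function truncateAndRepair(string $html, int $maxChars): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not installed.');
    }

    // Naive cut, as on the inherited site; it may land mid-tag or mid-entity.
    $fragment = mb_substr($html, 0, $maxChars, 'UTF-8');

    $tidy = new tidy();
    $tidy->parseString($fragment, ['show-body-only' => true], 'utf8');
    $tidy->cleanRepair();

    // tidy_get_output() returns the repaired markup as a string.
    return trim(tidy_get_output($tidy));
}

// Only run the demo when Tidy is actually available.
if (extension_loaded('tidy')) {
    echo truncateAndRepair('<p>What a mighty fine <a href="blah">day</a> it is.</p>', 39), "\n";
}
```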