Shortened HTML text and malformed tags

Developer https://www.devze.com 2022-12-15 09:03 Source: Internet

In my web application I intend to shorten a lengthy string of HTML-formatted text if it is more than 300 characters long, and then display the first 300 characters and a Read More link on the page.

The issue I came across arises when the 300-character limit falls inside an HTML tag or element. For example (look for HERE):

 <a hreHEREf="somewhere">link</a>
 <a href="somewhere">liHEREnk</a>

When this happens, the entire page can become malformed, because everything after HERE in the examples above is removed and the HTML tag is left open.

I'm thinking of using CSS to hide any overflow beyond a certain limit and showing the "Read More" link only if the text exceeds a certain length, but this would mean including all of the text on the page.

I've also thought about splitting the text at a period (.) so that it ends at the end of a sentence, but that would mean including more characters than I need.

Is there a better way to accomplish this?

Note: I have not specified a server-side language because this is more of a general question, but I'm using ASP.NET/C#.


Extract the plaintext from the HTML, and display that. There are libraries (like the HTML Agility Pack for .NET) that make this easy, and it's not too hard to do it yourself with an XML parser. Trying to fix a truncated HTML snippet is a losing cause.
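Since the question is language-agnostic, here is a minimal sketch of the plain-text approach in Python, using only the standard library's `html.parser`. The names `TextExtractor` and `plain_summary` are made up for illustration; in .NET the same idea would go through a parser such as the HTML Agility Pack instead.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_summary(html, limit=300):
    """Return at most `limit` characters of plain text taken from `html`."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace left behind where tags used to be.
    text = " ".join("".join(extractor.parts).split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."
```

Because the summary contains no markup at all, it cannot leave a tag open. Note that this sketch would also pick up the contents of `<script>` or `<style>` elements if the input contained any, so those would need filtering in real use.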


One option I can think of is to cut it off at 300 characters and check that the last index of '<' is less than the last index of '>'. If it isn't, the cut landed inside a tag, so truncate the string right before the last instance of '<'. Then use a library like HTML Tidy to fix tags left unclosed (like the missing </a> in the example).
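A rough sketch of that idea in Python, for illustration only. It does not use HTML Tidy: as a stand-in, it tracks open elements with the standard library's `html.parser` and appends the missing closing tags. The names `OpenTagTracker` and `truncate_html` are hypothetical, and the sketch assumes that '<' and '>' appear only as tag delimiters (that is, they are escaped in text and attribute values).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Tracks which elements are still open after parsing a fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching opener, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def truncate_html(html, limit=300):
    """Cut `html` at `limit` characters without leaving a tag half-written or open."""
    if len(html) <= limit:
        return html
    cut = html[:limit]
    # A '<' after the last '>' means the cut landed inside a tag: drop the partial tag.
    if cut.rfind("<") > cut.rfind(">"):
        cut = cut[:cut.rfind("<")]
    tracker = OpenTagTracker()
    tracker.feed(cut)
    tracker.close()
    # Close whatever is still open, innermost element first.
    closers = "".join(f"</{tag}>" for tag in reversed(tracker.open_tags))
    return cut + closers
```

The appended closing tags can push the result slightly past the limit, and a cut that lands in the middle of a character reference such as `&amp;` is not handled here.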

There are problems with this, though. One is that if the first 300 characters are nothing but HTML, your summary will be displayed as empty.

If you do not need the HTML to be displayed, it's far easier to simply extract the plain text and use that instead.

EDIT: Added the suggestion to use something like HTML Tidy for unclosed tags. The original answer only handled a cut that lands mid-tag, not one that lands between an opening and closing tag.
