How to shrink html string size

I have a listing page that displays a set of products. Each item has its own well-formed HTML description. I want to display part of each item's description, up to a maximum of 200 characters, not counting the HTML tags and attributes.

The problem is that when I cut the HTML string down, the result may no longer be well-formed (it may lose an end tag, etc.).

Do you guys have any idea how to shorten an HTML string and still output well-formed HTML?

For example:

The following HTML text is the description: <p class="abc-class">0123456789</p>

If I want to display a maximum of 5 characters, the result I want to see is <p class="abc-class">01234</p>

So what would you do to get the correct result?

PS: Remember, this is the simplest case.

Cutting the HTML down to size isn't a good idea because, as you've stated, you end up breaking the valid HTML. Instead, what you want to do is cut down the size of the text description. To do that you'll need to extract the text you want to display and then cut it down to the size you want.
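
To make the "extract the text, then cut it" idea concrete, here is a minimal JavaScript sketch. The function name textPreview is made up for illustration, and stripping tags with a regular expression is a simplifying assumption: it is only reasonable for well-formed markup with no ">" inside attribute values, and it counts an entity such as &amp; as several characters. Note that the result is plain text, so the original tags are gone.

```javascript
// Hypothetical helper: returns at most `limit` characters of the text
// content of `html`, with all tags removed.
function textPreview(html, limit) {
  // Naive tag stripping: drops everything between "<" and the next ">".
  const text = html.replace(/<[^>]*>/g, "");
  return text.length > limit ? text.slice(0, limit) : text;
}

console.log(textPreview('<p class="abc-class">0123456789</p>', 5)); // prints 01234
```

If the markup has to survive, the shortened text would need to be wrapped again by whatever generates the listing.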

On the other hand, why not have whatever is generating the HTML limit the size of the text to begin with? That way you don't need to worry about getting the text out of the HTML and cutting it down.

That said, it's kind of difficult to say any more without a code sample...

In "C# Truncate HTML safely for article summary" I answered this with a link to my Gist: https://gist.github.com/2413598

I would do it like this:

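  string value = "<p class=\"abc-class\">0123456789</p>";
  char[] delimiters = new char[] { '<', '>' };
  // parts[0] = opening tag contents, parts[1] = text, parts[2] = closing tag contents
  // (this only holds for a single element with no nested tags)
  string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
  string value2 = parts[1];
  //
  // here you do what you want to value2
  //

  // rebuild the element around the (possibly shortened) text
  Console.WriteLine(delimiters[0] + parts[0] + delimiters[1] + value2 + delimiters[0] + parts[2] + delimiters[1]);
  Console.WriteLine(value);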

You split your string, work on the part you are interested in, and then build it again. Maybe you can reuse this snippet several times.

Splitting the string in this way is faster than using string.Split(' ').

Hope it fits your needs!
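
The same split-and-rebuild idea can be stretched to nested markup by splitting the string into tag tokens and text tokens, counting only the text, and closing whatever tags are still open at the cut. What follows is a rough JavaScript sketch under stated assumptions, not a real HTML parser: the name truncateHtml and the VOID_TAGS list are introduced here for illustration, the input is assumed to be well-formed with no ">" inside attribute values, and entities are counted character by character.

```javascript
// Elements that never take a closing tag (assumed list, for illustration).
const VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

// Hypothetical helper: keeps at most `limit` text characters of `html`
// and closes any elements left open at the cut.
function truncateHtml(html, limit) {
  // Split into tag tokens ("<...>") and runs of text between them.
  const tokens = html.match(/<[^>]*>|[^<]+/g) || [];
  const open = []; // stack of currently open tag names
  let out = "";
  let count = 0;   // text characters emitted so far

  for (const token of tokens) {
    if (token[0] === "<") {
      const m = /^<\s*(\/?)\s*([a-zA-Z][\w:-]*)/.exec(token);
      if (m) {
        const name = m[2].toLowerCase();
        if (m[1]) {
          // Closing tag: pop it if it matches the innermost open element.
          if (open[open.length - 1] === name) open.pop();
        } else if (!token.endsWith("/>") && !VOID_TAGS.has(name)) {
          open.push(name);
        }
      }
      out += token; // comments and the like pass through unchanged
    } else {
      const remaining = limit - count;
      if (token.length >= remaining) {
        out += token.slice(0, remaining);
        break; // limit reached: drop everything after this point
      }
      out += token;
      count += token.length;
    }
  }

  // Close the elements that were still open, innermost first.
  while (open.length > 0) out += "</" + open.pop() + ">";
  return out;
}

console.log(truncateHtml('<p class="abc-class">0123456789</p>', 5));
// prints <p class="abc-class">01234</p>
```

For anything beyond simple descriptions, a proper parser or the DOM is the safer route.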

But are you generating the description from somewhere, or do you receive the whole HTML from another source? If you are generating the product description, I think you should do your trimming before wrapping it in the HTML and returning it.

Your question doesn't explicitly state that you get the HTML like that from another source, which is why I believe the above suggestion is the easiest solution.

It can be done (and I have done it), but it still leaves potential for oddly rendered markup, especially when CSS styles are applied. When I wrote it, I did so in JavaScript, but the same approach can still be used elsewhere; it involves working with the DOM and not a string.

As you can see, it simply goes through and counts the found text. Once the limit is reached it truncates any remaining text in the node (adding ellipses as desired) and then stops processing further child nodes and removes all subsequent uncles and great uncles, etc in any parents or grandparents, etc. This could (and arguably should) be adapted to use a non-mutating approach.

You are free to use any ideas/strategies/code from below you see fit.

/*
    Given a DOM Node truncate the contained text at a certain length.
    The truncation happens in a depth-first manner.

    Any elements that exist past the exceeded length are removed
    (this includes all future children, siblings, cousins and whatever else)
    and the text in the element in which the exceed happens is truncated.

    NOTES:
    - This modifies the original node.
    - This only supports ELEMENT and TEXT node types (other types are ignored)

    This function returns the number of text characters counted,
    which equals limit if the limit was reached.
*/
truncateNode : function (rootNode, limit, ellipses) {
    if (arguments.length < 3) {
        ellipses = "..."
    }

    // returns the length found so far.
    // if found >= limit then all FUTURE nodes should be removed
    function truncate (node, found) {
        var ELEMENT_NODE = 1
        var TEXT_NODE = 3

        switch (node.nodeType) {
            case ELEMENT_NODE:
                var child = node.firstChild
                while (child) {
                    found = truncate(child, found)
                    if (found >= limit) {
                        // remove all FUTURE elements
                        while (child.nextSibling) {
                            child.parentNode.removeChild(child.nextSibling)
                        }
                    }
                    child = child.nextSibling
                }
                return found
            case TEXT_NODE:
                var remaining = limit - found
                if (node.nodeValue.length < remaining) {
                    // still room for more (at least one more letter)
                    return found + node.nodeValue.length
                }
                node.nodeValue = node.nodeValue.substring(0, remaining) + ellipses
                return limit
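            default:
                // other node types (comments, etc.) add no counted text;
                // return the running count so later nodes still see a number
                return found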
        }
    }

    return truncate(rootNode, 0)    
},

Well, I really must be bored. Here it is in C#. Almost the same. Still should be updated to be non-mutative. Exercise to the reader, blah, blah...

using System.Xml;

class Util
{

    public static string
    LazyWrapper (string html, int limit) {
        var d = new XmlDocument();
        d.InnerXml = html;
        var e = d.FirstChild;
        Truncate(e, limit);
        return d.InnerXml;
    }

    public static void
    Truncate(XmlNode node, int limit) {
        TruncateHelper(node, limit, 0);
    }

    public static int
    TruncateHelper(XmlNode node, int limit, int found) {
        switch (node.NodeType) {
        case XmlNodeType.Element:
            var child = node.FirstChild;
            while (child != null) {
                found = TruncateHelper(child, limit, found);
                if (found >= limit) {
                    // remove all FUTURE elements
                    while (child.NextSibling != null) {
                        child.ParentNode.RemoveChild(child.NextSibling);
                    }
                }
                child = child.NextSibling;
            }
            return found;
        case XmlNodeType.Text:
            var remaining = limit - found;
            if (node.Value.Length < remaining) {
                // still room for more (at least one more letter)
                return found + node.Value.Length;
            }
            node.Value = node.Value.Substring(0, remaining);
            return limit;
        default:
            return found;
        }
    }

}

Usage and result:

Util.LazyWrapper(@"<p class=""abc-class"">01<x/>23456789<y/></p>", 5)
// => <p class="abc-class">01<x />234</p>
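
For the non-mutating adaptation left as an exercise above, one possible shape is to build a truncated copy while walking the tree instead of removing nodes in place. This is only a sketch under assumptions: truncatedCopy is a name made up here, it relies on the standard DOM members nodeType, nodeValue, childNodes, cloneNode(false) and appendChild, and unlike the JavaScript version above it does not append ellipses.

```javascript
// Hypothetical non-mutating variant: returns { node, found } where `node`
// is a truncated copy of the input (or null for unsupported node types)
// and `found` is the number of text characters counted so far.
function truncatedCopy(node, limit, found = 0) {
  const ELEMENT_NODE = 1;
  const TEXT_NODE = 3;

  if (node.nodeType === TEXT_NODE) {
    const remaining = Math.max(limit - found, 0);
    const copy = node.cloneNode(false);
    if (node.nodeValue.length > remaining) {
      copy.nodeValue = node.nodeValue.slice(0, remaining);
    }
    return { node: copy, found: found + copy.nodeValue.length };
  }

  if (node.nodeType === ELEMENT_NODE) {
    // Shallow clone: same tag and attributes, no children yet.
    const copy = node.cloneNode(false);
    for (const child of Array.from(node.childNodes)) {
      if (found >= limit) break; // later siblings are simply not copied
      const result = truncatedCopy(child, limit, found);
      if (result.node) copy.appendChild(result.node);
      found = result.found;
    }
    return { node: copy, found: found };
  }

  // Other node types (comments, etc.) are skipped, as in the original.
  return { node: null, found: found };
}
```

In a browser, something like truncatedCopy(someElement, 200).node would give a detached copy to insert into the listing while someElement itself stays untouched.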
