Regexp to wrap each word on HTML page

Developer https://www.devze.com 2023-03-29 12:09 Source: Internet

Is it possible to wrap each word on an HTML page with a span element? I'm trying something like

/(\s*(?:<\/?\w+[^>]*>)|(\b\w+\b))/g

but the results are far from what I need.

Thanks in advance!

Well, I won't ask for the reason; you could do it like this:

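function getChilds( nodes ) {
    var len = nodes.length;

    // Walk backwards: nodes inserted at or after the current index are never revisited.
    while( len-- ) {
        var node = nodes[len];

        if( node.nodeType === 1 ) {
            // Recurse into elements, but leave script and style contents alone.
            if( !/^(script|style)$/i.test( node.nodeName ) ) {
                getChilds( node.childNodes );
            }
        } else if( node.nodeType === 3 && /\S/.test( node.nodeValue ) ) {
            var parent = node.parentNode;

            // Insert each span before the text node so the original order is kept
            // (appendChild would move the words to the end of the parent).
            node.nodeValue.split(/\s+/).forEach(function( word ) {
                if( !word ) { return; } // empty string from leading/trailing whitespace

                var s = document.createElement('span');
                s.textContent = word + ' ';

                parent.insertBefore( s, node );
            });

            parent.removeChild( node );
        }
    }
}

getChilds( document.body.childNodes );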

Even though I have to admit I haven't tested the code yet. That was just the first thing that came to my mind. It might be buggy or screw up completely, but in that case I know the gentle and kind stackoverflow community will kick my ass and downvote like hell :-p

You're going to have to get down to the "Text" nodes to make this happen. Unless you limit it to a specific tag, you really need to traverse every element on the page, wrap the words in its text, and re-append them.

With that said, try something like what a garble post makes use of (minus the filtering for words with 4+ characters and the mixing up of letters).
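
To make the "wrap it and re-append it" step a bit more concrete, here is a minimal sketch of what could happen to a single text node. The names splitWords and wrapTextNode are made up for this illustration and come from no library. wrapTextNode assumes a browser DOM and that the node has a parent; treat it as a starting point rather than a tested solution.

```javascript
// Hypothetical helper: split text into word and whitespace tokens.
// The capturing group in split() keeps the whitespace runs in the result,
// so joining the tokens gives back the original string.
function splitWords(text) {
    return text.split(/(\s+)/).filter(function (token) {
        return token !== '';
    }).map(function (token) {
        return { text: token, isWord: /\S/.test(token) };
    });
}

// Hypothetical helper (browser only): replace one text node with
// <span> elements for words and plain text nodes for whitespace.
function wrapTextNode(node) {
    var doc = node.ownerDocument;
    var fragment = doc.createDocumentFragment();

    splitWords(node.nodeValue).forEach(function (token) {
        if (token.isWord) {
            var span = doc.createElement('span');
            span.textContent = token.text;
            fragment.appendChild(span);
        } else {
            fragment.appendChild(doc.createTextNode(token.text));
        }
    });

    // Put the new nodes exactly where the old text node was.
    node.parentNode.replaceChild(fragment, node);
}
```

Keeping the whitespace as separate text nodes means the spacing of the page should not change; only the words end up inside spans.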

To get all the words inside span tags on the current page, you can use:

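var spans = document.body.getElementsByTagName('span');
if (spans.length)
{
  // Use an index loop: for...in on an HTMLCollection also visits "length", "item", etc.
  for (var i = 0; i < spans.length; i++)
  {
    // Only report spans whose content consists of word characters only
    if (/^\w+$/.test(spans[i].innerHTML))
    {
      alert(spans[i].innerHTML);
    }
  }
}
else
{
  alert('span tags not found');
}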

You should probably start off by getting all the text nodes in the document, and working with their contents instead of on the HTML as a plain string. It really depends on the language you're working with, but you could usually use a simple XPath like //text() to do that.

In JavaScript, that would be document.evaluate('//text()', document.body, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null), then iterating over the results and working with each text node separately.
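
The iteration step could look roughly like this. snapshotToArray is a hypothetical helper name, not part of any API; it relies only on the snapshotLength property and the snapshotItem(i) method that a snapshot-type XPathResult exposes. The browser-only part is guarded so the snippet does nothing where no DOM is available.

```javascript
// Hypothetical helper: copy the nodes of a snapshot-type XPathResult
// into a plain array so that array methods can be used on them.
function snapshotToArray(result) {
    var nodes = [];
    for (var i = 0; i < result.snapshotLength; i++) {
        nodes.push(result.snapshotItem(i));
    }
    return nodes;
}

// Browser-only usage; skipped where document or XPathResult is missing.
if (typeof document !== 'undefined' && typeof XPathResult !== 'undefined') {
    var result = document.evaluate('//text()', document.body, null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

    snapshotToArray(result).forEach(function (textNode) {
        // Work with each text node separately; here we only print its content.
        console.log(JSON.stringify(textNode.nodeValue));
    });
}
```

Note that //text() is an absolute path, so it matches text nodes anywhere in the document, including those inside script and style elements. You will probably want to skip those before wrapping anything.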

Here's how I did it, may need some tweaking...

var wrapWords = function(el) {
    // Elements whose contents should be left untouched
    var skipTags = { style: true, script: true, iframe: true, a: true },
        child, tag;

    // Walk backwards so nodes inserted after index i don't shift the ones still to visit
    for (var i = el.childNodes.length - 1; i >= 0; i--) {
        child = el.childNodes[i];
        if (child.nodeType === 1) {
            tag = child.nodeName.toLowerCase();
            if (!(tag in skipTags)) { wrapWords(child); }
        } else if (child.nodeType === 3 && /\w+/.test(child.textContent)) {
            var si, spanWrap;
            while ((si = child.textContent.indexOf(' ')) >= 0) {
                if (si === 0) {
                    // Leading space: split it off and continue with the rest
                    child.splitText(1);
                    child = child.nextSibling;
                } else {
                    // Split off the word before the space and wrap it
                    child.splitText(si);
                    spanWrap = document.createElement("span");
                    // textContent, not innerHTML, so "<" and "&" are not parsed as markup
                    spanWrap.textContent = child.textContent;
                    child.parentNode.replaceChild(spanWrap, child);
                    child = spanWrap.nextSibling;
                }
            }
            // Wrap whatever is left after the last space, unless it is empty
            if (child != null && child.textContent.length > 0) {
                spanWrap = document.createElement("span");
                spanWrap.textContent = child.textContent;
                child.parentNode.replaceChild(spanWrap, child);
            }
        }
    }
};

wrapWords(document.body);