How do I remove these tags with JavaScript

Developer (devze.com) https://www.devze.com 2023-01-20 12:54 Source: Internet

I'm still learning regex (obviously) and I can't figure it out. I want to do it the right way rather than the long way. How can I:

Find all  or  and replace each with \n, except the first  and the last , which should simply be removed (replaced with nothing); and also replace <br>, <br/> and <br /> with \n?

With regex or something else. I'm getting this from a jQuery $.get() response, so please don't flame me about it; I just don't know how to do it.

JavaScript has rather nice tools for dealing with an XML (or XHTML) DOM. Use those.
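Following that advice, one possible approach (a sketch, not the answerer's own code) is to parse the string into a DOM and walk the nodes, emitting text nodes as-is and each <br> element as \n. The function name domToText is hypothetical; it relies only on the standard nodeType, nodeName, nodeValue and childNodes properties, so it works on real DOM nodes and can also be exercised with plain objects outside a browser:

```javascript
// Hypothetical helper: flatten a DOM (sub)tree to plain text,
// turning every <br> element into "\n".
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

function domToText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  // nodeName is upper-case for HTML documents; normalise for XHTML too.
  if (node.nodeType === ELEMENT_NODE && node.nodeName.toUpperCase() === "BR") {
    return "\n";
  }
  // Any other node (element, document, ...): concatenate its children.
  let text = "";
  for (const child of node.childNodes || []) {
    text += domToText(child);
  }
  return text;
}
```

In a browser you could call it as `domToText(new DOMParser().parseFromString(html, "text/html").body)`. Special handling for the first and last occurrence of the other tags would be an extra rule in the walk and is left out of this sketch.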

From a regex perspective, to make the first  an exception, you must identify a pattern that makes the first  fail to match. For example, if the text before the first  is abcxyz, that is, abcxyz, then you search for every  that is not preceded by abcxyz, so that the first  doesn't match. In regex, that becomes: (?<!abcxyz)
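In engines that implement ECMAScript 2018 lookbehind (current browsers and Node.js), this idea can be tried directly. The actual tag name was lost from this page, so <br> is used below purely as a hypothetical stand-in, and "abcxyz" is the example marker text from the answer:

```javascript
// Hypothetical stand-in tag: <br>. Replace every <br> that is NOT
// immediately preceded by the marker text "abcxyz" (negative lookbehind).
function replaceExceptAfterMarker(text) {
  return text.replace(/(?<!abcxyz)<br>/g, "\n");
}

const sample = "abcxyz<br>one<br>two";
console.log(JSON.stringify(replaceExceptAfterMarker(sample)));
```

Only the second <br> is replaced, because the first one sits right after "abcxyz".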

To make the last  an exception, you must identify a pattern that makes the last  fail to match. For example, if the text after the last  is abcxyz, that is, abcxyz, then you search for every  that is not followed by abcxyz, so that the last  doesn't match. In regex, that becomes: (?!abcxyz)
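The mirror-image case uses negative lookahead, which every JavaScript engine supports. Again, <br> is only a hypothetical stand-in for the lost tag name:

```javascript
// Hypothetical stand-in tag: <br>. Replace every <br> that is NOT
// immediately followed by the marker text "abcxyz" (negative lookahead).
function replaceExceptBeforeMarker(text) {
  return text.replace(/<br>(?!abcxyz)/g, "\n");
}

console.log(JSON.stringify(replaceExceptBeforeMarker("one<br>two<br>abcxyz")));
```

Here the first <br> becomes \n while the one directly before "abcxyz" is left alone.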

Although JavaScript supports positive and negative lookahead, at the time this answer was written JavaScript regex supported neither positive nor negative lookbehind (lookbehind assertions were later added in ECMAScript 2018 and are available in current engines). There are some dirty tricks to mimic lookbehind in older JavaScript; however, not every lookbehind construct can be mimicked.
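One common trick of that kind (shown here as an illustration, not necessarily the one the answerer had in mind) is to match the would-be lookbehind text as an optional capture group and decide in a replacement callback: if the group matched, put the original text back unchanged. As above, <br> is a hypothetical stand-in tag:

```javascript
// Emulate (?<!abcxyz)<br> without lookbehind: capture an optional
// "abcxyz" prefix, and leave the match untouched when it was present.
function replaceExceptAfterMarkerCompat(text) {
  return text.replace(/(abcxyz)?<br>/g, (match, prefix) =>
    prefix ? match : "\n"
  );
}

console.log(JSON.stringify(replaceExceptAfterMarkerCompat("abcxyz<br>one<br>two")));
```

This works for a fixed prefix like this one, but it consumes the prefix text, so it breaks down when matches overlap or the "behind" part is variable; that is the kind of case the answer means by not every construct being mimicable.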

Thus, if possible, try to identify a pattern that makes the first  fail using negative lookahead instead.

To replace the first  and the last  with nothing, you can invert the logic above; this has to be done in a separate step.
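Putting those steps together using lookahead only: remove the first match with a non-global replace, remove the last match with a negative lookahead meaning "no further tag follows anywhere", then turn every remaining match into \n. The name replaceInnerTags and its tagSource parameter are hypothetical; because the original tag names were lost from this page, the tag pattern is passed in as a regex source string:

```javascript
// Hypothetical helper. tagSource is a regex source string for the tag(s),
// e.g. "<br\\s*\\/?>" or "<\\/?p>".
function replaceInnerTags(html, tagSource) {
  const tag = "(?:" + tagSource + ")"; // group so alternations stay intact
  const first = new RegExp(tag);       // no "g": matches the first one only
  // A tag with no other tag anywhere after it, i.e. the last one.
  const last = new RegExp(tag + "(?![\\s\\S]*" + tag + ")");
  const all = new RegExp(tag, "g");

  return html
    .replace(first, "") // step 1: drop the first occurrence
    .replace(last, "")  // step 2: drop the last occurrence
    .replace(all, "\n"); // step 3: everything in between becomes \n
}

console.log(JSON.stringify(replaceInnerTags("a<br>b<br/>c<br />d<br>e", "<br\\s*\\/?>")));
```

Doing the first and last removals before the global replace keeps each regex simple; trying to express "not first and not last" in a single pattern is what would need lookbehind.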

To replace <br>, <br/> and <br /> with \n, search for <br\s*\/?> and replace with \n.
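That search-and-replace as a one-liner (brToNewline is a hypothetical name; the i flag is an addition so that upper-case <BR> is also caught):

```javascript
// <br, then optional whitespace, then an optional "/", then ">".
function brToNewline(html) {
  return html.replace(/<br\s*\/?>/gi, "\n");
}

console.log(JSON.stringify(brToNewline("a<br>b<br/>c<br />d")));
```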

One way to do this is to let the browser do it for you. In IE and WebKit, you could assign your HTML as the innerHTML of a <div> and read its innerText. However, at the time of writing, that didn't work in Firefox or Opera. Here's a slightly unusual use of the Selection object that does it:

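// Convert an HTML string to plain text by letting the browser render it
// and reading the rendered text back out through a selection.
function getInnerText(html) {
    var text = "";
    var div = document.createElement("div");
    div.innerHTML = html;

    // The element must be in the document for its contents to be selectable.
    document.body.appendChild(div);
    if (typeof window.getSelection != "undefined") {
        // Standards-based browsers: select the div's contents and stringify.
        var sel = window.getSelection();
        sel.removeAllRanges();
        var range = document.createRange();
        range.selectNodeContents(div);
        sel.addRange(range);
        text = sel.toString();
        sel.removeAllRanges();
    } else if (typeof document.body.createTextRange != "undefined") {
        // Old IE: use a TextRange instead.
        var textRange = document.body.createTextRange();
        textRange.moveToElementText(div);
        text = textRange.text;
    }
    document.body.removeChild(div);
    // Normalise \r\n and lone \r line endings to \n.
    return text.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}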