How can I improve the performance of my JavaScript text formatter?

I am allowing my users to wrap words with "*", "/", "_", and "-" as a shorthand way to indicate they'd like to bold, italicize, underline, or strike through their text. Unfortunately, when the page is filled with text using this markup, I'm seeing a noticeable (borderline acceptable) slowdown.

Here's the JavaScript I wrote to handle this task. Can you please provide feedback on how I could speed things up?

function handleContentFormatting(content) {
    content = handleLineBreaks(content);

    var bold_object = {'regex': /\*(.|\n)+?\*/i, 'open': '<b>', 'close': '</b>'};
    var italic_object = {'regex': /\/(?!\D>|>)(.|\n)+?\//i, 'open': '<i>', 'close': '</i>'};
    var underline_object = {'regex': /\_(.|\n)+?\_/i, 'open': '<u>', 'close': '</u>'};
    var strikethrough_object = {'regex': /\-(.|\n)+?\-/i, 'open': '<del>', 'close': '</del>'};

    var format_objects = [bold_object, italic_object, underline_object, strikethrough_object];

    for( obj in format_objects ) {
        content = handleTextFormatIndicators(content, format_objects[obj]);
    }

    return content;
}

//@param obj --- an object with 3 properties:
//      1.) the regex to search with
//      2.) the opening HTML tag that will replace the opening format indicator
//      3.) the closing HTML tag that will replace the closing format indicator
function handleTextFormatIndicators(content, obj) {
    while(content.search(obj.regex) > -1) {
        var matches = content.match(obj.regex);
        if( matches && matches.length > 0) {
            var new_segment = obj.open + matches[0].slice(1,matches[0].length-1) + obj.close;
            content = content.replace(matches[0],new_segment);
        }
    }
    return content;
}

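Add the g (global) flag to your regexes and remove the while loop: a single replace call will then handle every match.
Replace your for (obj in format_objects) loop with a normal indexed for loop, because format_objects is an array. (for...in iterates over property names, and because obj is never declared, it also creates an implicit global variable, or throws a ReferenceError in strict mode.)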
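Applied to the question's own code, those two suggestions might look like the sketch below. It keeps the original format objects and the (.|\n)+? patterns, adds the g flag (dropping the unused i flag and the redundant escapes on _ and -), and replaces the while loop with one replace call whose callback strips the two marker characters. handleLineBreaks is left out because its code isn't shown.

```javascript
// Sketch: the question's functions with the g flag and an indexed for loop.
// handleLineBreaks (from the question) is omitted here.
function handleContentFormatting(content) {
    var format_objects = [
        {regex: /\*(.|\n)+?\*/g, open: '<b>', close: '</b>'},
        {regex: /\/(?!\D>|>)(.|\n)+?\//g, open: '<i>', close: '</i>'},
        {regex: /_(.|\n)+?_/g, open: '<u>', close: '</u>'},
        {regex: /-(.|\n)+?-/g, open: '<del>', close: '</del>'}
    ];

    // format_objects is an array, so iterate by index rather than with for...in.
    for (var i = 0; i < format_objects.length; i++) {
        content = handleTextFormatIndicators(content, format_objects[i]);
    }
    return content;
}

function handleTextFormatIndicators(content, obj) {
    // With the g flag, replace visits every match in one left-to-right scan;
    // the callback drops the first and last character (the markers) and wraps the rest.
    return content.replace(obj.regex, function (match) {
        return obj.open + match.slice(1, -1) + obj.close;
    });
}
```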

Update

Okay, I took the time to write an even faster and simpler solution based on your code:

function handleContentFormatting(content) {
    content = handleLineBreaks(content);

    var bold_object = {'regex': /\*([^*]+)\*/ig, 'replace': '<b>$1</b>'},
        italic_object = {'regex': /\/(?!\D>|>)([^\/]+)\//ig, 'replace': '<i>$1</i>'},
        underline_object = {'regex': /\_([^_]+)\_/ig, 'replace': '<u>$1</u>'},
        strikethrough_object = {'regex': /\-([^-]+)\-/ig, 'replace': '<del>$1</del>'};

    var format_objects = [bold_object, italic_object, underline_object, strikethrough_object],
        i = 0, foObjSize = format_objects.length;

    for ( ; i < foObjSize; i++ ) {
        content = handleTextFormatIndicators(content, format_objects[i]);
    }

    return content;
}

//@param obj --- an object with 2 properties:
//      1.) the regex to search with
//      2.) the replace string
function handleTextFormatIndicators(content, obj) {
    return content.replace(obj.regex, obj.replace);
}


This works whether or not the formatting markers are nested (different markers can nest, e.g. *bold _and underlined_*). You can also drop the function handleTextFormatIndicators altogether and do the replacements inline inside handleContentFormatting.
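Done inline, the four replacements collapse into one chained expression. This is a sketch with a hypothetical function name (formatInline) that reuses the regexes above, minus the i flag, which does nothing here because the patterns contain no letters. Call the question's handleLineBreaks first if you still need it.

```javascript
// Hypothetical name: formatInline. Same four passes as handleContentFormatting,
// written as one chain of String.prototype.replace calls.
function formatInline(content) {
    return content
        .replace(/\*([^*]+)\*/g, '<b>$1</b>')
        // The lookahead rejects a slash followed by one non-digit and '>' (as in </b>)
        // or by '>' directly (as in <br/>), so those are not taken as italic markers.
        .replace(/\/(?!\D>|>)([^\/]+)\//g, '<i>$1</i>')
        .replace(/_([^_]+)_/g, '<u>$1</u>')
        .replace(/-([^-]+)-/g, '<del>$1</del>');
}
```

For example, formatInline('*bold _under_*') yields '<b>bold <u>under</u></b>', because each pass only looks for its own marker.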

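Your code forces the browser to do a lot of repeated, wasted work: every pass through the while loop calls search, match, and replace, each of which scans content from the beginning again, and each replace builds a new copy of the string. The approach you should be taking is this: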

Concoct a regex that combines all of your "target" regexes with another that matches a leading string of characters that are not your special meta-characters.
Change the loop so that it does the following:
1. Grab the next match from the source string. That match, due to the way you changed your regex, will be a string of non-meta characters followed by your matched portion.
2. Append the non-meta characters and the replacement for the target portion onto a separate array of strings.
At the end of that process, the separate accumulator array can be joined and used to replace the content.
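A minimal sketch of that loop, assuming a concrete combined regex for the four markers (the function name formatTokens and the exact patterns are illustrative, not taken from the answer). It uses exec with the g flag so each call resumes at lastIndex, pushes pieces onto an array, and joins once at the end. A marker with no partner is copied through unchanged; inner text is not formatted recursively, so nested markers are left as-is.

```javascript
// Hypothetical sketch of the single-pass tokenizer approach.
// Group 1: a run of characters that are not markers (possibly empty).
// Groups 2-5: inner text of *bold*, /italic/, _underline_, -strike- (undefined if absent).
function formatTokens(content) {
    var tokenizer = /([^*\/_-]*)(?:\*([^*]+)\*|\/([^\/]+)\/|_([^_]+)_|-([^-]+)-)?/g;
    var out = [];
    var match;

    // The pattern can match the empty string, so exec never returns null here.
    while (tokenizer.lastIndex < content.length) {
        match = tokenizer.exec(content);
        out.push(match[1]);
        if (match[2] !== undefined) out.push('<b>' + match[2] + '</b>');
        else if (match[3] !== undefined) out.push('<i>' + match[3] + '</i>');
        else if (match[4] !== undefined) out.push('<u>' + match[4] + '</u>');
        else if (match[5] !== undefined) out.push('<del>' + match[5] + '</del>');
        else if (match[0] === '') {
            // Empty match: we are sitting on a lone marker. Copy it and step past it,
            // otherwise exec would keep matching at the same position forever.
            out.push(content.charAt(tokenizer.lastIndex));
            tokenizer.lastIndex++;
        }
    }
    return out.join('');
}
```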

As to how to combine the regular expressions: it's not very pretty in JavaScript, but it looks like this. First, you need a regex for a string of zero or more "uninteresting" characters; that should be the first capturing group in the regex. Next come the alternatives for the target strings you're looking for. Thus the general form is:

var tokenizer = /(uninteresting pattern)?(?:(target 1)|(target 2)|(target 3)| ... )?/;

When you match that against the source string, you'll get back a result array that will contain the following:

result[0] - entire chunk of string (not used)
result[1] - run of uninteresting characters
result[2] - either an instance of target type 1, or undefined
result[3] - either an instance of target type 2, or undefined
...
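To make that layout concrete, here is the general form filled in for the four markers and run once without the g flag. This is a hypothetical instance: each target group here captures only the text between the markers rather than the whole target.

```javascript
// Hypothetical tokenizer: group 1 is the uninteresting run; groups 2-5 hold the
// inner text of a bold, italic, underline or strikethrough span.
var tokenizer = /([^*\/_-]*)(?:\*([^*]+)\*|\/([^\/]+)\/|_([^_]+)_|-([^-]+)-)?/;
var result = tokenizer.exec('plain text /slanted/ more');

// result[0] === 'plain text /slanted/'   (whole chunk)
// result[1] === 'plain text '            (uninteresting run)
// result[2] === undefined                (no bold here)
// result[3] === 'slanted'                (italic target)
// result[4], result[5] === undefined
```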

Thus you'll know which kind of replacement target you saw by checking which of the target groups is not undefined. (Note that in your case the targets can conceivably overlap; if you intend for that to work, then I suspect you'll have to approach this as a full-blown parsing problem.)

Another option is a single replace call with a callback function:

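function formatText(text){
    // Each alternative captures the text between one pair of markers;
    // the groups for alternatives that did not match are undefined.
    return text.replace(
        /\*([^*]*)\*|\/([^\/]*)\/|_([^_]*)_|-([^-]*)-/g,
        function(m, tb, ti, tu, ts){
            // Recurse on the inner text so nested markers of other kinds are formatted too,
            // e.g. formatText('*bold /and italic/*') gives '<b>bold <i>and italic</i></b>'.
            if(tb !== undefined)
                return '<b>' + formatText(tb) + '</b>';
            if(ti !== undefined)
                return '<i>' + formatText(ti) + '</i>';
            if(tu !== undefined)
                return '<u>' + formatText(tu) + '</u>';
            if(ts !== undefined)
                return '<del>' + formatText(ts) + '</del>';
            return 'ERR('+m+')'; // not reachable: exactly one alternative matched
        }
    );
}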

This works fine on nested tags, but not on overlapping ones such as *a _b* c_ (which comes out as <b>a _b</b> c_); overlapping tags are invalid anyway.

Example at http://jsfiddle.net/m5Rju/