
Removing duplicates in a comma-separated list with a regex?

Developer (https://www.devze.com), 2022-12-16 06:07, source: the web

I'm trying to figure out how to filter out duplicates in a comma-separated string with a regular expression. I'd like to do this in JavaScript, but I'm getting caught up on how to use back-references.

For example:

1,1,1,2,2,3,3,3,3,4,4,4,5

Becomes:

1,2,3,4,5

Or:

a,b,b,said,said, t, u, ugly, ugly

Becomes:

a,b,said,t,u,ugly


Why use a regex when you can do it in plain JavaScript? Here is some sample code (a bit messy, though):

var input = 'a,b,b,said,said, t, u, ugly, ugly';
var splitted = input.split(',');
var collector = {};
var i, key; // declared so the loops don't create implicit globals
for (i = 0; i < splitted.length; i++) {
   // trim leading and trailing whitespace from each token
   key = splitted[i].replace(/^\s*/, "").replace(/\s*$/, "");
   collector[key] = true;
}
var out = [];
for (key in collector) {
   out.push(key);
}
var output = out.join(','); // output will be 'a,b,said,t,u,ugly'

P.S. The replace() calls in the for-loop only trim each token; the uniqueness comes from using the trimmed tokens as keys of the collector object.
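For comparison, here is a shorter sketch of the same split-and-collect idea. It assumes an ES2015+ environment with `Set`, and `uniqueTokens` is a made-up name. One difference worth knowing: a `Set` keeps values in insertion order, while a plain object enumerates integer-like keys such as `"3"` in ascending numeric order ahead of other keys, so the object-based loop above would turn `3,1,3,2` into `1,2,3`.

```javascript
// Hypothetical helper: split on commas, trim each token, keep first occurrences.
function uniqueTokens(input) {
  // Trim each token, like the replace(/^\s*/, ...) calls above
  const tokens = input.split(",").map((token) => token.trim());
  // A Set drops repeats and preserves the order tokens were first seen
  return Array.from(new Set(tokens)).join(",");
}

console.log(uniqueTokens("a,b,b,said,said, t, u, ugly, ugly")); // "a,b,said,t,u,ugly"
console.log(uniqueTokens("3,1,3,2")); // "3,1,2"
```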


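If you insist on using a RegExp, here's an example in JavaScript:

"1,1,1,2,2,3,3,3,3,4,4,4,5".replace (
    // (?=,|$) checks for the next comma without consuming it,
    // so the following run can still match its leading (^|,)
    /(^|,)([^,]+)(?:,\2)+(?=,|$)/g,
    function ($0, $1, $2)
    {
        return $1 + $2;
    }
);
// returns "1,2,3,4,5"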

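To also trim whitespace around the duplicated tokens, modify it slightly:

"a,b,b,said,said, t, u, ugly, ugly".replace (
    // [^,]*[^,\s] ends the captured token on a non-space character
    /(^|,)\s*([^,]*[^,\s])(?:\s*,\s*\2)+(?=\s*(?:,|$))/g,
    function ($0, $1, $2)
    {
        return $1 + $2;
    }
);
// returns "a,b,said, t, u,ugly"; tokens that are not repeated keep their spaces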

That said, it's probably better to tokenise with split() and handle the duplicates in code.
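Since the question is mostly about back-references: in the first pattern above, `\2` refers back to whatever the group `([^,]+)` captured, so `(?:,\2)+` only matches further copies of that same token. The sketch below lists the runs the pattern finds. It assumes `String.prototype.matchAll` (ES2020), the name `listRuns` is hypothetical, and it ends the pattern with the lookahead `(?=,|$)` so the comma after one run is not consumed and the next run can still match its leading `(^|,)`.

```javascript
// Hypothetical helper: report each run of adjacent duplicate tokens.
function listRuns(s) {
  // (^|,) anchors the token start, \2 repeats the captured token,
  // and (?=,|$) checks the token ends there without consuming the comma.
  const runPattern = /(^|,)([^,]+)(?:,\2)+(?=,|$)/g;
  return Array.from(s.matchAll(runPattern), (m) => ({
    token: m[2],
    // m[0] looks like "1,1,1" or ",2,2"; drop a leading comma before counting
    count: m[0].replace(/^,/, "").split(",").length,
  }));
}

console.log(JSON.stringify(listRuns("1,1,1,2,2,3,3,3,3,4,4,4,5")));
// [{"token":"1","count":3},{"token":"2","count":2},{"token":"3","count":4},{"token":"4","count":3}]
```

The lone `5` is not reported because `(?:,\2)+` needs at least one repeat.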


Here's an example:

s/,([^,]+),\1/,$1/g;

This is a Perl regex substitution, but anyone who knows the syntax should be able to convert it to JavaScript.
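One way the Perl substitution might carry over to JavaScript, sketched under a few assumptions (`collapseAdjacent` is a made-up name). A literal translation, `s.replace(/,([^,]+),\1/g, ",$1")`, never looks at the first token (it has no comma in front), collapses only one pair per match, and lets `\1` match the start of a longer token, so `0,1,12` would become `0,12`. Padding the string with a comma on each side and moving the repeat into a lookahead avoids all three: a token is deleted only when the very next token is identical, and runs collapse in a single pass.

```javascript
// Hypothetical helper: collapse runs of adjacent duplicate tokens.
function collapseAdjacent(s) {
  // Normalise spaces around commas, then pad so every token,
  // including the first and last, sits between two commas.
  const padded = "," + s.trim().replace(/\s*,\s*/g, ",") + ",";
  // Delete ",token" whenever the next token is identical; the lookahead
  // does not consume that next token, so longer runs collapse in one pass.
  const collapsed = padded.replace(/,([^,]+)(?=,\1,)/g, "");
  // Strip the padding commas again.
  return collapsed.slice(1, -1);
}

console.log(collapseAdjacent("1,1,1,2,2,3,3,3,3,4,4,4,5")); // "1,2,3,4,5"
console.log(collapseAdjacent("a,b,b,said,said, t, u, ugly, ugly")); // "a,b,said,t,u,ugly"
console.log(collapseAdjacent("0,1,12")); // "0,1,12"
```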


I don't use regular expressions for that.

Here's the function I use. It accepts a string containing comma separated values and returns an array of unique values regardless of position in the original string.

Note: if you pass a CSV string containing quoted values, split will not treat commas inside the quotes any differently. So if you want to handle real CSV, you are better off using a third-party CSV parser.

function GetUniqueItems(s)
{
    var items=s.split(",");

    var uniqueItems={};

    for (var i=0;i<items.length;i++)
    {           
        var key=items[i];
        var val=items[i];
        uniqueItems[key]=val;
    }

    var result=[];

    for(key in uniqueItems)
    {
        // hasOwnProperty skips inherited properties, so only the collected
        // items are copied into the result array
        if(uniqueItems.hasOwnProperty(key))
        {
            result[result.length]=uniqueItems[key];
        }
    }    
    return result;
}
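To make the CSV caveat concrete: with a quoted field, `split(",")` cuts inside the quotes. The rough tokenizer below is only an illustrative assumption, not a CSV parser. Its pattern `"(?:[^"]|"")*"|[^,]+` matches either a double-quoted field (allowing `""` escapes) or a run of non-comma characters, but it silently skips empty fields and leaves the quotes in place.

```javascript
const line = '"Smith, John",42,"Smith, John"';

// Plain split breaks the quoted field apart
const naive = line.split(",");
console.log(naive.length); // 5: '"Smith', ' John"', '42', '"Smith', ' John"'

// Rough sketch: match a quoted field or an unquoted run of non-commas
const fields = line.match(/"(?:[^"]|"")*"|[^,]+/g);
console.log(fields.length); // 3: '"Smith, John"', '42', '"Smith, John"'

// De-duplicating the fields, rather than the naive pieces, keeps the name intact
console.log(Array.from(new Set(fields)).join(" | ")); // "Smith, John" | 42
```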


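With a JavaScript regex, applied in a loop until nothing changes:

var x = "1,1,1,2,2,3,3,3,3,4,4,4,5";

// (^|,) and (?=,|$) make \2 match a whole number, so "1,12" is left alone
while (/(^|,)(\d+),\2(?=,|$)/.test(x))
    x = x.replace(/(^|,)(\d+),\2(?=,|$)/g, "$1$2");

// x is now "1,2,3,4,5"


x = "a,b,b,said,said, t, u, ugly, ugly";

// [^,]*[^,\s] captures a whole token, ending on a non-space character
while (/(^|,)\s*([^,]*[^,\s])\s*,\s*\2(?=\s*(?:,|$))/.test(x))
    x = x.replace(/(^|,)\s*([^,]*[^,\s])\s*,\s*\2(?=\s*(?:,|$))/g, "$1$2");

// x is now "a,b,said, t, u,ugly"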

Not well tested; let me know if you find any issues.
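Both loops only collapse adjacent repeats, which is enough for the sorted examples in the question. If duplicates can be scattered, one regex-only option (a sketch checked only on the cases shown; `removeAllRepeats` is a made-up name) is to pad the list with commas and drop every token that occurs again later. Note that this keeps the last occurrence of each token, so `a,b,a,c` becomes `b,a,c`.

```javascript
// Hypothetical helper: remove repeated tokens wherever they occur.
function removeAllRepeats(s) {
  // Normalise spaces around commas and pad so each token is ",token,"
  const padded = "," + s.trim().replace(/\s*,\s*/g, ",") + ",";
  // Drop ",token" if the same whole token appears anywhere further right:
  // (?:[^,]*,)* skips over whole tokens before \1 must match exactly.
  const kept = padded.replace(/,([^,]+)(?=,(?:[^,]*,)*\1,)/g, "");
  return kept.slice(1, -1);
}

console.log(removeAllRepeats("a,b,a,c")); // "b,a,c"
console.log(removeAllRepeats("a,b,b,said,said, t, u, ugly, ugly")); // "a,b,said,t,u,ugly"
```

The nested `(?:[^,]*,)*` rescans the rest of the string for each token, so this is quadratic in the list length; for long lists the split-based answers are the safer choice.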

