JavaScript regular expression iterator to extract groups

Developer (devze.com) https://www.devze.com 2023-02-14 13:08 Source: web
Let's say we have the following text: "1 a,2 b,3 c,4 d" and the following expression: /\d (\w)/g

What we want is to extract a, b, c and d, i.e. the capturing group of each match of the regular expression.

Unfortunately, "1 a,2 b,3 c,4 d".match(/\d (\w)/g) produces the array ["1 a", "2 b", "3 c", "4 d"] (the whole matches only, without the groups), and RegExp.$1 contains only the group from the last match, i.e. RegExp.$1 == 'd'.
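For reference, this is what match() returns on the question's string with and without the g flag (the variable names are only for illustration):

```javascript
const text = "1 a,2 b,3 c,4 d";

// With the g flag, match() returns only the whole matches; capture groups are dropped.
const globalResult = text.match(/\d (\w)/g);
console.log(globalResult); // [ '1 a', '2 b', '3 c', '4 d' ]

// Without the g flag, match() behaves like exec(): it returns the first match
// together with its capture groups, but never looks past that first match.
const firstResult = text.match(/\d (\w)/);
console.log(firstResult[0], firstResult[1]); // 1 a a
```

So neither form alone gives you every match together with its groups.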

How can I iterate over this regex so that I can extract the groups as well? I am looking for a solution that is also memory efficient, i.e. some sort of iterator object.

EDIT: It needs to be generic; I am only providing a simple example here. One solution is to loop over the array and reapply the regex to each item without the global flag, but I find this solution a bit stupid, although it seems to be the only way to do it.
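A minimal sketch of the workaround described in the EDIT, written as a hypothetical helper named groupsByReapplying (not a built-in). One caveat worth knowing: re-running the pattern on an isolated substring loses the surrounding context, so patterns that use anchors or lookarounds can behave differently than they did against the full string.

```javascript
// Hypothetical helper: collect the whole matches with a global copy of the pattern,
// then re-run a non-global copy on each whole match to recover its capture groups.
function groupsByReapplying(pattern, str) {
  // Drop g/y so this function decides which copy is global.
  const baseFlags = pattern.flags.replace(/[gy]/g, "");
  const globalRe = new RegExp(pattern.source, baseFlags + "g");
  const singleRe = new RegExp(pattern.source, baseFlags);
  // match() with the g flag returns null (not an empty array) when nothing matches.
  const wholeMatches = str.match(globalRe) || [];
  // Each entry is an exec()-style array (or null if the isolated substring no longer matches).
  return wholeMatches.map((whole) => singleRe.exec(whole));
}

const reappliedLetters = groupsByReapplying(/\d (\w)/g, "1 a,2 b,3 c,4 d").map((m) => m[1]);
console.log(reappliedLetters); // [ 'a', 'b', 'c', 'd' ]
```

Note that this builds the full array of matches before doing anything else, so it does not meet the memory-efficiency goal.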


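var subject = "1 a,2 b,3 c,4 d";
var myregexp = /\d (\w)/g; // the g flag is required; without it exec() returns the first match forever
var match = myregexp.exec(subject);
while (match != null) {
    // matched text: match[0]
    // match start: match.index
    // capturing group n: match[n]
    console.log(match[1]); // logs a, b, c, d in turn
    match = myregexp.exec(subject);
}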

(shamelessly taken from RegexBuddy)
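Since the question asks for an iterator object, the same exec() loop can be wrapped in a generator function. This is a sketch using a hypothetical helper name, iterateMatches; it yields one match at a time instead of building an array of all matches, and it also guards against two common pitfalls of a hand-written exec() loop: a regex without the g flag, and zero-length matches.

```javascript
// Hypothetical helper: a lazy iterator over every match (with its groups) of `regex` in `str`.
function* iterateMatches(regex, str) {
  // Work on a private global copy so the caller's regex (and its lastIndex) is untouched,
  // and so a pattern without the g flag cannot make exec() return the same match forever.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const re = new RegExp(regex.source, flags);
  let match;
  while ((match = re.exec(str)) !== null) {
    // match[0] is the whole match, match[n] is capturing group n, match.index is the start.
    yield match;
    // A zero-length match does not advance lastIndex; bump it by one to avoid looping forever.
    if (match[0] === "") re.lastIndex++;
  }
}

for (const m of iterateMatches(/\d (\w)/g, "1 a,2 b,3 c,4 d")) {
  console.log(m[1], m.index);
}
```

Because it is a generator, a consumer can stop early (for example with break) and the remaining matches are never computed.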


A shorter, simpler (though likely less efficient) solution is to use String.prototype.replace. replace is unique in that it implicitly iterates over all matches and executes a function for each match. Sure, you can use that function to actually replace text, but despite the function name that's not really required:

"1 a,2 b,3 c,4 d".replace(/\d (\w)/g, function(complete_match, matched_letter) {
    console.log(matched_letter);
});

This will log a, b, c, then d to the console. (It will also happen to return "undefined,undefined,undefined,undefined", but we don't care about that here.)

More generally, the function argument to replace is called with the following parameters:

function(match, p1, p2, [...], offset, string)
  • match is the matching substring.
  • p1 etc. are the match's captured groups, if any. The groups are in order of the opening parenthesis they correspond to (i.e. leftmost first, outer first). If the group matches multiple substrings (e.g. in a (.)+ scenario), only the last (rightmost) substring is captured.
  • offset is the index in the original string of this match
  • string is the string on which replace was called.
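A short sketch that uses those parameters to collect every group and its position instead of logging it. Because the parameter positions depend on how many groups the pattern has, offset is the third argument here (this pattern has exactly one group and no named groups). Returning the whole match from the callback leaves the resulting string identical to the input.

```javascript
const found = [];
const unchanged = "1 a,2 b,3 c,4 d".replace(/\d (\w)/g, function (match, p1, offset, string) {
  // match: the whole match ("1 a"), p1: the first capture group ("a"),
  // offset: where this match starts in `string`, string: the full input.
  found.push({ letter: p1, offset: offset });
  return match; // return the match itself so nothing is actually replaced
});
console.log(found);
```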

Manual iteration is likely more efficient, but this method is not slow and it's shorter and (IMHO) easier to read; I tend to use this pattern over a manual loop.


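For this particular string, a pattern that matches each letter directly (no capturing group needed) also works, though it is not generic: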

"1 a,2 b,3 c,4 d".match(/\w(?:,|$)/g).join(' '); // => "a, b, c, d"

If you need to iterate over the matches and their groups:

var r = /\d (\w)/g,
    s = "1 a,2 b,3 c,4 d",
    m;

while ( m = r.exec(s) ) {
    // `m` is your match, `m[1]` is the letter
}
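In engines that support ES2020, String.prototype.matchAll returns this kind of iterator directly: each item is an exec()-style match array, produced lazily. It requires a regex with the g flag (it throws a TypeError otherwise), and it iterates on an internal copy of the regex, so the original regex's lastIndex is left alone.

```javascript
const input = "1 a,2 b,3 c,4 d";
const lettersFromMatchAll = [];
for (const m of input.matchAll(/\d (\w)/g)) {
  // m[0] is the whole match, m[1] the letter, m.index the start position.
  lettersFromMatchAll.push(m[1]);
}
console.log(lettersFromMatchAll.join(", ")); // a, b, c, d
```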