
tr1::regex regex_search problem

Posted on https://www.devze.com, 2023-01-13 06:54. Source: the web

I'm using tr1::regex to try to extract some matches from a string. An example string could be

asdf werq "one two three" asdf

And I would want to get out of that:

asdf  
werq  
one two three  
asdf  

With stuff in quotes grouped together, so I'm trying to use the regex \"(.+?)\"|([^\\s]+). The code I'm using is:

cmatch res;
regex reg("\"(.+?)\"|([^\\s]+)", regex_constants::icase);
regex_search("asdf werq \"one two three\" asdf", res, reg);

cout << res.size() << endl;
for (unsigned int i = 0; i < res.size(); ++i) {
    cout << res[i] << endl;
}

but that outputs

3
asdf

asdf

What am I doing wrong?


It appears that your regex engine does not support lookbehind assertions. To avoid using lookbehinds, you can try the following:

"([^"]*)"|(\S+)

or quoted:

"\"([^\"]*)\"|(\\S+)"

This regex will work, but each match will have two captures, one of which will be empty (either the first -- in case of a non-quoted word, or the second -- in case of a quoted string).

To be able to use this you need to iterate over all matches, and for each match use the non-empty capture.

I don't know TR1 well enough to say exactly how one iterates over all matches. But if I'm not mistaken, res.size() will always be equal to 3 (the entire match plus the two captures).

For example, for the string asdf "one two three" werq the first match will be:

res[0] = "asdf"              // the entire match
res[1] = ""                  // the first capture
res[2] = "asdf"              // the second capture

The second match will be:

res[0] = "\"one two three\"" // the entire match including leading/trailing quotes
res[1] = "one two three"     // the first capture
res[2] = ""                  // the second capture

and the third match will be:

res[0] = "werq"              // the entire match
res[1] = ""                  // the first capture
res[2] = "werq"              // the second capture
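
The iteration described above can be sketched with a regex iterator. This is a minimal sketch, not a tested TR1 solution: it assumes the C++11 std::regex library (the standardized successor of tr1::regex) is available, and the helper name tokenize is hypothetical, introduced only for illustration. For each match it keeps whichever of the two captures actually took part in the match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits `input` into
// whitespace-separated words, keeping double-quoted runs together.
std::vector<std::string> tokenize(const std::string& input) {
    // Capture 1: text inside double quotes. Capture 2: a bare word.
    static const std::regex token_re("\"([^\"]*)\"|(\\S+)");

    std::vector<std::string> tokens;
    // sregex_iterator repeats the search, each time resuming after the
    // previous match, until no further match is found.
    for (std::sregex_iterator it(input.begin(), input.end(), token_re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Only one of the two captures participates in any given match.
        tokens.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return tokens;
}
```

Calling tokenize("asdf werq \"one two three\" asdf") should yield the four tokens asdf, werq, one two three, and asdf. Checking m[1].matched rather than testing for an empty string means an empty quoted string ("") is still reported as a token.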

HTH.


You may want to try the following regex instead:

(?<=")[^"]*(?=")|[^"\s]\S*

When quoted, it of course needs to be escaped:

"(?<=\")[^\"]*(?=\")|[^\"\\s]\\S*"

Btw, the code you used matches only the first word in the target string, since a single regex_search call stops at the first match it finds. The 3 items you are getting in the result are (1) the entire match, (2) the first capture, which is empty, and (3) the second capture, which is the source of the match.
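
To make the three items concrete, here is a small sketch that runs one regex_search with the pattern from the question and collects every sub-match. It assumes C++11 std::regex in place of tr1::regex, and first_match_groups is a hypothetical helper name used only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs a single regex_search with the question's
// pattern and returns each sub-match as a string. Groups that did not
// take part in the match come back as empty strings.
std::vector<std::string> first_match_groups(const std::string& input) {
    const std::regex re("\"(.+?)\"|([^\\s]+)", std::regex_constants::icase);
    std::smatch res;
    std::vector<std::string> groups;
    if (std::regex_search(input, res, re)) {
        // res[0] is the whole match; res[1] and res[2] are the captures.
        for (std::size_t i = 0; i < res.size(); ++i) {
            groups.push_back(res[i].str());
        }
    }
    return groups;
}
```

For the question's input this returns three entries: "asdf", an empty string, and "asdf", which lines up with the output shown in the question.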
