I'm using tr1::regex to try to extract some matches from a string. An example string could be
asdf werq "one two three" asdf
And I would want to get out of that:
asdf
werq
one two three
asdf
With stuff in quotes grouped together. I'm trying to use the regex \"(.+?)\"|([^\\s]+) for this. The code I'm using is:
cmatch res;
regex reg("\"(.+?)\"|([^\\s]+)", regex_constants::icase);
regex_search("asdf werq \"one two three\" asdf", res, reg);
cout << res.size() << endl;
for (unsigned int i = 0; i < res.size(); ++i) {
cout << res[i] << endl;
}
but that outputs
3
asdf
asdf
What am I doing wrong?
It appears that your regex engine does not support lookbehind assertions. To avoid using lookbehinds, you can try the following:
"([^"]*)"|(\S+)
or quoted:
"\"([^\"]*)\"|(\\S+)"
This regex will work, but each match will have two captures, one of which will be empty (either the first -- in case of a non-quoted word, or the second -- in case of a quoted string).
To be able to use this you need to iterate over all matches, and for each match use the non-empty capture.
I don't know enough about TR1, so I don't know exactly how one iterates over all matches. But if I'm not mistaken, res.size() will always be equal to 3.
For example, for the string asdf "one two three" werq the first match will be:
res[0] = "asdf" // the entire match
res[1] = "" // the first capture
res[2] = "asdf" // the second capture
The second match will be:
res[0] = "\"one two three\"" // the entire match including leading/trailing quotes
res[1] = "one two three" // the first capture
res[2] = "" // the second capture
and the third match will be:
res[0] = "werq" // the entire match
res[1] = "" // the first capture
res[2] = "werq" // the second capture
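For what it's worth, here is a minimal sketch of iterating over all matches and picking the capture that took part in each one. It assumes the C++11 std::regex library rather than TR1 (the TR1 equivalents should live in namespace std::tr1, but I have not verified that). The tokenize helper is a made-up name, used only for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): splits the input on
// whitespace while keeping double-quoted runs together, quotes stripped.
std::vector<std::string> tokenize(const std::string& input) {
    // Capture 1: text inside quotes. Capture 2: a bare, unquoted word.
    static const std::regex reg("\"([^\"]*)\"|(\\S+)");
    std::vector<std::string> tokens;
    // sregex_iterator repeats the search, resuming after each match.
    for (std::sregex_iterator it(input.begin(), input.end(), reg), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Exactly one of the two captures participates in each match;
        // checking .matched also handles an empty quoted string "".
        tokens.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return tokens;
}
```

Calling tokenize("asdf werq \"one two three\" asdf") should give the four tokens asdf, werq, one two three and asdf. Testing the matched flag rather than checking for an empty string is a deliberate choice, so that an empty pair of quotes still produces a token.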
HTH.
You may want to try the following regex instead:
(?<=")[^"]*(?=")|[^"\s]\S*
When quoted, it of course needs to be escaped:
"(?<=\")[^\"]*(?=\")|[^\"\\s]\\S*"
Btw, the code you used matches only the first word in the target string, since a single regex_search call stops at the first match; the match_any flag would not change that. The 3 items you are getting in the result are (1) the entire match, (2) the first capture, which is empty, and (3) the second capture, which is the source of the match.