I came across this regular expression in the jQuery source code:
...
rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/,
...
I was wondering why it was rather complicated. I'm especially interested in the reason behind the second part:
(?:.*? rv:([\w.]+))?
I did some research but I could not figure out what this part of the regular expression adds.
(?:) to match but not capture
.*? any amount of any character
rv: something literal
([\w.]+) one or more word characters or a dot
? appear 0 or 1 time
Particularly, that last ? doesn't make much sense to me. The whole second part matches whether or not there is a substring as defined by that second part. With some trial and error, the regular expression does not seem to behave differently from just:
/(mozilla)/
Could someone shed some light on what the second part of the regular expression is supposed to do? What does it constrain; what string fails that passes /(mozilla)/ or the other way round?
The two regexes would match the same strings, but would store different information in their capturing groups.
for the string: mozilla asdf rv:sadf
/(mozilla)(?:.*? rv:([\w.]+))?/
$0 = 'mozilla asdf rv:sadf'
$1 = 'mozilla'
$2 = 'sadf'
/(mozilla)/
$0 = 'mozilla'
$1 = 'mozilla'
$2 = '' (this pattern has no second group, so nothing is captured)
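The difference can be checked directly in JavaScript. This is only a small sketch using the sample string above; the variable names `withVersion` and `plain` are made up for illustration. Note that JavaScript reports a missing group as undefined rather than as an empty string.

```javascript
// Sample string from the example above.
const ua = "mozilla asdf rv:sadf";

// Pattern from the jQuery source, and the shorter pattern from the question.
const withVersion = /(mozilla)(?:.*? rv:([\w.]+))?/.exec(ua);
const plain = /(mozilla)/.exec(ua);

console.log(withVersion[0]); // whole match, extends through the rv: token
console.log(withVersion[1]); // first capturing group
console.log(withVersion[2]); // second capturing group (the version part)

console.log(plain[0]); // whole match stops right after "mozilla"
console.log(plain[2]); // undefined: there is no second group

// Without an " rv:" token both patterns still match, but no version is captured.
const noRv = /(mozilla)(?:.*? rv:([\w.]+))?/.exec("mozilla asdf");
console.log(noRv[0], noRv[2]);
```

Both patterns accept exactly the same strings; only the contents of the match array differ.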
Note: I now notice that this answer might be a bit out of scope. I will still leave it for further information, but if you think it is too much out of scope, just comment and I will remove it.
@arnaud is right, it is to get the version. Here is the code where the expression is used:
uaMatch: function( ua ) {
    ua = ua.toLowerCase();
    var match = rwebkit.exec( ua ) ||
        ropera.exec( ua ) ||
        rmsie.exec( ua ) ||
        ua.indexOf("compatible") < 0 && rmozilla.exec( ua ) ||
        [];
    return { browser: match[1] || "", version: match[2] || "0" };
},
You can see that the function returns the version if one was found and "0" if not. This might be necessary for some browsers or is just provided as additional information for developers.
The function is called here:
browserMatch = jQuery.uaMatch( userAgent );
if ( browserMatch.browser ) {
    jQuery.browser[ browserMatch.browser ] = true;
    jQuery.browser.version = browserMatch.version;
}
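To see the fallback to "0" in isolation, here is a stripped-down sketch that uses only rmozilla (the other patterns are not shown in this thread, so they are left out). The name `uaMatchMozilla` is hypothetical and not part of jQuery, and the user-agent strings are illustrative samples.

```javascript
var rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/;

// Hypothetical, simplified variant of uaMatch that only knows rmozilla.
function uaMatchMozilla(ua) {
  ua = ua.toLowerCase();
  // Same guard as in the quoted code: skip UAs that say "compatible".
  var match = (ua.indexOf("compatible") < 0 && rmozilla.exec(ua)) || [];
  return { browser: match[1] || "", version: match[2] || "0" };
}

// Illustrative user-agent strings (assumed, not taken from the thread).
var withRv = uaMatchMozilla(
  "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0"
);
var withoutRv = uaMatchMozilla("Mozilla/5.0 (no revision token here)");
var compatible = uaMatchMozilla("Mozilla/4.0 (compatible; MSIE 8.0)");

console.log(withRv);     // browser and version both filled in
console.log(withoutRv);  // browser found, version falls back to "0"
console.log(compatible); // rmozilla is never tried, so browser is ""
```

The optional group is what lets a single exec call serve both cases: the browser name always lands in match[1], and match[2] is filled in only when an rv: token is present.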
First, I'd like to clarify the difference between:
.*? - non-greedy match
.* - greedy match
The non-greedy one will match the smallest number of characters possible (given the rest of the pattern), and the greedy one will match the most.
Given the string:
mozilla some text here rv:abc xyz
The regex will capture both 'mozilla' and 'abc'. But if the ' rv:' part doesn't exist, the regex will still match and capture 'mozilla'.
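The effect of the non-greedy quantifier only shows up when the input contains more than one rv: token. The following sketch assumes a made-up string with two of them; the greedy pattern is a variant written for comparison and does not appear in jQuery.

```javascript
// Made-up input with two rv: tokens, to make the difference visible.
const ua = "mozilla a rv:1.0 b rv:2.0";

// Non-greedy .*? (as in jQuery) versus greedy .* (comparison variant).
const lazy = /(mozilla)(?:.*? rv:([\w.]+))?/.exec(ua);
const greedy = /(mozilla)(?:.* rv:([\w.]+))?/.exec(ua);

console.log(lazy[0], "|", lazy[2]);     // stops at the first rv: token
console.log(greedy[0], "|", greedy[2]); // runs on to the last rv: token
```

With a single rv: token in the input, both variants capture the same value.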
The ([\w.]+) inside of (?:.*? rv:([\w.]+)) is capturing, so maybe this regex was used to get the revision number in the past (however, it seems that jQuery currently only checks whether the regex matches).
(pat) is a capturing group: it groups a sub-pattern and stores what it matched. (?:pat) is the non-capturing form: it groups in the same way but stores nothing. It is not a negation; the ?: has nothing in common with the ^ in a character set such as [^ ], or with JavaScript's ! operator.
. matches any character (except line terminators)
* is a quantifier meaning zero or more times; it can also be written as {0,} (but those three additional characters may result in an earlier death of your keyboard!)
? directly after another quantifier, as in .*?, is not a "zero or one" quantifier; it makes the preceding quantifier non-greedy
rv: literal rv:
([\w.]+) another submatch, this one capturing. [\w.] is a character set: the escaped w "\w" is any word character, aka [a-zA-Z0-9_], and inside the set the dot is a literal dot. The quantifier + means the set may occur one or more times.
)? closes the non-capturing group; the trailing ? lets the whole group match zero or one time within the parent match
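One practical effect of writing (?:...) rather than (...) is that the group numbering stays the same. The sketch below compares the jQuery pattern with a hypothetical variant in which the outer group captures; that variant is written only for illustration.

```javascript
const ua = "mozilla asdf rv:sadf";

// jQuery's pattern: the outer optional group does not capture.
const nonCapturing = /(mozilla)(?:.*? rv:([\w.]+))?/.exec(ua);

// Hypothetical variant: the same pattern with a capturing outer group.
const capturing = /(mozilla)(.*? rv:([\w.]+))?/.exec(ua);

console.log(nonCapturing.length, nonCapturing[2]); // version sits at index 2
console.log(capturing.length, capturing[2], capturing[3]); // version moves to index 3
```

Code that reads match[2] as the version therefore relies on the outer group being non-capturing.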
To reverse engineer the meaning of the pattern: evaluate it from left to right in a text editor, substituting for each sub-expression some literal text that comes to mind and that the sub-expression matches. Then take a step back and ponder what the regex might have been for.