javascript - What does this regular expression part add?

I came across this regular expression in the jQuery source code:

...
rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/,
...

I was wondering why it was rather complicated. I'm especially interested in the reason behind the second part:

(?:.*? rv:([\w.]+))?

I did some research but I could not figure out what this part of the regular expression adds.

(?:)      to match but not capture
.*?       any amount of any character
 rv:      something literal
([\w.]+)  one or more word characters or a dot
?         appear 0 or 1 time

Particularly, that last ? doesn't make much sense to me: the whole second part matches whether or not the string contains a substring of that form. By trial and error, the regular expression does not seem to behave any differently from just:

/(mozilla)/

Could someone shed some light on what the second part of the regular expression is supposed to do? What does it constrain; what string fails that passes /(mozilla)/ or the other way round?

asked Aug 19, 2011 at 18:50 by pimvdb

I suspect it's to work around some browsers faking Mozilla by putting it in their user-agent string. – Rafe Kettler Commented Aug 19, 2011 at 18:55
Can you provide a bit more context? Was this part of a jQuery plugin? If so, which one? Knowing where this code appears might shed some light onto /why/ the author wanted this particular pattern, and therefore what the pattern is doing. – jefflunt Commented Aug 19, 2011 at 18:57
@Rafe Kettler: I'm not sure I understand you correctly. What does the regexp add to prevent fakers? – pimvdb Commented Aug 19, 2011 at 19:00
@normalocity: In fact it's part of jQuery itself: code.jquery.com/jquery-1.6.2.js, line 66. – pimvdb Commented Aug 19, 2011 at 19:00
For those who don't want to count lines: github.com/jquery/jquery/blob/1.6.2/src/core.js#L45 . – Felix Kling Commented Aug 19, 2011 at 19:02

5 Answers

The two regexes would match the same strings, but would store different information in their capturing groups.

for the string: mozilla asdf rv:sadf

/(mozilla)(?:.*? rv:([\w.]+))?/
$0 = 'mozilla asdf rv:sadf'
$1 = 'mozilla'
$2 = 'sadf'

/(mozilla)/
$0 = 'mozilla'
$1 = 'mozilla'
$2 = undefined (this pattern has no second group)
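
The comparison can be reproduced in a JavaScript console. This is only a minimal sketch: the variable names are made up for illustration, and the sample string is the one from this answer. Note that in JavaScript a group that does not exist, or that did not take part in the match, comes back as undefined rather than as an empty string.

```javascript
// Hypothetical variable names; the input is the sample string from the answer.
const sampleUa = "mozilla asdf rv:sadf";

const withVersion = /(mozilla)(?:.*? rv:([\w.]+))?/.exec(sampleUa);
const nameOnly = /(mozilla)/.exec(sampleUa);

// withVersion[0] is "mozilla asdf rv:sadf", withVersion[1] is "mozilla",
// withVersion[2] is "sadf".
console.log(withVersion[0], "|", withVersion[1], "|", withVersion[2]);

// nameOnly[0] and nameOnly[1] are both "mozilla"; nameOnly[2] is undefined
// because the short pattern has no second group.
console.log(nameOnly[0], "|", nameOnly[1], "|", nameOnly[2]);
```

Both patterns succeed on the same inputs; only the longer one leaves something in the second group.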

Note: I now notice that this answer might be a bit out of scope. I will still leave it for further information, but if you think it is too much out of scope, just comment and I will remove it.

@arnaud is right, it is to get the version. Here is the code where the expression is used:

uaMatch: function( ua ) {
    ua = ua.toLowerCase();

    var match = rwebkit.exec( ua ) ||
                ropera.exec( ua ) ||
                rmsie.exec( ua ) ||
                ua.indexOf("compatible") < 0 && rmozilla.exec( ua ) ||
                [];

    return { browser: match[1] || "", version: match[2] || "0" };
},

You can see that the function returns the version if one is found and "0" if not. This might be necessary for some browsers or is just provided as additional information for developers.

The function is called here:

browserMatch = jQuery.uaMatch( userAgent );
if ( browserMatch.browser ) {
    jQuery.browser[ browserMatch.browser ] = true;
    jQuery.browser.version = browserMatch.version;
}
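
To see how the optional group feeds the version field, here is a stripped-down sketch that keeps only the rmozilla branch. The function name uaMatchSketch and the three user-agent strings are assumptions chosen for illustration; this is not the jQuery function itself, which tries several other patterns first.

```javascript
// Simplified sketch: only the rmozilla branch of uaMatch is kept.
const rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/;

// Hypothetical helper, modelled on the return statement of uaMatch.
function uaMatchSketch(ua) {
  ua = ua.toLowerCase();
  // Guard from uaMatch: strings that merely claim to be "compatible" are skipped.
  const match = (ua.indexOf("compatible") < 0 && rmozilla.exec(ua)) || [];
  return { browser: match[1] || "", version: match[2] || "0" };
}

// Illustrative user-agent strings, not an exhaustive list.
const withRv = uaMatchSketch("Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0");
const withoutRv = uaMatchSketch("Mozilla/5.0 (no revision token here)");
const compatibleUa = uaMatchSketch("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)");

console.log(withRv);       // browser "mozilla", version "5.0"
console.log(withoutRv);    // browser "mozilla", version "0"
console.log(compatibleUa); // browser "", version "0"
```

Because the group is optional, a string without an rv: token still identifies the browser; it just falls back to the default version.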

First, I'd like to clarify the difference between:

.*? - non-greedy match
.* - greedy match

The non-greedy one will match the smallest number of characters possible (given the rest of the pattern), and the greedy one will match the most.

Given the string:

mozilla some text here rv:abc xyz

The regex will return both 'mozilla' and 'abc'. But if the 'rv:' doesn't exist, the regex will still return 'mozilla'.
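
The difference only shows up when " rv:" occurs more than once. The following sketch compares the original pattern with a greedy variant; the greedy pattern and the second and third test strings are assumptions made up for this comparison, not something taken from jQuery.

```javascript
// Original (non-greedy) pattern and a hypothetical greedy variant.
const lazyPattern = /(mozilla)(?:.*? rv:([\w.]+))?/;
const greedyPattern = /(mozilla)(?:.* rv:([\w.]+))?/;

const singleRv = "mozilla some text here rv:abc xyz";
const doubleRv = "mozilla rv:1.9 some text rv:2.0"; // made-up input
const noRv = "mozilla only";                        // made-up input

// With one " rv:" both variants capture "abc".
console.log(lazyPattern.exec(singleRv)[2], greedyPattern.exec(singleRv)[2]);

// With two, the non-greedy one stops at the first ("1.9"),
// the greedy one runs on to the last ("2.0").
console.log(lazyPattern.exec(doubleRv)[2], greedyPattern.exec(doubleRv)[2]);

// Without " rv:" the optional group is skipped: group 1 is still "mozilla",
// group 2 is undefined.
console.log(lazyPattern.exec(noRv)[1], lazyPattern.exec(noRv)[2]);
```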

The ([\w.]+) inside (?:.*? rv:([\w.]+)) is a capturing group, so maybe this regex was used to get the revision number in the past (however, it seems that jQuery currently only checks whether the regex matches).

(pat) is a capturing group: it matches the contained pattern and stores the matched text. (?:pat) is the non-capturing variant: it groups in the same way but stores nothing. It is not a negation; negation is what [^ ] does for a character set [ ].

.          matches any character
*          quantifier: zero or more times; in newer regex engines it can also be written as {0,} (but those three additional characters may result in an earlier death of your keyboard!)
?          directly after *, makes the quantifier non-greedy
 rv:       literal " rv:"
([\w.]+)   another submatch: the character set [\w.] holds the escaped w "\w" (any word character, aka [a-zA-Z0-9_]) and a literal dot; the quantifier + means it may occur one or more times
)?         the enclosing group may match zero or one time within the parent match

To reverse engineer the meaning of the pattern: evaluate it from left to right in a text editor, substituting for each sub-expression some random literals that come to mind and that it would match. Then take a step back and ponder what the regex might have been for.
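
The same left-to-right reading can be done mechanically by growing the pattern one piece at a time and watching what each stage matches. This is just a sketch: the sample string and the intermediate patterns are made up for illustration.

```javascript
// Made-up sample string containing a " rv:" token.
const probe = "mozilla/5.0 (x11; linux; rv:5.0) gecko";

// The pattern, built up from left to right.
const stages = [
  /(mozilla)/,
  /(mozilla)(?:.*? rv:)/,
  /(mozilla)(?:.*? rv:([\w.]+))/,
  /(mozilla)(?:.*? rv:([\w.]+))?/,
];

// Each entry holds the full match followed by the captured groups.
const stageResults = stages.map((re) => Array.from(re.exec(probe)));

stageResults.forEach((result, i) => console.log(i + 1, result));
// Stage 1 matches "mozilla" only.
// Stage 2 extends the full match up to and including " rv:".
// Stage 3 adds "5.0" to the match and stores it in group 2.
// Stage 4 gives the same result here; the trailing ? only matters
// for strings without " rv:".
```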
