
Ambiguous regex escape \num

Developer https://www.devze.com 2023-01-27 09:28 Source: the web

According to this reference, there are two escape sequences \n and \num where n is an octal number, and num is a positive integer. The former is an escape value that gets converted to a character, and the latter is a back-reference.

Isn't that ambiguous? How can the regex tell them apart? When does it decide to use one over the other?


The rules of disambiguation are described in http://msdn.microsoft.com/en-us/library/thwdfzxy.aspx:


Note the ambiguity between octal escape codes (such as \16) and \number backreferences that use the same notation. This ambiguity is resolved as follows:

  • The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.

  • If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.

  • Expressions from \10 and greater are considered backreferences if there is a backreference corresponding to that number; otherwise, they are interpreted as octal codes.

  • If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.
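To make the four rules easier to follow, here is a minimal Python sketch that encodes them as a plain decision function. It is not the .NET regex engine and does not call it; `classify_escape` and its `group_count` parameter are hypothetical names made up for this illustration, and the handling of a leading 0 is an assumption, since the quoted rules do not cover it.

```python
def classify_escape(digits: str, group_count: int) -> str:
    """Classify the escape \\<digits> according to the four quoted rules.

    digits      -- the decimal digits that follow the backslash, e.g. "16"
    group_count -- number of capturing groups defined in the pattern
                   (hypothetical input; a real engine tracks this itself)
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected one or more decimal digits")

    if digits[0] == "0":
        # Assumption: not covered by the quoted rules; treated as octal here.
        return "octal"

    number = int(digits)

    if len(digits) == 1:
        # Rule 1: \1 through \9 are always backreferences.
        # Rule 4: a backreference to an undefined group is a parsing error.
        return "backreference" if number <= group_count else "error"

    if digits[0] in "89":
        # Rule 2: a multidigit expression starting with 8 or 9 is a literal.
        return "literal"

    # Rule 3: \10 and greater are backreferences only if that group exists;
    # otherwise they are read as octal codes.
    return "backreference" if number <= group_count else "octal"


if __name__ == "__main__":
    for digits, groups in [("1", 1), ("1", 0), ("16", 16), ("16", 2), ("80", 0)]:
        print(f"\\{digits} with {groups} group(s): {classify_escape(digits, groups)}")
```

The point of the sketch is that the same text, such as \16, can come out as either a backreference or an octal code depending only on how many groups the pattern defines.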


Yes, it's ambiguous, but the MSDN documentation explains how the ambiguity is resolved:

Note the ambiguity between octal escape codes (such as \16) and \number backreferences that use the same notation. This ambiguity is resolved as follows:

The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.

If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.

Expressions from \10 and greater are considered backreferences if there is a backreference corresponding to that number; otherwise, they are interpreted as octal codes.

If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.

It really is a silly choice to use the same syntax for both elements, since you can end up with weird bugs if you don't follow these exact rules for how the ambiguity is resolved.


Yes it is ambiguous. I will venture that the ambiguity is resolved in favour of interpreting it as backreferences. If an octal number is truly desired, it can always be prefixed with 0.


I think it tries to match a backreference and, if that fails, it attempts an octal number match.

http://msdn.microsoft.com/en-us/library/1400241x%28VS.85%29.aspx
