JavaScript RegEx non-capturing prefix

开发者 (Developer) https://www.devze.com 2023-03-03 11:13 Source: web
I am trying to do some string replacement with RegEx in JavaScript. The scenario is a single-line string containing a long comma-delimited list of numbers, in which duplicates are possible.

An example string is: 272,2725,2726,272,2727,297,272 (the string may or may not end with a comma)

In this example, I am trying to match each occurrence of the whole number 272. (3 matches expected) The example regex I'm trying to use is: (?:^|,)272(?=$|,)

The problem I am having is that the second and third matches are including the leading comma, which I do not want. I am confused because I thought (?:^|,) would match, but not capture. Can someone shed light on this for me? An interesting bit is that the trailing comma is excluded from the result, which is what I want.

For what it is worth, if I were using C#, there is syntax for prefix matching that does what I want: (?<=^|,). However, it appears to be unsupported in JavaScript.
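For reference, lookbehind assertions were added to the language in ECMAScript 2018, so an engine that implements that edition accepts the C#-style prefix directly. A minimal sketch, assuming such an engine (the variable names are illustrative, not from the question):

```javascript
// Assumes an engine with ECMAScript 2018 lookbehind support.
const input = '272,2725,2726,272,2727,297,272';

// (?<=^|,) asserts that the match is preceded by the start of the string
// or by a comma, without making that comma part of the match.
const lookbehindPattern = /(?<=^|,)272(?=$|,)/g;

const lookbehindMatches = input.match(lookbehindPattern);
console.log(lookbehindMatches); // [ '272', '272', '272' ]
```

On older engines the same pattern is a syntax error, so it is only an option when the target runtime is known.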

Lastly, I know I could work around it using string splitting, array manipulation and rejoining, but I want to learn.
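For comparison, the split-and-rejoin workaround mentioned here could look roughly like the sketch below. The replacement value '999' is an assumption, since the question does not say what 272 should be replaced with.

```javascript
const list = '272,2725,2726,272,2727,297,272';

// Split into fields, swap every field that is exactly '272' for a
// placeholder value (an assumption), then rejoin with commas.
const replacedList = list
  .split(',')
  .map((field) => (field === '272' ? '999' : field))
  .join(',');

console.log(replacedList); // 999,2725,2726,999,2727,297,999
```

A trailing comma survives this round trip, because split() then yields a final empty field that join() puts back.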


Use word boundaries instead:

\b272\b

ensures that only 272 matches, but not 2725.
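A quick check of the word-boundary pattern against the example string from the question (the replacement 'X' is just a placeholder):

```javascript
const numbers = '272,2725,2726,272,2727,297,272';

// \b matches between a digit and a comma (or the edge of the string),
// but never between two digits, so 2725 and 2727 are skipped.
const wholeNumber = /\b272\b/g;

const wholeMatches = numbers.match(wholeNumber);
const wholeReplaced = numbers.replace(wholeNumber, 'X');

console.log(wholeMatches);  // [ '272', '272', '272' ]
console.log(wholeReplaced); // X,2725,2726,X,2727,297,X
```

This relies on the list containing only digits and commas. A value such as 272.5 would also produce a match on 272, because '.' is a non-word character.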

(?:...) matches and doesn't capture - but whatever it matches will be part of the overall match.

A lookaround assertion like (?=...) is different: It only checks if it is possible (or impossible) to match the enclosed regex at the current point, but it doesn't add to the overall match.
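The difference can be seen by running the regex from the question. The second half of the sketch shows one possible workaround, which is to capture the prefix and put it back in the replacement; the value 'X' is only a placeholder.

```javascript
const csv = '272,2725,2726,272,2727,297,272';

// The non-capturing group consumes the comma, so the comma is part of the
// match text. The lookahead at the end consumes nothing.
const groupMatches = csv.match(/(?:^|,)272(?=$|,)/g);
console.log(groupMatches); // [ '272', ',272', ',272' ]

// Workaround: capture the prefix and re-insert it via a replacer function.
const prefixReplaced = csv.replace(
  /(^|,)272(?=$|,)/g,
  (match, prefix) => prefix + 'X'
);
console.log(prefixReplaced); // X,2725,2726,X,2727,297,X
```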


Here is a way to create a JavaScript look behind that has worked in all cases I needed.

This is an example. One can do many more complex and flexible things.

The main point here is that, in some cases, it is possible to create a RegExp non-capturing prefix (look behind) construct in JavaScript.

This example is designed to extract all fields that are surrounded by braces '{...}'. The braces are not returned with the field.

This is just an example to show the idea at work, not necessarily a prelude to an application.

    function testGetSingleRepeatedCharacterInBraces()
      {
        var leadingHtmlSpaces = '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;' ;
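        // The '(?:\b|\B(?={))' group is zero-width: \b, \B and (?={) consume no
        // characters, so nothing in front of the field becomes part of the match.
        // Caveat: \b also matches after a space or at the start of the string, so the
        // leading '{' is not actually required, and the \B(?={) alternative can never
        // succeed because the next token must be a word character, not '{'.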
        var regex  = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g ;
        var string = '' ;

        string = 'Message has no fields' ;
        document.write( 'String => "' + string 
                                      + '"<br>'  + leadingHtmlSpaces + 'fields => '
                                      + getMatchingFields( string, regex )
                                      + '<br>' ) ;

        string = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}' ;
        document.write( 'String => "' + string
                                      + '"<br>'  + leadingHtmlSpaces + 'fields => '
                                      + getMatchingFields( string, regex )
                                      + '<br>' ) ;
      } ;

    function getMatchingFields( stringToSearch, regex )
      {
         var matches = stringToSearch.match( regex ) ;
         return matches ? matches : [] ;
      } ;

    Output:
    String => "Message has no fields"
         fields =>
    String => "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}"
         fields => LLLLL,11111,22222,EEEEE,_____,55555
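A more conventional way to get the same fields is to match the braces and read a capture group. The \b alternative in the pattern above also matches where no '{' precedes the field, whereas this version requires both braces. It is a sketch only: getFieldsInBraces is a hypothetical helper, and String.prototype.matchAll assumes an ES2020-capable engine.

```javascript
// Hypothetical helper, not part of the answer above.
function getFieldsInBraces(text) {
  // The braces are matched (and consumed), but only group 1, the five
  // repeated characters, is returned for each match.
  const pattern = /\{(([0-9a-zA-Z_])\2{4})\}/g;
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const sample =
  '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}';

console.log(getFieldsInBraces(sample).join(','));
// LLLLL,11111,22222,EEEEE,_____,55555

console.log(getFieldsInBraces('LLLLL} no opening brace')); // []
```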