开发者

Powershell regex question involving balancing parenthesis

开发者 https://www.devze.com 2022-12-13 06:06 出处:网络
How can I convert a table column definition into an array of columns using regex\'s without taking formatting into account?

How can I convert a table column definition into an array of columns using regex's without taking formatting into account? I'm confused how to split on "," as they can also appear in the datatype definition part (between matching parenthesis).

Sample input:

CREATE table test ( DISTRICT VARCHAR(3开发者_StackOverflow) CHARACTER SET LATIN NOT CASESPECIFIC, CUSTOMER_ACCOUNT DECIMAL(8,0), CUSTOMER_SUB_ACCOUNT DECIMAL(3,0), SERVICE_SEQ_NUM DECIMAL(7,0), EFFECTIVE_DATE TIMESTAMP(0), SUBSCRIBER_SEQ_NUM DECIMAL(7,0) )

Thanks!

Frederik


Traditionally regular expressions are not capable of testing for/matching balanced parentheses. But several libraries have made extensions that allow some recursion. E.g. pcre with its ?R "quantifier". And Microsoft has added "balancing groups". You can find an example of that feature for matching (...) at http://oreilly.com/catalog/regex2/chapter/ch09.pdf


I think you could use negative look-behind for this scenario since the commas that are not of interest to you seem to be preceeded by a number:

$str = @'
CREATE table test ( DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
  CUSTOMER_ACCOUNT DECIMAL(8,0), 
  CUSTOMER_SUB_ACCOUNT DECIMAL(3,0), 
  SERVICE_SEQ_NUM DECIMAL(7,0), 
  EFFECTIVE_DATE TIMESTAMP(0), 
  SUBSCRIBER_SEQ_NUM DECIMAL(7,0) )
'@

$str -split '(?<!\d),'
CREATE table test ( DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC
 CUSTOMER_ACCOUNT DECIMAL(8,0)
 CUSTOMER_SUB_ACCOUNT DECIMAL(3,0)
 SERVICE_SEQ_NUM DECIMAL(7,0)
 EFFECTIVE_DATE TIMESTAMP(0)
 SUBSCRIBER_SEQ_NUM DECIMAL(7,0) )

Note this is using PowerShell 2.0's -split operator.

For paren matching, you might try something like this:

$re = [regex]@'
(?x)
\(
   (?>
       [^()]+
     |
       \( (?<DEPTH>)
     |
       \) (?<-DEPTH>)
   )*
   (?(DEPTH)(?!))
\)
'@

if ($str -match $re) {
    $matches[0]
}

Outputs:

( 
    DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,  
    CUSTOMER_ACCOUNT DECIMAL(8,0),   
    CUSTOMER_SUB_ACCOUNT DECIMAL(3,0),   
    SERVICE_SEQ_NUM DECIMAL(7,0),   
    EFFECTIVE_DATE TIMESTAMP(0),   
    SUBSCRIBER_SEQ_NUM DECIMAL(7,0) 
)

See this blog post for more help on paren matching.

0

精彩评论

暂无评论...
验证码 换一张
取 消