
Writing a regular expression to skip a line if a specific set of characters is present?

Developer https://www.devze.com 2023-04-11 14:16 Source: Internet

I am trying to write a regex in Python to parse a file with contents like this:

static const PropertyID PROPERTY_X = 10225;
//static const PropertyID PROPERTY_Y = 10226;
   //static const PropertyID PROPERTY_Z = 10227;

I want to extract the property name and number for only the non-commented properties. This is the expression I wrote:

tuples = re.findall(r"[^/]*static[ \t]*const[ \t]*PropertyID[ \t]*(\w+)[ \t]*=[ \t]*(\d+).*",fileContents)

where fileContents holds the contents of the file as a string.

But this regex also matches the commented-out lines (the ones starting with //). How can I make it skip them?


Try:

r"(?m)^(?!//)static\s+const\s+PropertyID\s+(\S+)\s+=\s+(\d+);"

A couple of notes:

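^ matches the beginning of a line (because (?m) turns on multiline mode; without it, ^ matches only at the start of the whole string)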

(?!//) is a negative lookahead: it asserts that the current position (here, the start of the line) is NOT followed by //

\s is any whitespace character

\S is any non-whitespace character
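A quick way to check this pattern against the sample lines from the question, assuming the file has already been read into a string (called file_contents here, an illustrative name):

```python
import re

# Sample data from the question.
file_contents = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
    "   //static const PropertyID PROPERTY_Z = 10227;\n"
)

# (?m) makes ^ match at the start of every line; the lookahead rejects "//".
pattern = r"(?m)^(?!//)static\s+const\s+PropertyID\s+(\S+)\s+=\s+(\d+);"
tuples = re.findall(pattern, file_contents)
print(tuples)  # [('PROPERTY_X', '10225')]
```

Note that this pattern requires static to appear right at the start of the line, so the indented commented line is rejected because of its leading spaces rather than by the lookahead, and an indented, uncommented declaration would not be matched either.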


You could specify that, after the start of the line, you only want spaces before the first static:

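tuples = re.findall(r"^\s*static[ \t]*const[ \t]*PropertyID[ \t]*(\w+)[ \t]*=[ \t]*(\d+).*", fileContents, re.MULTILINE)

(re.MULTILINE is needed so that ^ matches at the start of every line, not just at the start of the string.)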
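A sketch of this approach as a complete script. It passes re.MULTILINE, which is what lets ^ anchor at the start of every line in re.findall, and it uses [ \t]+ between the keywords instead of [ \t]*. The PROPERTY_W line is a made-up indented declaration added only to show that leading whitespace is accepted:

```python
import re

# Sample input from the question, plus one hypothetical indented,
# uncommented declaration (PROPERTY_W).
file_contents = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
    "   //static const PropertyID PROPERTY_Z = 10227;\n"
    "    static const PropertyID PROPERTY_W = 10228;\n"
)

# [ \t]* allows indentation but never crosses a line break, and since "/"
# is not whitespace, a line starting with // (indented or not) cannot match.
pattern = re.compile(
    r"^[ \t]*static[ \t]+const[ \t]+PropertyID[ \t]+(\w+)[ \t]*=[ \t]*(\d+)",
    re.MULTILINE,
)

tuples = pattern.findall(file_contents)
print(tuples)  # [('PROPERTY_X', '10225'), ('PROPERTY_W', '10228')]
```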


If you're parsing C code, you can use a proper parser such as pycparser. Regular expressions alone can't handle the full grammar of a programming language (multi-line comments, string literals, macros, and so on).
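As a middle ground short of a real parser, one option is to strip comments first and then match declarations. This is a swapped-in technique, not pycparser: it removes /* ... */ blocks and // comments with a regex, which also covers a declaration sitting inside a multi-line block comment (a case the line-anchored patterns above would still match). The PROPERTY_Q line is hypothetical sample input. The stripping pass does not understand string literals, so comment markers inside strings would be removed incorrectly.

```python
import re

# Hypothetical input: one declaration hidden inside a multi-line /* ... */ block.
source = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "/*\n"
    "static const PropertyID PROPERTY_Q = 10230;\n"
    "*/\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
)

# Non-greedy /* ... */ (DOTALL so it can span lines), or // up to end of line.
# Caveat: naive; "//" or "/*" inside a string literal would also be stripped.
comment_re = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
stripped = comment_re.sub("", source)

decl_re = re.compile(
    r"^[ \t]*static[ \t]+const[ \t]+PropertyID[ \t]+(\w+)[ \t]*=[ \t]*(\d+)",
    re.MULTILINE,
)
print(decl_re.findall(stripped))  # [('PROPERTY_X', '10225')]
```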

Alternatively, I think this code is simpler for what you're doing:

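import re
string = "   //static const PropertyID PROPERTY_Z = 10227;"
# Strip leading/trailing whitespace first, then split on runs of whitespace
results = re.split(r"\s+", string.strip())
#results = ['//static', 'const', 'PropertyID', 'PROPERTY_Z', '=', '10227;']

if results[0].startswith("//") or results[0].startswith("/*"):
    pass  # commented-out declaration: skip it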
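Extending the split-based idea to a whole file could look like the sketch below. It assumes every declaration is written with whitespace around = exactly as in the sample (static const PropertyID NAME = NUMBER;); the variable names are illustrative:

```python
import re

# Sample data from the question.
file_contents = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
    "   //static const PropertyID PROPERTY_Z = 10227;\n"
)

properties = []
for line in file_contents.splitlines():
    tokens = re.split(r"\s+", line.strip())
    # Skip blank lines and lines whose first token starts a comment.
    if not tokens[0] or tokens[0].startswith(("//", "/*")):
        continue
    # Expected shape: static const PropertyID NAME = NUMBER;
    if tokens[:3] == ["static", "const", "PropertyID"] and len(tokens) >= 6 and tokens[4] == "=":
        properties.append((tokens[3], int(tokens[5].rstrip(";"))))

print(properties)  # [('PROPERTY_X', 10225)]
```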
