
Comments in string and strings in comments

开发者 https://www.devze.com 2022-12-25 09:23 Source: web

I am trying to count the characters in comments in C code using Python and regex, so far without success. I can erase strings first to get rid of comment markers inside strings, but that also erases strings inside comments, so the count comes out wrong. Is there a way to tell a regex not to match strings inside comments, or vice versa?


No, not really.

Regex is not the correct tool to parse nested structures like you describe; instead you will need to parse the C syntax (or the "dumb subset" of it you're interested in, anyway), and you might find regex helpful in that. A relatively simple state machine with three states (CODE, STRING, COMMENT) would do it.
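A minimal sketch of such a state machine in Python, under a few assumptions of mine: the name `count_comment_chars` is hypothetical, comment delimiters are not counted, character literals are treated like strings, and backslash line continuations inside `//` comments are ignored.

```python
from enum import Enum, auto


class State(Enum):
    CODE = auto()
    STRING = auto()
    COMMENT = auto()


def count_comment_chars(source: str) -> int:
    """Count characters inside C comments, excluding the delimiters."""
    state = State.CODE
    quote = ""      # quote character that opened the current literal
    block = False   # True inside /* ... */, False inside // ...
    count = 0
    i = 0
    while i < len(source):
        ch = source[i]
        pair = source[i:i + 2]
        if state is State.CODE:
            if pair in ("/*", "//"):
                state, block = State.COMMENT, pair == "/*"
                i += 2
                continue
            if ch in "\"'":
                state, quote = State.STRING, ch
        elif state is State.STRING:
            if ch == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if ch == quote:
                state = State.CODE
        elif block and pair == "*/":
            state = State.CODE
            i += 2
            continue
        elif not block and ch == "\n":
            state = State.CODE
        else:
            count += 1  # an ordinary character inside a comment
        i += 1
    return count


# The "/* no */" inside the string literal is skipped; " yes " and " tail" are counted.
print(count_comment_chars('char *s = "/* no */"; /* yes */ // tail\n'))  # prints 10
```

Because the scanner only looks for comment openers while in CODE and only looks for quote characters while in CODE, a `/*` inside a string and a `"` inside a comment are both ignored.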


Regular expressions are not always a replacement for a real parser.


You can strip out all strings that aren't in comments by searching for the regular expression:

'[^'\r\n]+'|(//.*|/\*(?s:.*?)\*/)

and replacing with:

$1

Essentially, this searches for the regex string|(comment) which matches a string or a comment, capturing the comment. The replacement is either nothing if a string was matched or the comment if a comment was matched.
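In Python the same search-and-replace might look like the sketch below. The pattern is copied from the answer unchanged, so it targets single-quoted strings as written; C string literals are double-quoted, so the quote characters would need adapting for real C input. The function name is hypothetical, and `$1` becomes a callback because Python's `re` does not use the `$1` syntax.

```python
import re

# Pattern from the answer, verbatim: a single-quoted string, or a comment
# captured in group 1. (?s:...) lets "." cross newlines inside /* ... */.
TOKEN = re.compile(r"'[^'\r\n]+'|(//.*|/\*(?s:.*?)\*/)")


def strip_strings_outside_comments(source: str) -> str:
    # A string match leaves group 1 unset, so it is replaced by nothing;
    # a comment match is replaced by its own text.
    return TOKEN.sub(lambda m: m.group(1) or "", source)


print(strip_strings_outside_comments("s = 'text'; /* a 'quoted' word */ // end"))
```

The quoted word inside the comment survives because the regex engine consumes the whole comment as one match before it ever gets a chance to start a string match inside it.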

Though regular expressions are not a replacement for a real parser, you can quickly build a rudimentary one by writing a single large regex that alternates all of the tokens you're interested in (comments and strings in this case). If you're writing code that should handle comments, but not comment-like text inside strings, iterate over all the matches of the above regex and count the characters in the first capturing group whenever it participated in the match.
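The counting loop could be sketched in Python roughly as follows. This version adapts the string alternative to double-quoted C literals with backslash escapes, which is my assumption rather than part of the answer; the function name is hypothetical, and the count here includes the comment delimiters because group 1 captures them.

```python
import re

# Assumed adaptation for C: a double-quoted string with backslash escapes,
# or a comment captured in group 1.
TOKEN = re.compile(r'"(?:\\.|[^"\\\r\n])*"|(//.*|/\*(?s:.*?)\*/)')


def count_comment_chars_regex(source: str) -> int:
    total = 0
    for match in TOKEN.finditer(source):
        comment = match.group(1)
        if comment is not None:  # group 1 is None when a string matched
            total += len(comment)
    return total


print(count_comment_chars_regex('puts("// no"); /* yes */ // tail\n'))
```

Character literals such as `'"'` are not tokenized by this pattern and would confuse it; adding another alternative for them follows the same idea.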

