
Ruby regex for finding comments?

Developer (https://www.devze.com) 2023-03-02 22:55, source: web

I've been at this all day and I can't figure it out. I have some Ruby code in a string (below), and I would like to match only the lines that contain code, plus the first comment for that code if it exists.

# Some ignored comment.
1 + 1 # Simple math (this comment would be collected) # ignored 
# ignored

user = User.new
user.name = "Ryan" # Setting an attribute # Another ignored comment

For each line of code, the match should capture the code (1) and its first comment (2):

    1. "1 + 1"
    2. "Simple math"

    1. "user = User.new"
    2. nil

    1. "user.name = \"Ryan\""
    2. "Setting an attribute"

I'm using /^\x20*(.+)\x20*(#\x20*.+\x20*){1}$/ to match against each line, but it doesn't work for all code.
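To see where that pattern goes wrong, here is a small sketch that runs it against each line of the sample code. The constant QUESTION_RE and the variable names are made up for illustration; the regex itself is copied unchanged from the question.

```ruby
# The pattern from the question, unchanged.
QUESTION_RE = /^\x20*(.+)\x20*(#\x20*.+\x20*){1}$/

# Single-quoted heredoc, so Ruby does not interpolate anything in the sample.
sample = <<~'RUBY'
  # Some ignored comment.
  1 + 1 # Simple math (this comment would be collected) # ignored
  # ignored

  user = User.new
  user.name = "Ryan" # Setting an attribute # Another ignored comment
RUBY

# One entry per line: the two captures, or nil when the line does not match.
results = sample.each_line.map { |line| QUESTION_RE.match(line)&.captures }
results.each { |r| p r }
```

Two problems show up. The # group is mandatory, so a line with no comment (user = User.new) never matches. And the greedy (.+) runs up to the last # on the line, so the captured "comment" is the one that should be ignored.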


Kobi's answer partially works, but does not match lines of code that lack a comment at the end.

It will also fail when it encounters string interpolation, e.g.:

str = "My name is #{first_name} #{last_name}" # first comment

...will be erroneously matched as: str = "My name is #{first_name}
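Both failures can be reproduced in a few lines. This sketch uses the single-comment pattern from the other answer on this page; the constant name KOBI_RE is our own label for it.

```ruby
# The single-comment pattern from the other answer (constant name is ours).
KOBI_RE = /^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)/

# Single quotes keep Ruby from interpolating the sample line.
line = 'str = "My name is #{first_name} #{last_name}" # first comment'

m = KOBI_RE.match(line)
p m[0] # the whole match stops at the first hash, which is inside the string
p m[1] # so the captured "comment" is really part of the string literal

# A line of code with no comment does not match at all.
p KOBI_RE.match("user = User.new")
```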

You need a more comprehensive regex. Here's one idea:

/^[\t ]*([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*)(#([^#\r\n]*))?/
  • ^[\t ]* - Leading whitespace.
  • ([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*) - Matches a line of code.
    Breakdown:
    • [^#"'\r\n] - the first character in a line of code, and...
    • "(\\"|[^"])*" - a double-quoted string, or...
    • '(\\'|[^'])*' - a single-quoted string, or...
    • [^#\n\r] - any other character outside a quoted string that is not a # or line ending.
  • (#([^#\r\n]*))? - Matches first comment at the end of a line of code, if any.

Because of the nested groups, this regex has 6 capture groups (subpatterns) per match. Subpattern 1 is the code and subpattern 6 is the comment; you can ignore the others.
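If subpatterns 2 to 5 get in the way, one option (a variation, not part of the answer's regex) is to make them non-capturing. Below is a sketch of the same pattern in Ruby's free-spacing x mode, laid out like the breakdown above, with (?:...) groups and two named groups. The names code and comment and the constant LINE_RE are our own. In x mode unescaped spaces are ignored and # starts a comment, so literal spaces and hashes are written as \x20 and \#.

```ruby
# Same regex as above, written in free-spacing (x) mode with
# non-capturing groups, so only "code" and "comment" are captured.
LINE_RE = /
  ^[\t\x20]*                      # leading whitespace
  (?<code>
    [^\#"'\r\n]                   # first character of the code
    (?:
        "(?:\\"|[^"])*"           # a double-quoted string, or
      | '(?:\\'|[^'])*'           # a single-quoted string, or
      | [^\#\r\n]                 # any other character except a hash or line ending
    )*
  )
  (?:\#(?<comment>[^\#\r\n]*))?   # the first comment, if any
/x

m = LINE_RE.match('user.name = "Ryan" # Setting an attribute # Another ignored comment')
p m[:code]        # includes the space before the first hash
p m[:comment]     # text after the first hash, up to the next one
p m.captures.size # only the two named groups
```

Like the original, both groups keep surrounding spaces, so call strip on them if you need trimmed values.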

Given the following block of code:

# Some ignored comment.
1 + 1 # Simple math (this comment would be collected) # ignored 
# ignored

user = User.new
user.name = "Ryan #{last_name}" # Setting an attribute # Another ignored comment

The above regex would produce the following (I excluded subpatterns 2, 3, 4, 5 for brevity):


  Match 1:
    1. 1 + 1
    6. Simple math (this comment would be collected)

  Match 2:
    1. user = User.new
    6. nil (no comment)

  Match 3:
    1. user.name = "Ryan #{last_name}"
    6. Setting an attribute

Demo: http://rubular.com/r/yKxEazjNPC
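To get the same result in Ruby rather than on Rubular, a minimal sketch: apply the regex line by line with Regexp#match, read subpatterns 1 and 6 from the MatchData, and strip the spaces the groups include. The helper code_and_comments is hypothetical, and filter_map assumes Ruby 2.7 or later.

```ruby
# The regex from this answer, unchanged.
CODE_COMMENT_RE = /^[\t ]*([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*)(#([^#\r\n]*))?/

# Hypothetical helper: one [code, comment] pair per line of code.
# Subpattern 1 is the code, subpattern 6 the first comment (nil when absent).
def code_and_comments(source)
  source.each_line.filter_map do |line|
    m = CODE_COMMENT_RE.match(line)
    next unless m # comment-only and blank lines do not match
    [m[1].strip, m[6]&.strip]
  end
end

# Single-quoted heredoc, so #{last_name} stays literal text.
sample = <<~'RUBY'
  # Some ignored comment.
  1 + 1 # Simple math (this comment would be collected) # ignored
  # ignored

  user = User.new
  user.name = "Ryan #{last_name}" # Setting an attribute # Another ignored comment
RUBY

code_and_comments(sample).each { |pair| p pair }
```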


While the underlying problem is quite difficult, you can find what you need here using the pattern:

^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)

Which reads:

  • [\t ]* - leading spaces or tabs.
  • [^\s#] - one character that is neither whitespace nor #, i.e. the start of the code.
  • [^#\n\r]* - the rest of the code up to the # sign: anything except a hash or a line ending.
  • #([^#\n\r]*) - the "first" comment, captured in group 1.

Working example: http://rubular.com/r/wNJTMDV9Bw
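The same pattern can be run over the whole sample at once with String#scan, which returns one array of groups per match. A short sketch (the constant name FIRST_COMMENT_RE is our own); note that it yields only the comments, and lines without one are skipped entirely.

```ruby
# The pattern from this answer, unchanged; group 1 is the text after the first "#".
FIRST_COMMENT_RE = /^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)/

# Single-quoted heredoc, so Ruby does not interpolate anything in the sample.
sample = <<~'RUBY'
  # Some ignored comment.
  1 + 1 # Simple math (this comment would be collected) # ignored
  # ignored

  user = User.new
  user.name = "Ryan" # Setting an attribute # Another ignored comment
RUBY

# scan gives [[group1], [group1], ...]; flatten and trim the spaces.
comments = sample.scan(FIRST_COMMENT_RE).flatten.map(&:strip)
p comments
```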
