
Ruby RegEx issue

Developer https://www.devze.com 2023-03-20 11:03 Source: Internet

I'm having a problem getting my RegEx to work with my Ruby script.

Here is what I'm trying to match:

http://my.test.website.com/{GUID}/{GUID}/

Here is the RegEx that I've tested, which should match the string shown above:

/([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)/

3 capturing groups:

group 1: ([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)
group 2: (\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
group 3: ([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])

Ruby is giving me an error when trying to validate a match against this regex:

empty range in char class: (My RegEx goes here) (SyntaxError)

I appreciate any thoughts or suggestions on this.


You could simplify things a bit by using URI to parse the URL, \h (a hex digit) in the regex, and scan to pull out the GUIDs:

require 'uri'

uri   = URI.parse(your_url)
path  = uri.path
# \h matches one hex digit; scan returns every GUID-shaped substring of the path
guids = path.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)

If you need any of the non-path components of the URL, then you can easily pull them out of uri.

You might need to tighten things up a bit depending on your data or it might be sufficient to check that guids has two elements.
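To make the "two elements" check concrete, here is a minimal sketch. The helper name guids_in, the constant GUID_PATTERN, and the sample GUID values are made up for illustration; only URI.parse, path, and scan come from the snippet above.

```ruby
require 'uri'

# Hypothetical constant: one GUID written with \h (a single hex digit).
GUID_PATTERN = /\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/

# Hypothetical helper: returns every GUID-shaped substring in the URL's path.
def guids_in(url)
  URI.parse(url).path.scan(GUID_PATTERN)
end

# Sample URL with made-up GUIDs, shaped like the one in the question.
url = 'http://my.test.website.com/01234567-89ab-cdef-0123-456789abcdef/fedcba98-7654-3210-fedc-ba9876543210/'

guids = guids_in(url)
valid = guids.size == 2 # the simple check suggested above
```

Note that scan only finds GUID-shaped substrings; it does not verify that nothing else is in the path, which is where the "tighten things up" caveat comes in.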


You have several errors in your RegEx. I am very sleepy now, so I'll just give you a hint instead of a solution:

...[\/\/[0-9a-fA-F]....

The first [ does not belong there. Also, having \/\/ inside [] is unnecessary: you only need to list each character once inside []. Also,

...[-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}...

is greedy, and the class includes a period; indeed, as far as I can see, it includes every character that can come after it, so it effectively swallows the whole string (once you get rid of the other bugs). Consider the lazy {2,256}? instead.
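For what it's worth, here is one way the hints above could be applied. This is a sketch, not the original poster's final regex: it assumes the URL is exactly scheme, host, two GUIDs, and a trailing slash, and the names GUID_RE and URL_WITH_GUIDS and the sample GUIDs are made up. It drops the stray [, avoids repeating characters inside [], and keeps periods and slashes out of places where they could swallow the rest of the string.

```ruby
# Hypothetical constant: one GUID written with \h (a single hex digit).
GUID_RE = /\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/

# %r{...} avoids having to escape every slash. \A and \z anchor the whole string.
# The host class deliberately excludes "/" so it cannot run into the path.
URL_WITH_GUIDS = %r{\Ahttps?://[-a-zA-Z0-9.]+\.[a-z]{2,4}/(#{GUID_RE})/(#{GUID_RE})/\z}

# Sample URL with made-up GUIDs, shaped like the one in the question.
url = 'http://my.test.website.com/01234567-89ab-cdef-0123-456789abcdef/fedcba98-7654-3210-fedc-ba9876543210/'

m = URL_WITH_GUIDS.match(url)
first_guid, second_guid = m[1], m[2] if m
```

If the real URLs can carry extra path segments or query strings, the pattern would need loosening accordingly.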
