Making part of the regex optional

Developer https://www.devze.com 2023-02-19 23:14 Source: web

Here is my regex:

/On.* \d{1,2}\/\d{1,2}\/\d{1,4} \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

to match:

On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:

I need this Regex to also match:

On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:

So I tried this:

/On.* \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

But that breaks the other matches.

Am I writing the optional (, at)? group correctly?

Thanks


I changed your regex just slightly, and I am able to match both strings. The regex I have is:

/On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/ 

Comparing the results of the two:

irb(main):023:0> s1 = "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:"
=> "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:"
irb(main):024:0> s2 = "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:"
=> "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:"
#Regex with a mandatory (?:, at) group (no trailing ?)
irb(main):025:0> m = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at) \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
=> /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at) \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
irb(main):026:0> s1.match(m)
=> #<MatchData "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:">
irb(main):027:0> s2.match(m)
=> nil

#The updated Regex
irb(main):028:0> m = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote/
=> /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote/
irb(main):029:0> s1.match(m)
=> #<MatchData "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote">
irb(main):030:0> s2.match(m)
=> #<MatchData "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote">
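For what it's worth, the capturing (, at)? and the non-capturing (?:, at)? should accept exactly the same strings; the only difference is whether ", at" is stored as group 1. A small sketch to check that against the two sample lines (the constant names CAPTURING, NON_CAPTURING and SAMPLES are just names I made up for this example):

```ruby
# The two patterns differ only in the optional group:
# capturing "(, at)?" versus non-capturing "(?:, at)?".
CAPTURING     = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
NON_CAPTURING = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

# The two sample lines from the question.
SAMPLES = [
  "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:",
  "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:"
].freeze

SAMPLES.each do |line|
  cap = CAPTURING.match(line)
  non = NON_CAPTURING.match(line)
  # cap[1] is nil when the optional group did not take part in the match.
  puts "#{line.inspect}: capturing=#{!cap.nil?} non_capturing=#{!non.nil?} " \
       "group1=#{cap ? cap[1].inspect : 'n/a'}"
end
```

If both patterns match both lines here, the optional group itself is probably fine, and whatever breaks the other matches is more likely elsewhere in the pattern or in the input.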


The following regex works for both cases:

On\s*\d{1,2}\/\d{1,2}\/\d{1,4}[\s,]*(at)?\s*\d{1,2}:\d{1,2}\s*(?:AM|PM),\s*.*wrote:
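Note that this version trades one kind of flexibility for another: \s* and [\s,]* tolerate missing or extra whitespace, but there is no longer a .* between "On" and the date. A rough sketch of what that means in practice (the constants RELAXED and ORIGINAL and the last two test strings are invented for illustration, they are not from the question):

```ruby
# The whitespace-tolerant pattern from this answer.
RELAXED = /On\s*\d{1,2}\/\d{1,2}\/\d{1,4}[\s,]*(at)?\s*\d{1,2}:\d{1,2}\s*(?:AM|PM),\s*.*wrote:/
# The pattern from the question with the optional ", at" group, for comparison.
ORIGINAL = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

TESTS = [
  "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:",          # from the question
  "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:", # from the question
  "On 25/03/2011 at 2:19AM, X wrote:",                # hypothetical: no comma, no space before AM
  "On Monday 3/14/11 2:55 PM, X wrote:"               # hypothetical: extra text before the date
].freeze

TESTS.each do |line|
  puts "#{line.inspect}: relaxed=#{RELAXED.match?(line)} original=#{ORIGINAL.match?(line)}"
end
```

The third string should only match the relaxed pattern and the fourth only the original one, so which pattern is better depends on what the real headers look like.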


The problem with other input strings may be caused by the .* idiom. It is greedy and wants to consume as much of the input as it can.

If your input is, for example, a date followed by some random text and then another date, your regex will treat the two dates and the text between them as one single match. Most of it will be consumed by .*.

In most cases it's better to use a lazy quantifier: write .*? instead of .*. You have two .* in your regex. Try replacing both with .*?:

/On.*? \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*?wrote:/
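To see the difference, here is a minimal sketch using a made-up line that contains two reply headers. The text and the names Alice and Bob are invented for the example, and I used the non-capturing (?:, at)? so that String#scan returns whole matches rather than group contents:

```ruby
# Same pattern twice: once with greedy .* and once with lazy .*?
GREEDY = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
LAZY   = /On.*? \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*?wrote:/

# Hypothetical input: two quoted-reply headers on one line.
TEXT = "On 3/14/11 2:55 PM, Alice wrote: thanks! " \
       "On 25/03/2011, at 2:19 AM, Bob wrote: hi"

greedy_matches = TEXT.scan(GREEDY)
lazy_matches   = TEXT.scan(LAZY)

# Greedy: one match that runs from the first "On" to the last "wrote:".
puts "greedy (#{greedy_matches.length}): #{greedy_matches.inspect}"
# Lazy: each header is matched separately.
puts "lazy   (#{lazy_matches.length}): #{lazy_matches.inspect}"
```

Keep in mind that in Ruby . does not match a newline unless you add the /m flag, so the greedy version can only overreach within a single line.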

If that doesn't work, you'll have to post the failing dates here and you will most certainly get more feedback from this community.
