Regex to match a String with optional Conditions [duplicate]

开发者 https://www.devze.com 2023-02-13 21:31 Source: Internet
This question already has answers here: Closed 11 years ago.

Possible Duplicate:

How do I make part of a regular expression optional in Ruby?

I'm trying to build a regular expression with rubular to match:

On Feb 23, 2011, at 10:22 , James Bond wrote:

OR

On Feb 23, 2011, at 10:22 AM , James Bond wrote:

Here's what I have so far, but for some reason it's not matching? Ideas?

(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)

How can I make the AM/PM text optional? Either match AM/PM or neither?


This seems to catch the date info. I purposely captured in groups, making it easier to build a real date:

regex = /^On (\w+ \d+, \d+), \w+ (\S+) (\w*)\s*,/

[
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:'
].each do |line|
  line =~ regex
  # $1 = date, $2 = time, $3 = AM/PM (empty string when absent)
  puts "#{$1} #{$2} #{$3}"
end
# >> Feb 23, 2011 10:22 
# >> Feb 23, 2011 10:22 AM

I purposely didn't try to match on the months. Your sample strings look like quote headers from email messages. Those are very standard and generated by software, so you should see a lot of consistency in the format, which allows some simplification in the regex. If you can't trust that, then match on the month-name abbreviations to help rule out false positives. The same applies to the day, year, and time values.

The important thing in the regex is how to deal with the AM/PM when it's missing.
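As a rough sketch of "building a real date" from those groups, something like the following could work. The helper name parse_quote_header and the constant HEADER_REGEX are made up for illustration, and the code assumes a 24-hour clock when AM/PM is absent, which the sample strings don't confirm:

```ruby
require 'time'

# Same capture-group regex as above: date, time, and an optional AM/PM marker.
HEADER_REGEX = /^On (\w+ \d+, \d+), \w+ (\S+) (\w*)\s*,/

# Hypothetical helper: returns a Time (local zone), or nil if the line
# does not look like a quote header.
def parse_quote_header(line)
  match = HEADER_REGEX.match(line)
  return nil unless match

  date, clock, meridiem = match.captures
  if meridiem.empty?
    # Assumption: no AM/PM means the time is on a 24-hour clock.
    Time.strptime("#{date} #{clock}", '%b %d, %Y %H:%M')
  else
    Time.strptime("#{date} #{clock} #{meridiem}", '%b %d, %Y %I:%M %p')
  end
end

t = parse_quote_header('On Feb 23, 2011, at 10:22 PM , James Bond wrote:')
puts t.strftime('%Y-%m-%d %H:%M')
```

The third group uses \w*, so it is an empty string rather than nil when AM/PM is missing, which is why the check is meridiem.empty?.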


Maybe this:

(On\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+[12]\d{3},\s+at\s+\d{1,2}:\d{1,2}\s+(?:AM|PM)*,.*wrote:)

However, if you can verify that only these lines have this shape, you don't need such an elaborate regex. If the line always starts with "On" and ends with "wrote:", your regex might simply be /^On.*wrote:/
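A quick way to try the loose pattern is to filter a list of lines with it. This is only a sketch; the constant name LOOSE_HEADER and the third sample line are invented for the example:

```ruby
# Loose pattern: anything starting with "On" and containing "wrote:" later on.
LOOSE_HEADER = /^On.*wrote:/

lines = [
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:',
  'Thanks, see you on Monday.' # hypothetical message body line
]

# Array#grep keeps only the elements that match the pattern.
headers = lines.grep(LOOSE_HEADER)
puts headers.length
```

The trade-off is that an ordinary sentence such as "On Monday he wrote: hello" would match as well, so this only makes sense if the input is known to be clean.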


Just use the question mark operator after any group you want to be optional, so in this case:

(?:(?:AM|PM) )?

Be sure to match the space as well; otherwise, strings without AM/PM would need to contain two spaces at that point. The solution with (?:AM|PM)* would also match AMAMPM, so that's probably not what you want. But why do you match those groups without creating backreferences? Aren't you going to use the values?

For info on backreferences: http://www.regular-expressions.info/brackets.html
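Putting it together, here is a sketch of the question's pattern with the optional group dropped in. The constant name STRICT_HEADER is made up, the inner (AM|PM) is turned into a capturing group purely to show the backreference, and the space before the comma is assumed to always be present, as in both sample strings:

```ruby
# AM/PM plus its trailing space is wrapped in an optional non-capturing group;
# the inner (AM|PM) captures the value so it can be read back as group 1.
STRICT_HEADER = /On (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}, at \d{1,2}:\d{2} (?:(AM|PM) )?,.*wrote:/

[
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AMAMPM , James Bond wrote:'
].each do |line|
  match = STRICT_HEADER.match(line)
  # match is nil when the line does not match; match[1] is nil when AM/PM is absent.
  puts "#{match ? 'match' : 'no match'} #{match && match[1].inspect}"
end
```

The first two lines match (with group 1 being nil and "AM" respectively), while the AMAMPM line is rejected, which is the difference from the (?:AM|PM)* variant.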
