Possible Duplicate:
How do I make part of a regular expression optional in Ruby?
I'm trying to build a regular expression with Rubular to match:
On Feb 23, 2011, at 10:22 , James Bond wrote:
OR
On Feb 23, 2011, at 10:22 AM , James Bond wrote:
Here's what I have so far, but for some reason it's not matching. Any ideas?
(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)
How can I make the AM/PM text optional? Either match AM/PM or neither?
This seems to catch the date info. I purposely captured in groups, making it easier to build a real date:
# Captures: $1 = date, $2 = time, $3 = AM/PM (empty string when absent)
regex = /^On (\w+ \d+, \d+), \w+ (\S+) (\w*)\s*,/
[
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:'
].each do |line|
  line =~ regex
  puts "#{$1} #{$2} #{$3}"
end
# >> Feb 23, 2011 10:22
# >> Feb 23, 2011 10:22 AM
I purposely didn't try to match on the months. Your sample strings look like quote headers from email messages. Those are very standard and generated by software, so you should see a lot of consistency in the format, which allows some simplification in the regex. If you can't trust that, then match on the month name abbreviations to help avoid false-positive matches. The same applies to the day, year, and time values.
The important thing in the regex is how to deal with the AM/PM when it's missing.
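As a rough sketch of "building a real date" from those captures, something like the following could work. It assumes Ruby's standard time library; parse_quote_header and HEADER are hypothetical names that are not part of the answer above, and the format strings assume the header always looks like the two samples (24-hour time when AM/PM is missing).

```ruby
require 'time'

# Same pattern as in the answer: date, time, optional AM/PM.
HEADER = /^On (\w+ \d+, \d+), \w+ (\S+) (\w*)\s*,/

# Hypothetical helper: returns a Time, or nil if the line is not a quote header.
def parse_quote_header(line)
  match = HEADER.match(line)
  return nil unless match

  date, time, meridiem = match.captures
  if meridiem.empty?
    # No AM/PM present: assume a 24-hour clock.
    Time.strptime("#{date} #{time}", '%b %d, %Y %H:%M')
  else
    # AM/PM present: parse as a 12-hour clock.
    Time.strptime("#{date} #{time} #{meridiem}", '%b %d, %Y %I:%M %p')
  end
end

p parse_quote_header('On Feb 23, 2011, at 10:22 , James Bond wrote:')
p parse_quote_header('On Feb 23, 2011, at 10:22 PM , James Bond wrote:')
```

The resulting Time is in the local time zone, because the header itself carries no zone information.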
Maybe this:
(On\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+[12]\d{3},\s+at\s+\d{1,2}:\d{1,2}\s+(?:AM|PM)*,.*wrote:)
However, if you can verify that only these lines have this shape, you don't need such an elaborate regex. If each one starts with "On" and ends with "wrote:", your regex might then simply be /^On.*wrote:/
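A minimal sketch of that simpler approach, assuming the input is already split into lines; QUOTE_HEADER and the third sample line are made up for illustration:

```ruby
# Loose pattern: anything that starts with "On" and contains "wrote:".
QUOTE_HEADER = /^On.*wrote:/

lines = [
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:',
  'Thanks, see you on Monday.'
]

# Keep only the lines that look like quote headers.
headers = lines.select { |line| line.match?(QUOTE_HEADER) }
puts headers
```

This trades precision for simplicity: any other line that begins with "On" and contains "wrote:" would match too.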
Just use the question mark operator after any group you want to be optional, so in this case:
(?:(?:AM|PM) )?
Be sure to match the space as well, otherwise the strings without AM/PM would need to include two spaces. The solution with (?:AM|PM)* would also match AMAMPM, so that's probably not what you want. But why do you match those groups without creating backreferences? Aren't you going to use the values?
For info on backreferences: http://www.regular-expressions.info/brackets.html
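To illustrate, here is a sketch that uses the optional (?:(AM|PM) )? group with capturing groups for the values. HEADER and the choice of captures are assumptions for this example, and the pattern assumes exactly the spacing shown in the two sample strings:

```ruby
# Captures: 1 = date, 2 = time, 3 = AM/PM (nil when absent), 4 = author.
HEADER = /\AOn ((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}), at (\d{1,2}:\d{2}) (?:(AM|PM) )?, (.*) wrote:\z/

[
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:'
].each do |line|
  m = HEADER.match(line)
  p m && m.captures
end

# Why ? rather than *: the star allows any number of repetitions.
p 'AMAMPM'.match?(/\A(?:AM|PM)*\z/) # true
p 'AMAMPM'.match?(/\A(?:AM|PM)?\z/) # false
```

Because the optional group includes its trailing space, the header without AM/PM still matches with a single space before the comma.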