Getting to grips with regex in Perl

Posted on https://www.devze.com, 2023-02-06 23:18. Source: the web

I'm trying to write a regular expression to match this line:

DD MONTH YEAR at HH:MM

as an example:

21 May 2009 at 19:09

So I have:

[0-30-9] for the day

[0-20-90-90-9] for the year

[0-90-9:0-90-9] for the time

I don't understand how to put these all together to form one single regex. I want to do

if($string =~ /myregex/) { }

But can't form the entire thing. Also I don't know how to write a regex for the month, it has to match one of the 12 months of the year.

I am a Perl noob (this is my first day) and a regex noob, so help appreciated!


[0-30-9] doesn't do what you think it does. :)

[0-3][0-9] is what you're after. Similar steps for each of the other inputs...
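
To see why: a bracketed class always matches exactly one character, so [0-30-9] is just the ranges 0-3 and 0-9 merged together, i.e. any single digit. A quick sketch to check this yourself (the helper name full_match is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $text matches $pattern from start to end, else 0.
sub full_match {
    my ($pattern, $text) = @_;
    return $text =~ /\A$pattern\z/ ? 1 : 0;
}

# [0-30-9] is ONE character class: ranges 0-3 and 0-9, so any single digit.
print "one class,   '7':  ", full_match('[0-30-9]', '7'),  "\n";
print "one class,   '21': ", full_match('[0-30-9]', '21'), "\n";

# [0-3][0-9] is TWO classes in a row: a digit 0-3, then any digit.
print "two classes, '21': ", full_match('[0-3][0-9]', '21'), "\n";
print "two classes, '41': ", full_match('[0-3][0-9]', '41'), "\n";
```

The first and third lines should report a match; the other two should not.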

[0-3]?\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d\d\d at [012]\d:[0-5]\d

The ? is to say the leading digit might be there.

The \d means 'digit', sometimes more legible.

(foo|bar|baz) is called 'alternation'.

The time is a problem :) [012]\d is good and simple, but would match a time like 29:59. Hehe. You could do this better with alternation: ([01]\d|2[0-3]), which is less legible but more correct.
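
A small sketch of a stricter time check, assuming a 24-hour clock with zero-padded two-digit hours as in 19:09. The hour is written here as ([01]\d|2[0-3]); the sub name valid_time and the anchors \A and \z are additions for illustration, so that the whole string has to be a time:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $t looks like HH:MM on a 24-hour clock, else 0.
sub valid_time {
    my ($t) = @_;
    # Hour: 00-19 or 20-23. Minute: 00-59.
    return $t =~ /\A([01]\d|2[0-3]):[0-5]\d\z/ ? 1 : 0;
}

for my $t ('19:09', '09:05', '23:59', '29:59', '24:00', '12:60') {
    printf "%-5s %s\n", $t, valid_time($t) ? 'ok' : 'rejected';
}
```

The first three samples should come out ok and the last three rejected. If the hour can be a single digit, [01]?\d in place of [01]\d would allow that.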

And my advice for a Perl neophyte working with regexps is to start small and build them iteratively. It takes work. :)


Well, the parts you have aren't quite correct. Instead of [0-30-9] I think you mean [0-3][0-9], and similarly for the other numbers.

However, usually it suffices to be a little looser and just use \d which is equivalent to [0-9].

You string the parts together one after the other:

/\d\d (MONTH) \d\d\d\d at \d\d:\d\d/

Which can be written more succinctly as:

/\d\d (MONTH) \d{4} at \d\d:\d\d/

Or if you really need it to be more strict as in your formulation:

/[0-3]\d (MONTH) [0-2]\d{3} at \d\d:\d\d/

I've left the month bit for last, since it is the more complicated bit. Again you can be loose or strict.

Loosely:

/[0-3]\d [A-Za-z]+ [0-2]\d{3} at \d\d:\d\d/

For a strict match we can use an alternation: each alternative is separated by a '|' and the list of choices is enclosed in parentheses (beware that parentheses have another meaning too, they also capture the text they match; don't worry, it won't interfere in this case):

/[0-3]\d (January|February|March|April|May|June|July|August|September|October|November|December) [0-2]\d{3} at \d\d:\d\d/

Finally, if the day is not 0-padded (meaning the 1st is just '1' rather than '01') then you need to make that optional:

/[0-3]?\d (January|February|March|April|May|June|July|August|September|October|November|December) [0-2]\d{3} at \d\d:\d\d/
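
Dropped into the if ($string =~ /myregex/) { } from the question, the pieces might look like the sketch below. It is one way to write it, not the only one, and several things in it are my additions rather than part of the patterns above: the /x flag (which lets a pattern be spread over several lines with comments, and is why literal spaces are written as [ ]), the \A and \z anchors, the capture groups, the name parse_stamp, a looser \d{4} year, and a stricter zero-padded hour.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (day, month, year, hour, minute),
# or an empty list if the string does not have the expected shape.
sub parse_stamp {
    my ($string) = @_;
    my $months = 'January|February|March|April|May|June|July|'
               . 'August|September|October|November|December';

    if ($string =~ /\A
            ([0-3]?\d)      [ ]       # day, leading zero optional
            ($months)       [ ]       # full month name
            (\d{4})         [ ]at[ ]  # four-digit year
            ([01]\d|2[0-3]) :         # hour 00-23
            ([0-5]\d)                 # minute 00-59
        \z/x) {
        return ($1, $2, $3, $4, $5);
    }
    return;
}

my @parts = parse_stamp('21 May 2009 at 19:09');
print join('|', @parts), "\n";   # prints 21|May|2009|19|09
```

Like the patterns above, this still accepts impossible dates such as day 39 or 31 February. A regex checks the shape of the text, not whether the date exists on a calendar.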

Crib sheet

  • [] are used to create a character class, a set of matching characters
  • \d is a built-in character class equivalent to [0-9]
  • () are used to create a group, useful for delimiting an alternation (amongst other things)
  • | is used to create alternation, a list of alternative character sequences that should be matched
  • {n} is a quantifier, saying exactly 'n' of the preceding character, character class or group should be matched
  • + is a quantifier, saying 1 or more of the preceding character, character class or group should be matched
  • ? is a quantifier, saying 0 or 1 of the preceding character, character class or group should be matched
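
Each entry on the crib sheet can be tried in isolation. A few one-line checks, as a sketch: the sample strings and the name count_matches are invented for illustration, and every pattern is anchored with \A and \z so the whole sample has to match.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how many [pattern, text] pairs match.
sub count_matches {
    return scalar grep { $_->[1] =~ $_->[0] } @_;
}

my @checks = (
    [ qr/\A[aeiou]\z/,    'e'    ],  # []  character class
    [ qr/\A\d\z/,         '7'    ],  # \d  a digit
    [ qr/\A(Jan|Feb)\z/,  'Feb'  ],  # ()  group around a | alternation
    [ qr/\A\d{4}\z/,      '2009' ],  # {n} exactly n
    [ qr/\A[A-Za-z]+\z/,  'May'  ],  # +   one or more
    [ qr/\A[0-3]?\d\z/,   '1'    ],  # ?   zero or one
);

print count_matches(@checks), " of ", scalar(@checks), " samples match\n";
```

All six samples should match. Changing a sample (say '2009' to '209') is a quick way to see which construct stops matching.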


CPAN has some common Regexes in the Regexp::Common::* branch. For your case check out http://search.cpan.org/perldoc?Regexp::Common::time .

Perhaps I should add, since you are so new to Perl, that CPAN is Perl's collection of user-contributed modules. Many things that people may want to do have already been done and collected for you. To install a module you can run sudo cpan modulename (assuming you are on Linux; I'm sure you can find instructions for CPAN on Mac and Windows, but I don't know them).
