In Perl, how would one efficiently parse the output of Unix's date command, taking into account time zone, and also convert to UTC?
I've read many similar questions on Stack Overflow, but few seem to handle input that mixes several time zones. Instead, they set the time zone manually and assume it stays fixed.
# Example Input Strings:
my @inputs = (
'Tue Oct 12 06:31:48 EDT 2010',
'Tue Oct 12 07:49:54 BST 2010',
);
I tried the following to no avail:
use Time::Piece;
foreach my $input ( @inputs ) {
my $t = Time::Piece->strptime( $input,
'%a %b %d %T %Z %Y' );
print $t->cdate, "\n";
}
It seems the problem is the time zone (%Z). Additionally, a time zone field does not seem to exist in Time::Piece, which would require me to write custom code to convert to UTC, which just seems... wrong.
Context: I'm attempting to parse legacy logs from a variety of sources that use the unix date command for timestamps. Ideally, I'd like to convert all timestamps to UTC.
Any help would be greatly appreciated.
If you know how to disambiguate the TZs, just pop them into a dispatch table:
use strict; use warnings;
use DateTime::Format::Strptime ();
my @inputs = (
'Tue Oct 12 06:31:48 EDT 2010',
'Tue Oct 12 07:49:54 BST 2010',
);
my %tz_dispatch = (
EDT => build_parser( 'EST5EDT' ),
BST => build_parser( '+0100' ),
# ... etc
default => build_parser( ),
);
for my $input (@inputs) {
my ($parser, $date) = parse_tz( $input, %tz_dispatch );
print $parser->parse_datetime( $date ), "\n";
}
sub build_parser {
my ($tz) = @_;
my %conf = (
pattern => '%a %b %d %T %Z %Y',
on_error => 'croak',
);
@conf{qw/time_zone pattern/} = ($tz, '%a %b %d %T %Y')
if $tz;
return DateTime::Format::Strptime->new( %conf );
}
sub parse_tz {
my ($date, %tz_dispatch) = @_;
# split ' ' collapses runs of whitespace; date pads single-digit days with an extra space
my (@date) = split ' ', $date;
my $parser = $tz_dispatch{splice @date, 4, 1};
return $parser
? ($parser, join ' ', @date)
: ($tz_dispatch{default}, $date);
}
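Note that the loop above prints each timestamp in the zone it was parsed in, while the question asks for UTC. A minimal sketch of the extra step, assuming DateTime::Format::Strptime is installed and that EDT in your logs really does mean US Eastern time (EST5EDT): call set_time_zone('UTC') on the parsed DateTime object.

```perl
use strict;
use warnings;
use DateTime::Format::Strptime ();

# Assumption: EDT in these logs means US Eastern time.
my $parser = DateTime::Format::Strptime->new(
    pattern   => '%a %b %d %T %Y',   # zone abbreviation already stripped
    time_zone => 'EST5EDT',
    on_error  => 'croak',
);

my $dt = $parser->parse_datetime('Tue Oct 12 06:31:48 2010');
$dt->set_time_zone('UTC');           # same instant, expressed in UTC
print $dt->iso8601, "Z\n";
```

For the first sample input this should print 2010-10-12T10:31:48Z. In the dispatch-table version, the same call would go on the object returned by $parser->parse_datetime( $date ) inside the loop.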
The Perl DateTime FAQ on time zones gives good background on why abbreviations such as EDT and EST cannot be used in most conversions. The issue is that other countries also have an Eastern time zone with the same three-letter abbreviation, so EST or EDT alone is ambiguous without other clues.
You might look at other modules, or just assume that "EDT" is the same as "EST5EDT" if that is true.
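If you would rather stay with core modules, the same "decide what each abbreviation means" idea can be sketched with Time::Piece and a hand-written offset table. This is only a sketch under stated assumptions: the %utc_offset_seconds table and the date_to_utc helper are made up for illustration, and the offsets are my guess at what EDT and BST mean in your logs.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumed meanings; adjust for your own log sources.
my %utc_offset_seconds = (
    EDT => -4 * 3600,   # US Eastern Daylight Time
    BST =>  1 * 3600,   # British Summer Time
);

sub date_to_utc {
    my ($input) = @_;
    my @fields = split ' ', $input;
    my $abbr   = splice( @fields, 4, 1 ) // '';   # pull out the zone field
    die "No offset known for zone '$abbr'\n"
        unless exists $utc_offset_seconds{$abbr};
    # strptime treats the zone-less string as if it were UTC ...
    my $t = Time::Piece->strptime( join( ' ', @fields ), '%a %b %d %T %Y' );
    # ... so subtracting the zone's offset gives the real UTC instant.
    return $t - $utc_offset_seconds{$abbr};
}

print date_to_utc($_)->datetime, "Z\n" for (
    'Tue Oct 12 06:31:48 EDT 2010',
    'Tue Oct 12 07:49:54 BST 2010',
);
```

A fixed offset per abbreviation is enough here because an abbreviation such as EDT already says whether daylight saving time was in effect. Unknown abbreviations die loudly rather than being silently treated as UTC.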
If you are using DateTime::Format::Strptime, you can use %O for the Olson time zone name and do a manual fixup before parsing. For example, if you know that EDT in your input means America/New_York, do this:
$time_in =~ s{EDT}{America/New_York};
and instead of
%a %b %d %T %Z %Y
for your time zone spec use
%a %b %d %T %O %Y
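Put together, the fixup and the %O pattern might look like the following. It is a sketch, not a definitive implementation: the %olson_for mapping and the parse_as_utc helper are hypothetical names, and the mapping encodes my assumption about what EDT and BST mean in your input.

```perl
use strict;
use warnings;
use DateTime::Format::Strptime ();

# Assumed mapping from abbreviation to Olson name; only you know
# what the abbreviations in your logs really mean.
my %olson_for = (
    EDT => 'America/New_York',
    BST => 'Europe/London',
);

my $parser = DateTime::Format::Strptime->new(
    pattern  => '%a %b %d %T %O %Y',
    on_error => 'croak',
);

sub parse_as_utc {
    my ($time_in) = @_;
    # Swap the first all-caps abbreviation for its Olson name, if known.
    $time_in =~ s{\b([A-Z]{3,5})\b}
                 { exists $olson_for{$1} ? $olson_for{$1} : $1 }e;
    my $dt = $parser->parse_datetime($time_in);
    $dt->set_time_zone('UTC');   # same instant, expressed in UTC
    return $dt;
}

print parse_as_utc($_)->iso8601, "Z\n" for (
    'Tue Oct 12 06:31:48 EDT 2010',
    'Tue Oct 12 07:49:54 BST 2010',
);
```

Because the Olson name carries the daylight saving rules, the same mapping entry keeps working even if a source mislabels winter timestamps, which a fixed offset would not.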
I've always found Date::Manip::ParseDate to be good for these sorts of situations.
use strict;
use warnings qw<FATAL all>;
use Date::Manip qw<ParseDate UnixDate>;
my @inputs = (
q<Tue Oct 12 06:31:48 EDT 2010>,
q<Tue Oct 12 07:49:54 BST 2010>,
);
sub date2epoch($) {
my $user_string = shift();
my $timestamp = ParseDate($user_string);
my $seconds = UnixDate($timestamp, "%s");
return $seconds;
}
sub epoch2utc($) {
my $seconds = shift();
return gmtime($seconds) . q< UTC>;
}
for my $random_date (@inputs) {
my $epoch_seconds = date2epoch($random_date);
my $normal_date = epoch2utc($epoch_seconds);
print "$random_date == $normal_date\n";
}
When run, that produces this:
Tue Oct 12 06:31:48 EDT 2010 == Tue Oct 12 10:31:48 2010 UTC
Tue Oct 12 07:49:54 BST 2010 == Tue Oct 12 06:49:54 2010 UTC
which seems to be what you're looking for.
I'm a little late on this, but GNU date itself is good at parsing dates:
$ date -u -d 'Thu Oct 14 01:17:00 EDT 2010'
Thu Oct 14 05:17:00 UTC 2010
I don't know how it resolves the EDT ambiguity though.
I agree with Jander about the date command. The -d and -u options are great and save a lot of lines of code.