I want to parse dates in the format below, but ignore part of the string: 'Wed, 27 Oct 1770 22:17:00 GMT'. From what I have gathered, datetime does not support time zones very well. That is fine; I really just want to ignore the timezone part of the string without having to do string manipulation on it. Is there something I can replace %Z with below that means "any string here", so the date still parses? Also, I don't understand why it parses timezones like PST and GMT but not EST. It doesn't seem to attach tzinfo in any case, so I'm not sure what kinds of strings it is really looking for in the %Z portion.
>>> import datetime
>>> y = datetime.datetime.strptime('Wed, 27 Oct 1770 22:17:00 GMT', '%a, %d %b %Y %H:%M:%S %Z')
>>> y = datetime.datetime.strptime('Wed, 27 Oct 1770 22:17:00 PST', '%a, %d %b %Y %H:%M:%S %Z')
>>> y = datetime.datetime.strptime('Wed, 27 Oct 1770 22:17:00 EST', '%a, %d %b %Y %H:%M:%S %Z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/brazil-pkg-cache/packages/Python/Python-2.5.1.17.1/RHEL5_64/DEV.STD.PTHREAD/build/lib/python2.5/_strptime.py", line 331, in strptime
    (data_string, format))
ValueError: time data did not match format: data=Wed, 27 Oct 1770 22:17:00 EST fmt=%a, %d %b %Y %H:%M:%S %Z
Note: dateutil is not an option for me. I want to support numerous formats and can't allow dateutil to accidentally misinterpret dates (for example, dateutil seems to guess when it sees a date like 01/02/2010: Feb 1 or Jan 2?). I basically want to try the formats I specify, in order, until I get a match.
Have you actually looked at the docs for dateutil? dateutil.parser.parse() does have arguments (dayfirst and yearfirst) that let you control the precedence in its format guesser, and it also accepts an ignoretz=True argument.
If that's not enough, there's probably some class you can override to implement your own precedence rules.
Of course, if not, you will probably have to resort to string parsing. Python's strptime() is implemented in pure Python (the _strptime.py in your traceback), and its %Z only matches UTC, GMT, and the names in time.tzname for the machine's local time zone. That is most likely why PST parses for you and EST does not: your system is presumably set to Pacific time, and on a system set to Eastern time it would be the other way around.
import datetime
# Keep only day, month, year and time; drop the weekday and the trailing zone.
val = ' '.join('Wed, 17 Oct 2011 22:22:22 +0300'.split()[1:5])
val = datetime.datetime.strptime(val, '%d %b %Y %H:%M:%S')
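To check which zone names %Z will accept on a particular machine, something like the following may help. It is only a sketch: the helper name accepts_zone is made up for illustration, and it relies on CPython's strptime matching %Z against UTC, GMT, and the local names in time.tzname, so the printed values will differ between systems.

```python
import time
from datetime import datetime

FMT = '%a, %d %b %Y %H:%M:%S %Z'

def accepts_zone(zone_name):
    """Return True if strptime's %Z accepts zone_name on this machine.

    Hypothetical helper, not part of the standard library.
    """
    try:
        datetime.strptime('Wed, 27 Oct 1770 22:17:00 ' + zone_name, FMT)
        return True
    except ValueError:
        return False

# The local zone names; these vary from machine to machine.
print(time.tzname)

# GMT and UTC are accepted regardless of the local zone.
print(accepts_zone('GMT'))

# Whether EST or PST is accepted depends on the local zone.
print(accepts_zone('EST'), accepts_zone('PST'))

# Even when %Z matches, the resulting datetime is naive (no tzinfo attached).
print(datetime.strptime('Wed, 27 Oct 1770 22:17:00 GMT', FMT).tzinfo)
```

On a host set to Pacific time, this would be expected to accept PST and reject EST, which matches the behaviour in the question.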
I don't think it is possible to do that completely without string manipulation, but maybe this is an option. Take a look at the time module and try something like this (assuming import time and from datetime import datetime; the slice must be [0:6] to keep the seconds):
datetime(*time.strptime('Wed, 27 Oct 1770 22:17:00 GMT', '%a, %d %b %Y %H:%M:%S %Z')[0:6])
There doesn't appear to be a way to do that in strptime(). I know you said you didn't want to do string manipulation, but you may not have a choice. You can either clean the data first, extracting just the date/time portion from the input, or you can create a mystrptime() wrapper and only do the manipulation when an exception is raised. The following code is incomplete in that it only handles %Z at the end of the format, not the general case of %Z occurring anywhere in the string, but you get the idea.
import re, datetime

def mystrptime(time_str, format):
    try:
        return datetime.datetime.strptime(time_str, format)
    except ValueError:
        if '%Z' not in format:
            raise  # it must have been something else
        # Strip the trailing zone token from the input and the trailing %Z
        # from the format, then retry without the zone.
        new_time_str = re.sub(r'\s*\w+\s*$', '', time_str)
        new_format = re.sub(r'\s*%Z\s*$', '', format)
        return datetime.datetime.strptime(new_time_str, new_format)
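Since the question asks for trying a fixed list of formats in order, the same idea can be wrapped in a loop. This is a rough sketch rather than a definitive implementation: parse_ignoring_zone, parse_first_match and FORMATS are hypothetical names, and unlike mystrptime() it always drops the trailing zone token when the format ends in %Z instead of waiting for an exception. It still assumes %Z appears only at the end of the format.

```python
import re
from datetime import datetime

def parse_ignoring_zone(time_str, fmt):
    """Parse time_str with fmt, ignoring a trailing zone token if fmt ends in %Z."""
    if fmt.rstrip().endswith('%Z'):
        # Drop the trailing %Z from the format and the last
        # whitespace-separated token from the input, so any zone
        # name (EST, PST, +0300, ...) is accepted and discarded.
        fmt = re.sub(r'\s*%Z\s*$', '', fmt)
        time_str = re.sub(r'\s+\S+\s*$', '', time_str)
    return datetime.strptime(time_str, fmt)

def parse_first_match(time_str, formats):
    """Try each format in the given order; return the first successful parse."""
    for fmt in formats:
        try:
            return parse_ignoring_zone(time_str, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError('no format matched: %r' % (time_str,))

# Example format list; the order decides how ambiguous dates are read.
FORMATS = [
    '%a, %d %b %Y %H:%M:%S %Z',
    '%d/%m/%Y',  # day first, so 01/02/2010 is read as 1 Feb 2010
]

print(parse_first_match('Wed, 27 Oct 1770 22:17:00 EST', FORMATS))
print(parse_first_match('01/02/2010', FORMATS))
```

Because the formats are explicit, an ambiguous date such as 01/02/2010 is interpreted only the way the list says, which is the guarantee the question wants and dateutil's guesser does not give by default.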