开发者

Sort a CSV by date in Python

开发者 https://www.devze.com 2022-12-11 21:52 出处:网络
I\'m trying to sort a CSV file and I want the items sorted by date in reverse order: newest first. def SortCsvByField( f开发者_StackOverflow中文版ilename, fieldNo, sep = \',\' ):

I'm trying to sort a CSV file and I want the items sorted by date in reverse order: newest first.

def SortCsvByField( f开发者_StackOverflow中文版ilename, fieldNo, sep = ',' ):
   records = [line.split(sep) for line in file(filename)]

As far as that it's pretty easy, but how do I compare dates?


I'd recommend installing the excellent dateutil module. (In Ubuntu/Debian, it is provided by the python-dateutil package).

dateutil can parse date strings into datetime objects: It can handle many different date formats without you having to lift a finger(*):

import dateutil.parser as dparser
date=dparser.parse("Mon May 7 1883 10:36:28")
print(date)
# 1883-05-07 10:36:28

date=dparser.parse("1685-3-21")
print(date)
# 1685-03-21 00:00:00

date=dparser.parse("12/17/1770")
print(date)
# 1770-12-17 00:00:00

Note that parse is interpretting "12/17/1770" as being of the form "MM/DD/YYYY". You can change this behavior using parse's dayfirst and yearfirst options. (See http://labix.org/python-dateutil)

print(type(date))
# <type 'datetime.datetime'>

datetime objects can be sorted easily:

dates=[dparser.parse("Mon May 7 1883 10:36:28"),dparser.parse("1685-3-21"),dparser.parse("12/17/1770"),]
dates.sort()
print(dates)
# [datetime.date(1685, 3, 21), datetime.date(1770, 12, 17), datetime.date(1833, 5, 7)]

If you prefer to not install the dateutil package, then you'll have to roll your own method of converting date strings into datetime objects. This requires more work since you'll have to define the format. Below, '%Y-%m-%d' defines the YYYY-MM-DD format. See http://au2.php.net/strftime (or the man page of strftime) for more information on available format codes.

For example,

dates=[datetime.datetime.strptime(date_str,'%Y-%m-%d') for date_str in
       ('1883-5-7','1685-3-21','1770-12-17',)]
print([str(date) for date in dates])
# ['1883-05-07 00:00:00', '1685-03-21 00:00:00', '1770-12-17 00:00:00']
dates.sort()
print([str(date) for date in dates])
# ['1685-03-21 00:00:00', '1770-12-17 00:00:00', '1883-05-07 00:00:00']

To control the format when converting datetime objects back into printable strings, you can use the datetime.datetime.strftime() method.


If your dates are in ISO-8601 format (YYYY-MM-DD) then you can sort them as strings, otherwise you will have to parse them first (datetime.strptime).

Then you can sort using for example sorted(records, key=lambda a:a[1]), if the date is the second field.


Assuming that you know the format of the dates, and that they're in column 1 of your CSV file:

>>> import csv
>>> from datetime import datetime
>>> def date_key(row):
        return datetime.strptime(row[1].strip(), "%m/%d/%Y")

>>> with open('c:\\temp\\test\\date_test.csv', 'rb') as f:
        data = list(csv.reader(f))

>>> data
[['foo', ' 3/11/2004'], ['bar', ' 2/15/2001'], ['baz', '11/15/2007'], ['bat', '10/13/2002']]
>>> data.sort(key=date_key)
>>> data
[['bar', ' 2/15/2001'], ['bat', '10/13/2002'], ['foo', ' 3/11/2004'], ['baz', '11/15/2007']]
0

精彩评论

暂无评论...
验证码 换一张
取 消