Python DictReader - Skipping rows with missing columns?

Developer https://www.devze.com 2022-12-31 06:38 Source: web
I have an Excel .CSV file I'm attempting to read in with DictReader.

All seems to be well, except it seems to omit rows, specifically those with missing columns.

Our input looks like:

mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545

So some of the rows have missing lorem/ipsum/dolor columns, and it's just a string of commas for those.

We're reading it in with:

import csv

def read_gd_dump(input_file="blah 20100423.csv"):
    # Use the input_file argument rather than a hard-coded file name.
    gd_extract = csv.DictReader(open(input_file), restval='missing', dialect='excel')
    return dict([(row['something'], row) for row in gd_extract])

I checked that "something" (the key for our dict) isn't one of the missing columns, which is what I had originally suspected. It's one of the columns after those.

However, DictReader seems to completely skip over the rows. I tried setting restval to something, didn't seem to make any difference. I can't seem to find anything in Python's CSV docs (http://docs.python.org/library/csv.html) that would explain this behaviour, but I may have misread something.


Can't reproduce your problem -- when I save that data and then assign list(gd_extract), I see:

[{'telephoneNumber': '+65(2)34523534545', 'ipsum': '8403', 'sn': 'bay', 'dolor': '2535', 'mail': 'ian.bay@blah.com', 'givenName': 'ian', 'lorem': '3424'},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '8403', 'sn': 'gibson', 'dolor': '2535', 'mail': 'mike.gibson@blah.com', 'givenName': 'mike', 'lorem': '3424'},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '', 'sn': 'martin', 'dolor': '', 'mail': 'ross.martin@blah.com', 'givenName': 'ross', 'lorem': ''},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '', 'sn': 'connor', 'dolor': '', 'mail': 'david.connor@blah.com', 'givenName': 'david', 'lorem': ''},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '8403', 'sn': 'call', 'dolor': '2535', 'mail': 'chris.call@blah.com', 'givenName': 'chris', 'lorem': '3424'}]

five dicts, including those with missing ipsum etc. I fear that in your laudable attempt at simplifying the problem you've simplified it excessively, so that your bug has gone away.
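
For anyone who wants to repeat that check, here is a minimal sketch in Python 3 syntax. It assumes the sample data is pasted into a string and read through io.StringIO rather than from the real file. It also suggests why restval appeared to make no difference: a field that is present but empty (two adjacent commas) comes back as an empty string, and restval is only used when a row has fewer fields than the header. The three-column "short" data at the end is made up purely to show that case.

```python
import csv
import io

# Sample data from the question; io.StringIO stands in for the real file.
SAMPLE = """\
mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE), restval='missing', dialect='excel'))
print(len(rows))               # all five data rows are returned
print(repr(rows[2]['lorem']))  # empty field is '', not restval

# Hypothetical data: the row has two fields but the header has three,
# so the absent third field is filled in with restval.
SHORT = "a,b,c\n1,2\n"
short_rows = list(csv.DictReader(io.StringIO(SHORT), restval='missing'))
print(short_rows[0]['c'])      # filled from restval
```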

If you have duplicates in column something (can't check, since you don't have that column in your sample data) that would of course explain the "apparently missing" rows -- they're not missing from the csv reader's returned stream, they get "overwritten" in the dict you're returning. Could that be the issue?
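
A rough way to test the duplicate-key theory, sketched in Python 3 syntax. Since the real "something" column isn't in the sample, this assumes a cut-down version of the data and keys on lorem just for illustration. It compares the number of rows the reader yields with the number of entries left in the dict, then lists the key values that occur more than once.

```python
import csv
import io
from collections import defaultdict

# Cut-down sample; 'lorem' plays the role of the unknown 'something' column.
DATA = """\
mail,sn,lorem
ian.bay@blah.com,bay,3424
mike.gibson@blah.com,gibson,3424
ross.martin@blah.com,martin,
david.connor@blah.com,connor,
"""

rows = list(csv.DictReader(io.StringIO(DATA)))

# Same pattern as in the question: later rows replace earlier ones
# that share the same key.
by_key = dict((row['lorem'], row) for row in rows)
print(len(rows), len(by_key))  # fewer dict entries than rows means collisions

# Group rows by key to see which key values collide.
groups = defaultdict(list)
for row in rows:
    groups[row['lorem']].append(row['mail'])
duplicates = {key: mails for key, mails in groups.items() if len(mails) > 1}
for key, mails in duplicates.items():
    print(repr(key), mails)
```

If the two counts differ, the rows are being read correctly and are lost only when the dict is built. Keying on a column that is unique per row (mail, in the sample) or storing a list of rows per key would avoid the loss.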


This may be nothing to do with your problem, and Alex's analysis is quite plausible given the lack of information, but you should ALWAYS open a csv file with "rb" or "wb" mode (assuming Python 2.X). If you don't, you run the risk of various mysterious happenings. A csv file is not a text file, it's a BINARY file.

In any case, please edit your question to show:
(1) (a) a sample file (b) a script (c) output -- which together demonstrate the alleged problem
(2) what version of Python you are running
(3) what OS

Update: For Python 3.X, do as the blessed manual says: """If csvfile is a file object, it should be opened with newline=''.""" Although this advice is included only with csv.reader, it applies equally to csv.writer, csv.DictReader, and csv.DictWriter.
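
A small sketch of the Python 3 form, assuming a throwaway temporary file and a made-up "note" column. The helper name read_rows is just for illustration. Opening with newline='' lets the csv module handle line endings itself, so a quoted field containing an embedded line break comes back unchanged.

```python
import csv
import os
import tempfile

def read_rows(path):
    # newline='' leaves line-ending handling to the csv module (Python 3).
    with open(path, newline='') as f:
        return list(csv.DictReader(f))

# Write a small temporary file; the 'note' column is a made-up example
# of a quoted field with an embedded line break.
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
try:
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['mail', 'note'])
        writer.writerow(['ian.bay@blah.com', 'line one\r\nline two'])
    rows = read_rows(path)
finally:
    os.remove(path)

print(len(rows))
print(repr(rows[0]['note']))
```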
