I have a huge csv file, separated by comma's and I want to do a analysis with glm in R. In one column there exists data with a comma implied, something like: bla,blabla When reading the file in R with read.csv.sql there comes a error-message: RS-DBI driver: (RS_sqlite_import: ./agp.csv line 47612 expected 37 columns of data but found 38) This is due to the 'extra' comma in some of the data, not the whole column has an extra column. How can I fix this? I want to remove this extra 开发者_运维百科superfluous comma. Thanks for the reaction, André
The CSV format is very simple and can easily be hand edited. In order to include a comma in a value, you must surround the value with quotes quotes. Try this: "bla,blabla"
. If that data happens to contain any quotes, eg. blah,"thequotedblah",blah
, those quotes need to be escaped with another quote, like this: "blah,""thequotedblah"",blah"
.
Although there is no official standard around it, there isn't much to the CSV format. Wikipedia has a great CSV reference that I have personally used to implement CSV support in applications. Spend 5-10 minutes reading it and you'll know everything you ever need to know to manually create/read/repair CSV data.
Is it just this one line that contains a non-quoted comma - or are there several such lines? Editing the .csv with an editor that can handle large files (e.g. Ultraedit) to sanitize that one record would certainly help. Asaph's suggestion of quoting is also a good 'un.
精彩评论