The dataset I want to read in contains numbers both with and without a comma as the thousands separator:
"Sudan", "15,276,000", "14,098,000", "13,509,000"
"Chad", 209000, 196000, 190000
and I am looking for a way to read this data in.
Any hint appreciated!
Since there is an "r" tag on the question, I assume this is an R question. In R, you do not need to do anything special to handle the quoted commas:
> read.csv('t.csv', header=F)
V1 V2 V3 V4
1 Sudan 15,276,000 14,098,000 13,509,000
2 Chad 209000 196000 190000
# if you want to convert them to numbers:
> df <- read.csv('t.csv', header=F, stringsAsFactors=F)
> df$V2 <- as.numeric(gsub(',', '', df$V2))
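The line above converts only V2. To convert every numeric column in one go, something like the following sketch might work. The data frame is built inline here as a stand-in for what read.csv returns with stringsAsFactors=F, and the name num_cols is just an illustrative choice.

```r
# Stand-in for the data frame read from 't.csv' (values taken from the question).
df <- data.frame(
  V1 = c("Sudan", "Chad"),
  V2 = c("15,276,000", "209000"),
  V3 = c("14,098,000", "196000"),
  V4 = c("13,509,000", "190000"),
  stringsAsFactors = FALSE
)

# Columns that hold numbers stored as text (assumed: everything except V1).
num_cols <- c("V2", "V3", "V4")

# Strip the thousands separators, then convert each column to numeric.
df[num_cols] <- lapply(df[num_cols], function(x) {
  as.numeric(gsub(",", "", x, fixed = TRUE))
})

str(df)
```

Using fixed = TRUE treats the comma as a literal character rather than a regular expression, which is all that is needed here.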
Looking at that set of data, you could parse it using ", " (note the extra space) as the separator instead of ",".
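A rough sketch of that idea in R, assuming every field separator really is a comma followed by one space and that no thousands separator is ever followed by a space. The variable names (rows, fields, country, values) are made up for illustration.

```r
# The two sample rows from the question, as raw text lines.
rows <- c(
  '"Sudan", "15,276,000", "14,098,000", "13,509,000"',
  '"Chad", 209000, 196000, 190000'
)

# Split on the literal two-character separator ", ".
fields <- strsplit(rows, ", ", fixed = TRUE)

# Drop the surrounding quote marks from every field.
fields <- lapply(fields, function(f) gsub('"', "", f, fixed = TRUE))

# First field is the name; the rest are numbers once the commas are removed.
country <- vapply(fields, function(f) f[1], character(1))
values <- do.call(rbind, lapply(fields, function(f) {
  as.numeric(gsub(",", "", f[-1], fixed = TRUE))
}))

parsed <- data.frame(country, values, stringsAsFactors = FALSE)
print(parsed)
```

This breaks as soon as a row uses a bare "," between fields, so it is only suitable when the file layout is known to be consistent.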
You could use the following regular expression to remove the thousands-separator commas and any quote marks, leaving plain CSV content:
,(?=[0-9])|"
Then process the result as normal.
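For what it's worth, here is how that expression might be applied in R. This is a sketch only: the lookahead needs perl = TRUE, and the pattern relies on the field separators being followed by a space, as in the sample data. If a separator comma were directly followed by a digit (for example "Chad",209000), it would be removed too. The names rows, cleaned and df are illustrative.

```r
# The two sample rows from the question, as raw text lines.
rows <- c(
  '"Sudan", "15,276,000", "14,098,000", "13,509,000"',
  '"Chad", 209000, 196000, 190000'
)

# Remove every quote mark and every comma that is immediately followed by a digit.
# perl = TRUE is required for the (?=...) lookahead.
cleaned <- gsub(',(?=[0-9])|"', "", rows, perl = TRUE)
print(cleaned)

# The cleaned lines are ordinary CSV; strip.white drops the space after each comma.
df <- read.csv(text = cleaned, header = FALSE,
               stringsAsFactors = FALSE, strip.white = TRUE)
str(df)
```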
How about doing it as a two-step process? 1. Replace the "," (quote, comma, quote) with a TAB character. 2. Split on tab.
I'm assuming .NET here, but the same principle would apply in any language.
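The same two steps could look roughly like this in R. Note the assumptions: the sketch only handles rows where every field is quoted (the "Sudan" row, not the unquoted "Chad" row), it allows optional whitespace after the comma to match the sample data, and the names line, tabbed and fields are hypothetical.

```r
# A fully quoted row from the question.
line <- '"Sudan", "15,276,000", "14,098,000", "13,509,000"'

# Step 1: replace each quote-comma-quote delimiter with a TAB,
# then drop the leftover quote at the start and end of the line.
tabbed <- gsub('",\\s*"', "\t", line)
tabbed <- gsub('^"|"$', "", tabbed)

# Step 2: split on the TAB character.
fields <- strsplit(tabbed, "\t", fixed = TRUE)[[1]]

# The commas left inside each field are thousands separators and can be removed.
numbers <- as.numeric(gsub(",", "", fields[-1], fixed = TRUE))
print(fields[1])
print(numbers)
```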