Can scan (or any import function) return partial results after it bumps into errors?

Developer https://www.devze.com 2023-01-17 06:20 Source: Internet
Is there anything I can do to get partial results after bumping into errors in a big file? I am using the following command to import data from files. This is the fastest way I know, but it's not robust: a single small error can ruin the whole import. I hope there is at least a way for scan (or any reader) to quickly report which row/line has the error, or to return the partial results it has read so far (then I will have an idea of where the error is). Then I can skip enough lines to recover over 99% of the good data.

rawData = scan(file = "rawData.csv", what = scanformat, sep = ",", skip = 1, quiet = TRUE, fill = TRUE, na.strings = c("-", "NA", "Na","N"))

All the data-import tutorials I found seem to assume the files are in good shape. I didn't find a useful hint for dealing with dirty files.

I would sincerely appreciate any hint or suggestion! This has been really frustrating.


Idea 1: Open a file connection (with the file function) and then scan it line by line (with nlines = 1). Wrap each scan call in try so that you can recover after reading a bad line.
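A minimal sketch of Idea 1, under some assumptions: the sample file, the two-column scanformat, and the variable names are all made up for illustration. It is also a slight variant of the idea as stated: each line is pulled off the connection with readLines(n = 1) and then parsed with scan(text = ...), so a failed parse cannot leave the connection positioned in the middle of a line.

```r
# Hypothetical sample file standing in for rawData.csv; line 3 is bad.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,2.5", "2,oops", "3,4.5"), path)

# Assumed column layout; substitute the real scanformat here.
scanformat <- list(id = integer(), value = numeric())

con <- file(path, open = "r")
invisible(readLines(con, n = 1))   # skip the header line
good <- list()                     # successfully parsed records
bad <- integer()                   # file line numbers that failed to parse
line_no <- 1L

repeat {
  line <- readLines(con, n = 1)
  if (length(line) == 0) break     # end of file
  line_no <- line_no + 1L
  rec <- tryCatch(
    scan(text = line, what = scanformat, sep = ",", quiet = TRUE,
         na.strings = c("-", "NA", "Na", "N")),
    error = function(e) NULL       # a bad line yields NULL instead of aborting
  )
  if (is.null(rec)) {
    bad <- c(bad, line_no)
  } else {
    good[[length(good) + 1L]] <- as.data.frame(rec)
  }
}
close(con)

partial <- do.call(rbind, good)    # everything that parsed cleanly
print(bad)
print(partial)
```

Parsing one line at a time is much slower than a single scan call, so on a large file it may be more practical to use a loop like this only to locate the bad lines and then re-read the file in bulk without them.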

Idea 2: Use readLines to read the file as raw text lines, then use strsplit to parse them. You can analyse this output to find the bad lines and remove them.
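Idea 2 might look roughly like the following. The sample file, the expected field count, and the rule that every field must be numeric or one of the NA strings are assumptions made for the example, not something taken from the question's data.

```r
# Hypothetical sample file: line 3 has a non-numeric value, line 4 an extra field.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,2.5", "2,oops", "3,4.5,extra", "4,-"), path)

na_strings <- c("-", "NA", "Na", "N")
n_expected <- 2L                           # assumed number of columns

lines <- readLines(path)[-1]               # raw text, header dropped
fields <- strsplit(lines, ",", fixed = TRUE)

# A field is acceptable if it is an NA marker or converts to a number.
is_num <- function(x) {
  x %in% na_strings | !is.na(suppressWarnings(as.numeric(x)))
}

# A line is acceptable if it has the right field count and all fields pass.
ok <- vapply(
  fields,
  function(f) length(f) == n_expected && all(is_num(f)),
  logical(1)
)

bad_lines <- which(!ok) + 1L               # +1 to account for the header
clean <- read.csv(text = c("id,value", lines[ok]), na.strings = na_strings)

print(bad_lines)
print(clean)
```

Two caveats about this sketch: strsplit does not understand quoted fields that contain commas, and it drops a trailing empty field (so "1," splits into one field, not two), so the validity rule would need adjusting for files that use either.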


The count.fields function will preprocess a table-like file and tell you how many fields it found on each line (in the sense that read.table will look for fields). This is often a quick way to identify problem lines, because they will show a different number of fields from what is expected (or simply a different number from the majority of the other lines).
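As a rough illustration of the count.fields approach, the sketch below flags every line whose field count differs from the most common one and then reads only the remaining lines. The sample file and variable names are hypothetical, and the quote and comment.char settings are chosen to mirror read.csv's defaults, which is an assumption about how the real file should be parsed.

```r
# Hypothetical sample file: line 3 has an extra field, line 4 is short.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,2.5", "2,3.5,extra", "3", "4,4.5"), path)

# Count fields on every line, including the header. blank.lines.skip = FALSE
# keeps the result aligned with the line numbers that readLines sees.
n_fields <- count.fields(path, sep = ",", quote = "\"",
                         comment.char = "", blank.lines.skip = FALSE)

# Treat the most common field count as the expected one.
expected <- as.integer(names(which.max(table(n_fields))))
suspect <- which(n_fields != expected)     # file line numbers to inspect

# Re-read the file without the suspect lines.
lines <- readLines(path)
keep <- !(seq_along(lines) %in% suspect)
clean <- read.csv(text = lines[keep])

print(suspect)
print(clean)
```

This only catches lines with the wrong number of fields; a line with the right count but a malformed value (text in a numeric column, say) will pass through and has to be caught by a separate check.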
