Treatment of 'empty' values

Developer https://www.devze.com 2023-01-14 16:59 Source: web
I am importing a csv file into R using the sqldf package. Several values are missing, in both numeric and string variables. I notice that the missing values are left empty in the data frame (as opposed to being filled with NA or something else). I want to replace the missing values with a user-defined value. Obviously, a function like is.na() will not work in this case, because an empty value is not NA.

Toy dataframe with three columns:

A  B  C  
3  4  
2  4  6   
34 23 43   
2  5   

I want:

A  B  C  
3  4  NA  
2  4  6   
34 23 43   
2  5  NA

Thank you in advance.


Assuming you are using read.csv.sql from sqldf with the default SQLite database, column C comes back as a factor, so:

(1) just convert the values to numeric using as.numeric(as.character(...)) like this:

> Lines <- "A,B,C
+ 3,4,
+ 2,4,6
+ 34,23,43
+ 2,5,
+ "
> cat(Lines, file = "stest.csv")
> library(sqldf)
> DF <- read.csv.sql("stest.csv")
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: Factor w/ 3 levels "","43","6": 1 3 2 1
> DF$C <- as.numeric(as.character(DF$C))
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: num  NA 6 43 NA
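
The as.character() step in (1) matters because as.numeric() applied directly to a factor returns the internal level codes rather than the labels. A minimal base R sketch, using a hand-built factor that mirrors column C in the transcript above (the variable names f, codes and values are only for illustration):

```r
# A factor with the same values as column C in the transcript above
f <- factor(c("", "6", "43", ""))

# as.numeric() on a factor gives the integer level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the labels; "" then becomes NA
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here codes holds the positions of each value in the sorted level set, while values holds the numbers that were actually in the file, with NA where the field was empty.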

(2) or, if we use read.csv.sql(..., method = "raw"), C comes back as a character column and we can just use as.numeric:

> DF <- read.csv.sql("stest.csv", method = "raw")
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: chr  "" "6" "43" ""
> DF$C <- as.numeric(DF$C)
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: num  NA 6 43 NA
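
For genuine string variables as.numeric is not an option, so the empty strings have to be turned into NA explicitly. The following is a rough sketch in base R only; blank_to_na is a hypothetical helper, not part of sqldf, and the data frame (including the extra string column D) is made up to imitate what method = "raw" returns:

```r
# Hypothetical helper: replace "" with NA in every character column
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[!is.na(x) & x == ""] <- NA   # leave existing NAs alone
    x
  })
  df
}

# Made-up data imitating character columns with empty fields
DF <- data.frame(A = c(3L, 2L, 34L, 2L),
                 C = c("", "6", "43", ""),
                 D = c("x", "", "y", "z"),
                 stringsAsFactors = FALSE)

DF <- blank_to_na(DF)
DF$C <- as.numeric(DF$C)   # numeric column: NA values carry over
str(DF)
```

After this step is.na() works on both the numeric and the string columns.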

(3) If it's feasible for you to use read.csv, then we get the NA filling right away:

> str(read.csv("stest.csv"))
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: int  NA 6 43 NA
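
Note that read.csv only fills NA automatically for numeric columns; for string columns an empty field stays "" unless na.strings includes it. Once everything is NA, substituting a user-defined value is a plain is.na() assignment. A small sketch, assuming a made-up extra string column D and the arbitrary replacement values 0L and "missing":

```r
# Same toy data as above plus an invented string column D
Lines <- "A,B,C,D
3,4,,x
2,4,6,
34,23,43,y
2,5,,z
"

# Treat empty fields as NA in string columns too
DF <- read.csv(text = Lines, na.strings = c("", "NA"),
               stringsAsFactors = FALSE)

# Replace the NAs with user-defined values (chosen arbitrarily here)
DF$C[is.na(DF$C)] <- 0L
DF$D[is.na(DF$D)] <- "missing"
str(DF)
```

The same two assignment lines apply to a data frame produced by approach (1) or (2) once its empty values have been converted to NA.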