I am importing a CSV file into R using the sqldf package. I have several missing values in both numeric and string variables. I notice that missing values are left empty in the data frame (as opposed to being filled with NA or something else). I want to replace the missing values with a user-defined value. A function like is.na() will not work in this case, because the empty cells are not stored as NA.
Toy dataframe with three columns:
A B C
3 4
2 4 6
34 23 43
2 5
I want:
A B C
3 4 NA
2 4 6
34 23 43
2 5 NA
Thank you in advance.
Assuming you are using read.csv.sql in sqldf with the default SQLite database, it is producing a factor column for C, so:
(1) Just convert the values to numeric using as.numeric(as.character(...)), like this:
> Lines <- "A,B,C
+ 3,4,
+ 2,4,6
+ 34,23,43
+ 2,5,
+ "
> cat(Lines, file = "stest.csv")
> library(sqldf)
> DF <- read.csv.sql("stest.csv")
> str(DF)
'data.frame': 4 obs. of 3 variables:
$ A: int 3 2 34 2
$ B: int 4 4 23 5
$ C: Factor w/ 3 levels "","43","6": 1 3 2 1
> DF$C <- as.numeric(as.character(DF$C))
> str(DF)
'data.frame': 4 obs. of 3 variables:
$ A: int 3 2 34 2
$ B: int 4 4 23 5
$ C: num NA 6 43 NA
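The as.character() step matters because calling as.numeric() directly on a factor returns its internal integer codes rather than the values shown in the labels. A minimal sketch, using a standalone factor `f` (a hypothetical stand-in for DF$C, not read from the file):

```r
# Hypothetical factor with the same values as column C above.
f <- factor(c("", "6", "43", ""))

# Direct conversion yields the level codes, not the numbers in the labels.
codes <- as.numeric(f)

# Going through character first converts the labels; "" becomes NA.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

The codes correspond to the `1 3 2 1` shown in the str() output above, which is why the intermediate conversion is needed.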
(2) Or, if we pass method = "raw" (as in sqldf(..., method = "raw")) to read.csv.sql, then C comes back as character and we can just use as.numeric:
> DF <- read.csv.sql("stest.csv", method = "raw")
> str(DF)
'data.frame': 4 obs. of 3 variables:
$ A: int 3 2 34 2
$ B: int 4 4 23 5
$ C: chr "" "6" "43" ""
> DF$C <- as.numeric(DF$C)
> str(DF)
'data.frame': 4 obs. of 3 variables:
$ A: int 3 2 34 2
$ B: int 4 4 23 5
$ C: num NA 6 43 NA
(3) If it's feasible for you to use read.csv, then we get the NA filling right away:
> str(read.csv("stest.csv"))
'data.frame': 4 obs. of 3 variables:
$ A: int 3 2 34 2
$ B: int 4 4 23 5
$ C: int NA 6 43 NA
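Since the question also asks about string variables and about substituting a user-defined value, here is a rough sketch of one way to do that once the data is in a data frame. It assumes blanks arrived as empty strings in character columns (as with method = "raw"); the data frame, the string column B, and the fill values "missing" and -1 are all made up for illustration.

```r
# Hypothetical data frame standing in for an import where blanks arrive as "".
DF <- data.frame(A = c(3L, 2L, 34L, 2L),
                 B = c("x", "", "y", "z"),
                 C = c("", "6", "43", ""),
                 stringsAsFactors = FALSE)

# Turn empty strings into NA in every character column.
is_chr <- vapply(DF, is.character, logical(1))
DF[is_chr] <- lapply(DF[is_chr], function(x) {
  x[x %in% ""] <- NA
  x
})

# C holds numbers stored as text, so convert it; NA entries stay NA.
DF$C <- as.numeric(DF$C)

# Now is.na() works, and NA can be replaced with user-defined values
# (the fill values here are arbitrary examples).
DF$B[is.na(DF$B)] <- "missing"
DF$C[is.na(DF$C)] <- -1

print(DF)
```

If read.csv is an option, its na.strings argument (for example na.strings = "") can mark empty string fields as NA at read time, which would make the first step unnecessary.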