Numeric Column in data.frame returning "num" with str() but not is.numeric()

Developer https://www.devze.com 2022-12-17 15:50 Source: web
I have a data.frame, d1, that has 7 columns, the 5th through 7th column are supposed to be numeric:

str(d1[5])
'data.frame':   871 obs. of  1 variable:
 $ Latest.Assets..Mns.: num  14008 1483 11524 1081 2742 ... 

is.numeric(d1[5])
[1] FALSE

as.numeric(d1[5])
Error: (list) object cannot be coerced to type 'double'

How can this be? If str identifies it as numeric, how can it not be numeric? I'm importing from CSV.


Why

d1 is a data frame, and a data frame is a list of columns, so d1[5] (single brackets) is a list of length 1; more precisely, it is a one-column data.frame. To get the column's contents as a vector, use d1[[5]].
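A minimal sketch of the difference, using a small hypothetical stand-in for d1. The asker's real data isn't shown, so the values and the names of the other six columns here are made up; only the fifth column's name comes from the question.

```r
# Hypothetical stand-in for d1: seven columns, the fifth one numeric
d1 <- data.frame(a = c("x", "y", "z"), b = 1:3, c = c("p", "q", "r"),
                 d = c(TRUE, FALSE, TRUE),
                 Latest.Assets..Mns. = c(14008, 1483, 11524),
                 f = c(0.5, 1.5, 2.5), g = c(7, 8, 9))

col_df  <- d1[5]    # single brackets: a one-column data.frame (still a list)
col_vec <- d1[[5]]  # double brackets: the column's contents, a numeric vector

class(col_df)        # "data.frame"
is.list(col_df)      # TRUE
length(col_df)       # 1, i.e. one column
is.numeric(col_df)   # FALSE, as in the question

class(col_vec)       # "numeric"
is.numeric(col_vec)  # TRUE
length(col_vec)      # 3, i.e. one entry per row
```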

Even if every column of a data frame is numeric, the data frame itself isn't numeric:

> x = data.frame(1:5,6:10)
> is.numeric(x)
[1] FALSE

Individual columns in a data frame are either numeric or not numeric. For instance:

> z <- data.frame(1:5,letters[1:5])

> is.numeric(z[[1]])
[1] TRUE
> is.numeric(z[[2]])
[1] FALSE

If you want to know if ALL columns in a data frame are numeric, you can use all and sapply:

> sapply(z,is.numeric)
    X1.5 letters.1.5. 
    TRUE        FALSE 

> all(sapply(z,is.numeric))
[1] FALSE

> all(sapply(x,is.numeric))
[1] TRUE

You can wrap this all up in a convenient function:

> is.numeric_data.frame=function(x)all(sapply(x,is.numeric))

> is.numeric_data.frame(d1[5])
[1] TRUE 
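One caveat with the function above: sapply() also iterates over the elements of a plain vector, so is.numeric_data.frame(d1[[5]]) returns TRUE even though d1[[5]] is not a data frame. A stricter variant might check the class first. The name all_numeric_cols below is hypothetical, and vapply() is used instead of sapply() so the result is always a logical vector, even for a data frame with no columns (where sapply() returns an empty list).

```r
# Hypothetical helper (not part of base R): TRUE only when x is a data frame
# whose columns are all numeric.
all_numeric_cols <- function(x) {
  if (!is.data.frame(x)) return(FALSE)    # reject bare vectors such as d1[[5]]
  # vapply() always yields a logical vector, one entry per column
  all(vapply(x, is.numeric, logical(1)))
}

z <- data.frame(1:5, letters[1:5])
all_numeric_cols(z)       # FALSE: the second column is not numeric
all_numeric_cols(z[1])    # TRUE: a one-column data frame holding 1:5
all_numeric_cols(z[[1]])  # FALSE: a plain integer vector, not a data frame
```

Note that a data frame with no columns counts as all-numeric here, because all() of an empty logical vector is TRUE; whether that is the desired answer depends on the use case.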


d1[5] is not the column's values themselves. It's a list (a one-column data frame) that contains the column. If you extract the column from it, I bet it is numeric. For example:

is.numeric(d1[5][[1]])
as.numeric(d1[5][[1]])
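To make the column-versus-elements distinction concrete, here is a sketch with a made-up five-column data frame standing in for d1 (only the fifth column's name comes from the question). All of these forms extract the same numeric vector, and one more index pulls out a single element:

```r
# Hypothetical toy data; the real d1 has 871 rows and 7 columns
d1 <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12,
                 Latest.Assets..Mns. = c(14008, 1483, 11524))

by_position <- d1[[5]]                      # [[ ]] returns the column's contents
via_subset  <- d1[5][[1]]                   # one-column data frame, then its first element
by_matrix   <- d1[, 5]                      # [ , j] drops a single column to a vector
by_name     <- d1[["Latest.Assets..Mns."]]  # [[ ]] with a column name
by_dollar   <- d1$Latest.Assets..Mns.       # $ is shorthand for [[ ]] with a name

# Every form yields the identical numeric vector
all(vapply(list(via_subset, by_matrix, by_name, by_dollar),
           identical, logical(1), by_position))  # TRUE

first_value <- d1[[5]][1]  # a single element of the column: 14008
is.numeric(first_value)    # TRUE
```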

So I think the confusion is between the column object and the elements in the column. R makes a distinction between those two ideas, while other languages such as SQL effectively assume that a reference to a column means the column's elements.

This discussion of indexing from the R Language Definition doc really helped me wrap my head around how to reference items in R.


It may be a list (based on the error message). Have you tried class(d1[5])? If it's a list, then you would expect either d1[[5]] or d1[5][[1]] to be numeric.
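The error message is indeed a clue. A data frame is a list of columns, and as.numeric() can flatten a list only when its elements are single numbers; a multi-row column is longer than that, so the coercion fails. A minimal sketch with hypothetical values (the exact error wording differs slightly between R versions, so the message is captured and printed rather than quoted):

```r
# Hypothetical one-column data frame standing in for d1[5]
one_col <- data.frame(Latest.Assets..Mns. = c(14008, 1483, 11524))

is.list(one_col)        # TRUE: a data frame is a list of columns
class(one_col)          # "data.frame"

# A list of single numbers can be flattened to a numeric vector...
as.numeric(list(1, 2))  # 1 2

# ...but a list holding a length-3 vector cannot, which is what as.numeric(d1[5]) hits
msg <- tryCatch(as.numeric(one_col), error = function(e) conditionMessage(e))
print(msg)

# Extracting the column first avoids the problem
as.numeric(one_col[[1]])  # 14008 1483 11524
```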

Edit:

Given that d1[5] is itself a data frame, you need to treat it as such. Something like this should work:

is.numeric(d1[5][,1])
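The [, 1] form works because matrix-style indexing that selects a single column drops the result to a plain vector by default. Passing drop = FALSE keeps the data-frame wrapper and brings back the original behaviour. A short sketch with hypothetical values:

```r
# Hypothetical one-column data frame standing in for d1[5]
one_col <- data.frame(Latest.Assets..Mns. = c(14008, 1483, 11524))

as_vector <- one_col[, 1]                # drop = TRUE (the default): numeric vector
as_frame  <- one_col[, 1, drop = FALSE]  # still a one-column data frame

is.numeric(as_vector)    # TRUE
is.data.frame(as_frame)  # TRUE
is.numeric(as_frame)     # FALSE, the same situation as d1[5]
```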