R: elegant way to determine numeric variables in a data frame

Source: https://www.devze.com, 2023-03-14 14:22 (reposted from the web)
Here's the code I use to find numeric variables in a data frame:

Data <- iris
numericvars <- NULL
for (Var in names(Data)) {
    if(class(Data[,Var]) == 'integer' | class(Data[,Var]) == 'numeric') {
        numericvars <- c(numericvars,Var)
    }
}
numericvars

Is there a less loopy way to do this?


This is a pretty simple one-liner with sapply:

sapply(Data, is.numeric)
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#         TRUE         TRUE         TRUE         TRUE        FALSE

# is.numeric should pick up integer columns too
Data$Species <- as.integer(Data$Species)
sapply(Data, is.numeric)
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#         TRUE         TRUE         TRUE         TRUE         TRUE
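To get from the logical vector back to what the question asked for (the names of the numeric columns), the result can be used as an index. A minimal sketch, assuming a fresh copy of iris; `is_num` and `numeric_data` are names chosen here for illustration, and `vapply()` is used in place of `sapply()` only because it guarantees a logical vector:

```r
Data <- iris

# One TRUE/FALSE per column; vapply() enforces a length-1 logical result
is_num <- vapply(Data, is.numeric, logical(1))

# Names of the numeric columns, as in the original loop
numericvars <- names(Data)[is_num]

# The numeric columns themselves, as a data frame
numeric_data <- Data[is_num]
```

`Filter(is.numeric, Data)` is a base R shortcut that returns the same subset of columns in one call.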


This is a little tighter:

R> sapply(colnames(iris), function(x) inherits(iris[,x], c("numeric","integer")))
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
        TRUE         TRUE         TRUE         TRUE        FALSE 
R> 


The use of sapply() or lapply() seems logical here:

sapply(iris, function(x) class(x) %in% c("integer","numeric"))

which gives:

> sapply(iris, function(x) class(x) %in% c("integer","numeric"))
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
        TRUE         TRUE         TRUE         TRUE        FALSE
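One caveat with comparing against class(): some columns carry more than one class, in which case the comparison returns more than one value per column. A small sketch on a made-up data frame (`df` and its columns are hypothetical) shows the difference from is.numeric():

```r
df <- data.frame(
  x = c(1.5, 2.5),
  n = 1:2,
  f = factor(c("a", "b"), ordered = TRUE)  # class is c("ordered", "factor")
)

# Column f yields two values, so sapply() cannot simplify and returns a list
res_class <- sapply(df, function(x) class(x) %in% c("integer", "numeric"))

# Wrapping the comparison in any() restores one value per column
res_any <- vapply(df, function(x) any(class(x) %in% c("integer", "numeric")),
                  logical(1))

# is.numeric() already returns exactly one value per column
res_isnum <- vapply(df, is.numeric, logical(1))
```

For this data frame the any() version and is.numeric() agree; is.numeric() is the shorter way to write it.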

It is worth noting that your loop grows the numericvars vector at each iteration; in R, that is best avoided, because it forces R to copy and expand the vector each time. Allocate sufficient storage beforehand and fill in the object; here that would mean creating numericvars as

numericvars <- character(length = ncol(iris))

then in the loop doing

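nams <- names(iris)
for(i in seq_len(ncol(iris))) {
    if(class(iris[, i]) == 'integer' | class(iris[, i]) == 'numeric') {
        numericvars[i] <- nams[i]
    }
}
# slots for non-numeric columns are still "", so drop them afterwards
numericvars <- numericvars[nzchar(numericvars)]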

A little bit more work, but far more efficient, though you'll only see it when the number of iterations becomes larger.
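A variant of the same preallocation idea, sketched here as an alternative rather than as the approach above: preallocate a logical vector instead of a character vector, fill it in the loop, and index the names once at the end. `is_num` is a name introduced for illustration.

```r
# Preallocate one logical slot per column (all FALSE to start)
is_num <- logical(ncol(iris))

for (i in seq_along(iris)) {
  # [[i]] extracts column i as a vector; is.numeric() covers integer and double
  is_num[i] <- is.numeric(iris[[i]])
}

# Index the names once, after the loop
numericvars <- names(iris)[is_num]
```

This avoids both growing the result inside the loop and cleaning up empty strings afterwards.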


There's also colwise(), numcolwise() and catcolwise() in plyr. colwise() turns a function that operates on a vector into a function that works column-wise on a dataframe. numcolwise and catcolwise provide versions that operate only on numeric and discrete variables respectively.

library(plyr)
colwise(is.numeric)(Data)

> colwise(is.numeric)(Data)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         TRUE        TRUE         TRUE        TRUE   FALSE
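As a sketch of how numcolwise() is typically used, the following applies mean() to the numeric columns only. It assumes plyr is installed; the base R fallback is an approximation added here so the snippet still runs without plyr, and `num_means` is a name chosen for illustration.

```r
num_means <- if (requireNamespace("plyr", quietly = TRUE)) {
  # numcolwise() wraps mean() so it is applied to numeric columns only
  plyr::numcolwise(mean)(iris)
} else {
  # Base R approximation: keep the numeric columns, then take each mean
  as.data.frame(lapply(Filter(is.numeric, iris), mean))
}

# Species is not numeric, so it does not appear in the result
names(num_means)
```

Either branch gives a one-row data frame with one column per numeric variable.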