
Error in Math.data.frame ... non-numeric variable in data frame

Developer (https://www.devze.com), 2023-03-26 02:16, source: web
I am reading a csv file into R and trying to take the log of the data. The csv file has columns of data, with text headers in the first row and numeric data in the rest.

data<-read.csv("rawdata.csv",header=T)
trans<-log(csv2)

I get the following error when I do this:

Error in Math.data.frame(list(Revenue = c(18766L, 20197L, 20777L, 23410L, : non-numeric variable in data frame: Costs

Output of str should have been inserted in Q-body:

'data.frame': 167 obs. of 3 variables:
 $ X: int 18766 20197 20777 23410 23434 22100 22337 21511 22683 23151 ... 
 $ Y: Factor w/ 163 levels "1,452.70","1,469.00",..: 22 9 55 109 158 82 131 112 119 137 ...
 $ Z: num 564 608 636 790 843 ...

How do I correct this?


Tada! Y is a factor - big problem. The commas shouldn't be in there.

Also, your original question has some anomalies: data is the loaded data.frame, yet the transformation is applied to csv2. Did you rename the columns? If so, you've not given a full summary of the steps involved. Anyway, the issue is that you have commas in your second column.
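
A minimal sketch of how the error arises and how to find the column responsible. The data frame `df` and its values are made up to mimic the `str` output in the question; they are not the asker's data.

```r
# Hypothetical data frame with the same structure as in the question
df <- data.frame(
  X = c(18766L, 20197L),
  Y = factor(c("1,452.70", "1,469.00")),  # numbers stored as text with commas
  Z = c(564, 608)
)

# log() on the whole data frame fails because Y is not numeric;
# tryCatch captures the error message instead of stopping the script
res <- tryCatch(log(df), error = function(e) conditionMessage(e))
print(res)

# Names of the columns that are not numeric
non_numeric <- names(df)[!sapply(df, is.numeric)]
print(non_numeric)
```

Any column listed in `non_numeric` has to be converted or left out before calling `log()` on the data frame.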


EDIT: removed speculation about structure given that it has now been offered.

Data frames are lists, so lapply will loop over their columns and return the result of applying the math function to each one.

If the column is a factor (and here str(Costs) would tell you) then you could do the possibly inefficient approach of converting all columns as if they were factors:

Costs_logged <- lapply(Costs, function(x) log(as.numeric(as.character(x))) )
Costs_logged

(See the FAQ about factor conversion to numeric.)
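
The reason for going through as.character first can be seen in a small example. The factor `f` below is illustrative only.

```r
# Illustrative factor whose labels are numbers
f <- factor(c("10", "200", "3"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting through character recovers the numbers the labels represent
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Taking the log of `codes` would silently give wrong results, which is why the conversion via `as.character` matters.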

EDIT2: If you want to convert the factor variable with commas in the labels use this method:

data$Y <- as.numeric( gsub("\\,", "", as.character(data$Y) ) )

The earlier version of this had only a single backslash, but since both regex and R use backslashes as escape characters, "special regex characters" (see ?regex for a list) need to be doubly escaped. (A comma is not itself a regex metacharacter, so the plain pattern "," works here as well.)
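
A short sketch of the comma-stripping conversion on made-up values in the same format as column Y. It uses `fixed = TRUE`, which makes gsub treat the pattern literally and so sidesteps the escaping question altogether; that is a variation on the line above, not a correction of it.

```r
# Made-up values in the same format as column Y
y <- factor(c("1,452.70", "1,469.00"))

# Strip the thousands separators, then convert to numeric
y_num <- as.numeric(gsub(",", "", as.character(y), fixed = TRUE))

# The log transform now works on the cleaned values
y_log <- log(y_num)

print(y_num)
print(y_log)
```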


Can you give us the first few values for the variable that is giving you trouble? If the "Costs" variable is the one causing trouble (which is what it looks like from your example), execute something like this:

data <- read.csv("rawdata.csv",header=T)
data[c(1:5),"Costs"]

It sounds as though you have a column of values in the csv file -- column Y -- that has commas in the numbers. That is, it sounds like your csv file looks like this:

X,Y,Z
"18766","1,452.70","564"
"20197","1,469.00","608"

or

X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608

or something similar. If this is the case, the problem is that column Y can't be read easily by R with a comma in it (even though it makes it easier for us humans to read). You need to get rid of those commas; that is, make your data file look like this:

X,Y,Z
18766,1452.70,564
20197,1469.00,608

(you can leave the quotes in -- just get rid of the commas in the numbers themselves).

There are a number of ways to do this. If you exported your data from Excel, format that column differently before exporting. Alternatively, open the csv in Excel, save it as a tab-delimited file, open that file in your favorite text editor, and find-and-delete the commas ("find and replace with nothing").

Then try to pull it back into R with your original command.
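
If editing the file by hand is not convenient, the same cleanup can be done after reading it. The sketch below assumes the file looks like the second layout shown above; `csv_text` is an inline stand-in for rawdata.csv and `dat` is an arbitrary name.

```r
# Inline stand-in for rawdata.csv (contents are illustrative)
csv_text <- 'X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608'

# Read Y as character rather than factor so it can be cleaned directly
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Remove the commas inside the numbers and convert to numeric
dat$Y <- as.numeric(gsub(",", "", dat$Y, fixed = TRUE))

str(dat)

# Every column is numeric now, so log() accepts the whole data frame
trans <- log(dat)
print(trans)
```

With a real file, replace `text = csv_text` with the file name.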


Clearly the columns are not all numeric, so just ensure that they are. You can do this by forcing the class of every column when read in:

data <- read.csv("rawdata.csv", colClasses = "numeric")

(read.csv is just a wrapper on read.table, and header = TRUE by default)

That will ensure all columns are of class numeric if that is in fact possible.
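
A sketch of what happens when it is not possible. With comma-formatted values like those in column Y, forcing the class makes the read fail with an error rather than quietly producing a factor. The inline `csv_text` is illustrative, and the error is captured with tryCatch so the script keeps running.

```r
# Inline stand-in for a file whose Y column contains thousands separators
csv_text <- 'X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608'

# Try to force every column to numeric; capture the failure instead of stopping
res <- tryCatch(
  read.csv(text = csv_text, colClasses = "numeric"),
  error = function(e) e
)

# TRUE when the read failed because a value could not be parsed as a number
print(inherits(res, "error"))
```

Failing early like this is arguably useful: the problem shows up at read time instead of later inside log().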

If they really are not numeric columns, exclude the ones you don't want to transform, or just work on the columns individually:

x <- data.frame(x = 1:10, y = runif(10, 2, 10), z = letters[1:10])

colClasses can be used to ignore columns by specifying "NULL" if that makes things simpler.
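
For instance, a text column can be skipped at read time. This is a sketch with an inline, made-up csv whose third column holds letters; the column names mirror the example data frame above.

```r
# Made-up csv with two numeric columns and one text column
csv_text <- 'x,y,z
1,2.5,a
2,3.5,b'

# "NULL" (as a string) tells read.csv to skip that column entirely
dat <- read.csv(text = csv_text,
                colClasses = c("integer", "numeric", "NULL"))

print(names(dat))

# Only numeric columns remain, so log() works on the whole data frame
logged <- log(dat)
print(logged)
```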

These are equivalent since "x" and "y" are the first 2 columns:

log(x[ , 1:2])


log(x[ , c("x", "y")])

Individually:

log(x$x)

log(x$y)

It's always important to check assumptions about the data read from external sources. Basic checks like summary(x), head(x) and str(x) will show you what the data actually are.
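
Those checks can also drive the transformation itself. The sketch below picks out the numeric columns programmatically instead of naming them; `num_cols` and `logged` are arbitrary names and the data frame is the toy example from above.

```r
# Toy data frame: two numeric columns and one text column
x <- data.frame(x = 1:10, y = runif(10, 2, 10), z = letters[1:10])

# Inspect what was actually read or created
str(x)

# Logical vector marking which columns are numeric
num_cols <- sapply(x, is.numeric)

# Transform only those columns; drop = FALSE keeps the result a data frame
logged <- log(x[, num_cols, drop = FALSE])
print(head(logged))
```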
