data.frame and change of variable class in R

Developer https://www.devze.com 2023-02-21 20:04 Source: web

I'm trying to collect a large number of regression coefficients into a data frame so I can export them to LaTeX afterwards. However, after pasting some of the values together into confidence-interval strings, I ran into the following problem, which I cannot understand:

> str(q2)
'data.frame':   5 obs. of  7 variables:
 $ name     : Factor w/ 5 levels "1","2",..: 1 2 3 4 5
 $ Intercept: Factor w/ 5 levels "15.4533848220452",..: 1 2 3 4 5
 $ Int.lb   : Factor w/ 5 levels "14.2125590292247",..: 1 2 3 4 5
 $ Int.ub   : Factor w/ 5 levels "17.1483176230248",..: 1 2 3 4 5
 $ BAC      : Factor w/ 5 levels "-0.317030740768092",..: 1 2 3 4 5
 $ Bac.lb   : Factor w/ 5 levels "-0.789518593140102",..: 1 2 3 4 5
 $ Bac.ub   : Factor w/ 5 levels "0.0844578956839408",..: 1 2 3 4 5
> str(q3)
'data.frame':   5 obs. of  2 variables:
 $ CI: Factor w/ 5 levels "(12.17,14.34)",..: 2 1 5 4 3
 $ ci: Factor w/ 5 levels "(-0.31,0.74)",..: 3 5 2 4 1
> q4<-as.data.frame(cbind(name=q2$name,Intercept=q2$Intercept,Interecpt.95.CI=q3$CI,BAC=q2$BAC,BAC.95.CI=q3$ci))
> q4
  name Intercept Interecpt.95.CI BAC BAC.95.CI
1       1         1               2   1         3
2       2         2               1   2         5
3       3         3               5   3         2
4       4         4               4   4         4
5       5         5               3   5         1

> str(q4)
'data.frame':   5 obs. of  5 variables:
 $ name        : int  1 2 3 4 5
 $ Intercept      : int  1 2 3 4 5
 $ Interecpt.95.CI: int  2 1 5 4 3
 $ BAC            : int  1 2 3 4 5
 $ BAC.95.CI      : int  3 5 2 4 1

That is, why did the variables in q4 suddenly turn into integers?


The short answer is that the factors were converted to their internal integer codes. This happened during the cbind() call:

R> set.seed(1)
R> dat <- data.frame(A = factor(sample(1:5, 10, rep = TRUE)), 
+                    B = factor(sample(100:200, 10, rep = TRUE)))
R> head(dat)
  A   B
1 2 120
2 2 117
3 3 169
4 5 138
5 2 177
6 5 150
R> str(dat)
'data.frame':   10 obs. of  2 variables:
 $ A: Factor w/ 5 levels "1","2","3","4",..: 2 2 3 5 2 5 5 4 4 1
 $ B: Factor w/ 9 levels "117","120","138",..: 2 1 5 3 7 4 6 9 3 8
R> cbind(name = dat$A, foo = dat$B)
      name foo
 [1,]    2   2
 [2,]    2   1
 [3,]    3   5
 [4,]    5   3
 [5,]    2   7
 [6,]    5   4
 [7,]    5   6
 [8,]    4   9
 [9,]    4   3
[10,]    1   8
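To confirm that these numbers are the factors' internal codes rather than their values, a small check with made-up values (the vector f is hypothetical, loosely modelled on the str(q2) output) looks like this:

```r
# A factor is an integer vector of codes plus a "levels" attribute
f <- factor(c("15.45", "14.21", "17.15"))
levels(f)      # level labels, sorted: "14.21" "15.45" "17.15"
typeof(f)      # "integer": the storage type underneath the factor
as.integer(f)  # the codes: 2 1 3 (positions of each value in levels(f))

# cbind() on plain factors builds a matrix, which keeps only the codes
m <- cbind(a = f, b = f)
typeof(m)      # "integer"
m[, "a"]       # 2 1 3, the labels are gone
```

The codes follow the sorted order of the level labels, not the order of the data, which is why the cbind() output above looks like an arbitrary permutation.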

The reason is that cbind(), when none of its arguments is a data frame, produces a matrix, and a matrix cannot hold factors: each factor is reduced to its underlying integer codes. In this instance it is simpler to create a new data frame directly:

R> dat2 <- data.frame(name = dat$A, foo = dat$B)
R> dat2
   name foo
1     2 120
2     2 117
3     3 169
4     5 138
5     2 177
6     5 150
7     5 172
8     4 200
9     4 138
10    1 178

rather than calling cbind() and then wrapping the result in as.data.frame().
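If you prefer to keep using cbind(), a minimal sketch (with small hypothetical stand-ins for q2 and q3, not the real data) shows that passing at least one data frame makes cbind() dispatch to its data-frame method, cbind.data.frame(), so the factor columns survive:

```r
# Hypothetical stand-ins for q2 and q3, with factor columns as in str() above
q2 <- data.frame(name = factor(c("1", "2")),
                 BAC  = factor(c("-0.32", "0.08")))
q3 <- data.frame(ci = factor(c("(-0.79,0.08)", "(-0.31,0.74)")))

# q2 is a data frame, so cbind() dispatches to cbind.data.frame()
# and returns a data frame; the factors keep their labels
q4 <- cbind(q2, BAC.95.CI = q3$ci)
str(q4)

# The same result built explicitly with data.frame()
q4b <- data.frame(name = q2$name, BAC = q2$BAC, BAC.95.CI = q3$ci)
```

Only the all-vectors case, as in the question's cbind(name = q2$name, ...), falls through to the matrix-building default.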

But the real source of the problem is that numeric data are stored as factors in q2. How were these data read in or generated in R? If they were read into R, why did they end up as factors? Normally, if every value in a column is numeric, R reads the column as numeric. If there is anything text-like in the column, though, the whole column is read as text, and read.table()/read.csv() converted such columns to factors by default before R 4.0.0. So I'd try to solve that issue first, working out why the data in q2 are factors, as it might point to problems with reading or generating the data that you are not aware of.
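Once a numeric column has become a factor, as.numeric() alone returns the codes, not the values. A minimal sketch of the usual conversion (as described in R FAQ 7.10), using a hypothetical column in which a stray "n/a" entry stands in for whatever text forced the factor:

```r
# Hypothetical stand-in for a column like q2$Intercept; "n/a" plays the
# role of a stray text entry that made the column non-numeric
f <- factor(c("15.45", "14.21", "n/a", "17.15"))

as.numeric(f)  # WRONG: internal codes 2 1 4 3, not the values

# Correct: convert via the level labels (NA where a label is not a number)
x <- suppressWarnings(as.numeric(as.character(f)))
y <- suppressWarnings(as.numeric(levels(f))[f])  # same, parses each level once

# Positions that failed to parse: the place to look for the input problem
bad <- which(is.na(x))  # 3
f[bad]
```

If the data come from read.csv() or read.table(), running str() right after reading shows which columns did not come in as numbers, and the na.strings argument can declare markers such as "n/a" as missing values.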
