How to create a linear regression with R?

Developer (https://www.devze.com), 2023-03-23 09:26. Source: the web

I have a simple matrix like:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

I have to calculate a linear regression on these columns, with something like lm(x ~ y),

where the first column is the X and the others are the Y. Can I do something to refer to all the other columns with a single variable (y),

or do I have to write out something like lm(x ~ y + z + c + b), and so on?

Thank you

Yes, but I wouldn't really recommend it:

> set.seed(2)
> mat <- matrix(runif(12), ncol = 3, byrow = TRUE)
> mat
          [,1]      [,2]      [,3]
[1,] 0.1848823 0.7023740 0.5733263
[2,] 0.1680519 0.9438393 0.9434750
[3,] 0.1291590 0.8334488 0.4680185
[4,] 0.5499837 0.5526741 0.2388948
> mod <- lm(mat[,1] ~ mat[,-1])
> mod

Call:
lm(formula = mat[, 1] ~ mat[, -1])

Coefficients:
(Intercept)   mat[, -1]1   mat[, -1]2  
     1.0578      -1.1413       0.1177

Why is this not recommended? You are abusing the formula interface here. It works, but the model coefficients get odd names, and you incur a lot of overhead from the formula interface, which is designed to extract the response and covariates from a data frame or list object referenced in the symbolic formula.

The usual way of working is:

df <- data.frame(mat)
names(df) <- c("Y","A","B")
## specify all terms:
lm(Y ~ A + B, data = df)
## or use the `.` shortcut
lm(Y ~ ., data = df)
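
As a quick check, the sketch below rebuilds the same random matrix and confirms that the explicit formula and the `.` shortcut fit the same model, with coefficients named after the data frame columns. The column names Y, A and B follow the code above; fit_explicit and fit_dot are illustrative names only.

```r
# Recreate the example matrix from the answer
set.seed(2)
mat <- matrix(runif(12), ncol = 3, byrow = TRUE)

# Convert to a data frame and give the columns meaningful names
df <- data.frame(mat)
names(df) <- c("Y", "A", "B")

# Fit the same model twice: all terms spelled out, then the `.` shortcut
fit_explicit <- lm(Y ~ A + B, data = df)
fit_dot <- lm(Y ~ ., data = df)

# Coefficients are now named after the columns rather than "mat[, -1]1"
names(coef(fit_dot))

# Both calls should give the same estimates
all.equal(coef(fit_explicit), coef(fit_dot))
```

With more covariates, the `.` form saves typing out every term, which is what the question was asking for.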

If you don't want to go via the data frame, then you can call the workhorse function behind lm(), lm.fit(), directly with a simple manipulation:

lm.fit(cbind(rep(1, nrow(mat)), mat[,-1]), mat[, 1])

Here we bind a vector of 1s onto columns 2 and 3 of mat (cbind(rep(1, nrow(mat)), mat[,-1])); this is the model matrix, and the column of 1s gives the intercept. mat[, 1] is the response. Whilst lm.fit() doesn't return an object of class "lm", it is very quick, and the result can relatively easily be converted to one if that matters.
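
To see that the two routes agree, here is a minimal sketch comparing the lm.fit() estimates with those from lm() on the same matrix. The names X, y, fit_raw and fit_lm are illustrative; the coefficient names are stripped before comparing because the two functions label them differently.

```r
# Recreate the example matrix from the answer
set.seed(2)
mat <- matrix(runif(12), ncol = 3, byrow = TRUE)

# Model matrix: a column of 1s for the intercept plus columns 2 and 3
X <- cbind(rep(1, nrow(mat)), mat[, -1])
# Response: column 1
y <- mat[, 1]

# Low-level fit; returns a plain list, not an "lm" object
fit_raw <- lm.fit(X, y)

# Formula-based fit on the same data, for comparison
fit_lm <- lm(mat[, 1] ~ mat[, -1])

# Compare the estimates, ignoring the differing coefficient names
all.equal(unname(fit_raw$coefficients), unname(coef(fit_lm)))
```

The list returned by lm.fit() still carries components such as coefficients, residuals and fitted.values, so it is often enough when only the estimates are needed.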

By the way, you have the usual notation back to front. Y is usually the response, with X indicating the covariates used to model or predict Y.