Is there a better way to combine two data.frames when the key in one of them is stored in its row names, i.e. to treat the row names as if they were a column and merge on that with a column of another data.frame? I know I could do the following:
df1$rn <- rownames(df1)
all <- merge(df1,df2, by.x="rn", by.y="some_column")
But this produces redundant data (the row names copied into a column), which is not needed at all. So what's the smarter way to do it?
In merge, you can use "row.names" or 0 in by, by.x, or by.y to refer to the row names.
An example using the authors and books data frames from the merge help page:
rownames(authors) <- authors$surname
merge(authors, books, by.x = "row.names", by.y = "name")
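For readers without the help-page data loaded, here is a minimal self-contained sketch of the same idea. The data frames `scores` and `people` and their columns are made up for illustration; only `merge` and its `by.x`/`by.y` arguments come from base R.

```r
# Hypothetical toy data: `scores` keeps its key in the row names,
# `people` keeps the same key in an ordinary column called `name`.
scores <- data.frame(score = c(90, 75, 60),
                     row.names = c("ann", "bob", "cid"))
people <- data.frame(name = c("bob", "ann", "dan"),
                     age  = c(31, 25, 40))

# Match the row names of `scores` against the `name` column of `people`.
by_name <- merge(scores, people, by.x = "row.names", by.y = "name")

# The number 0 is the other way to say "use the row names".
by_zero <- merge(scores, people, by.x = 0, by.y = "name")

# The key comes back as a column named Row.names;
# no helper column had to be added to `scores` beforehand.
print(by_name)
```

Only the keys present on both sides ("ann" and "bob") survive, because merge does an inner join unless all, all.x, or all.y is set.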
"A smarter way" really depends on your data, which we don't have. but
df1 <- data.frame(
X1 = 1:10,
id = letters[1:10]
)
df2 <- data.frame(
X2 = 10:1,
X3 = letters[11:20]
)
rownames(df2) <- df1$id
df2 <- df2[sample.int(10),]
cbind(df1,df2[match(df1$id,rownames(df2)),])
Edit: Vitoshka's answer is the one you're looking for. If I had bothered to look at the help file ?merge, I would have known that as well...
I leave my solution here just in case somebody needs a speedy alternative to merge:
> system.time(replicate(1000,cbind(df1,df2[match(df1$id,rownames(df2)),])))
user system elapsed
0.57 0.00 0.57
> system.time(replicate(1000,merge(df1,df2,by.x="id",by.y="row.names")))
user system elapsed
2.36 0.02 2.37
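One caveat before swapping merge for the match-based lookup: match returns only the first hit per key and NA for keys it cannot find, so the result behaves like a left join on unique keys rather than like merge's default inner join. A small sketch with made-up data (the names `left` and `lookup` are hypothetical):

```r
# Hypothetical toy data: "z" has no counterpart in `lookup`.
left   <- data.frame(id = c("a", "b", "z"), X1 = 1:3)
lookup <- data.frame(X2 = c(10, 20), row.names = c("a", "b"))

# match() yields NA for "z", so that row is kept and padded with NA.
via_match <- cbind(left,
                   lookup[match(left$id, rownames(lookup)), , drop = FALSE])

# merge() drops "z" by default ...
inner <- merge(left, lookup, by.x = "id", by.y = "row.names")

# ... and keeps it only when asked to with all.x = TRUE.
via_merge <- merge(left, lookup, by.x = "id", by.y = "row.names",
                   all.x = TRUE)

print(via_match)
print(via_merge)
```

So the speed-up is only a fair trade when the keys in the lookup table are unique and unmatched rows showing up as NA is acceptable.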