I am trying to merge two data.frames
together, based on a common column name in each of them called series_id
. Here is my merge statement:
merge(test_growth_series_LUT, test_growth_series, by = intersect(series_id, series_id))
The error I'm getting is
Error in as.vector(y) : object 'series_id' not found
The help gives this description, but I can't see why it can't find the series_id
. Example data is below.
### S3 method for class 'data.frame':
#merge(x, y, by = intersect(names(x), names(y)),
# by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
# sort = TRUE, suffixes = c(".x",".y"), ...)
# Create a long data.frame to store data...
test_growth_series = data.frame ("read_day" = c(0, 3, 9, 0, 3, 9, 0, 2, 8),
"series_id" = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
"mean_od" = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
"sd_od" = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
"n_in_stat" = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)
# Create a name LUT
test_growth_series_LUT = data.frame ("series_id" = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),"description" = c("blah1", "blah2", "blah3", "blah4", "blah5", "blah6", "blah7", "blah8", "blah9")
)
> test_growth_series
read_day series_id mean_od sd_od n_in_stat
1 0 p1s1 0.6 0.10 8
2 3 p1s1 0.9 0.20 8
3 9 p1s1 1.3 0.20 8
4 0 p1s2 0.3 0.10 8
5 3 p1s2 0.6 0.10 7
6 9 p1s2 1.0 0.30 5
7 0 p3s4 0.2 0.04 8
8 2 p3s4 0.5 0.10 7
9 8 p3s4 1.2 0.30 2
&开发者_如何学运维gt; test_growth_series_LUT
series_id description
1 p1s1 blah1
2 p1s2 blah2
3 p3s4 blah3
4 p4s2 blah4
5 p5s2 blah5
6 p6s2 blah6
7 p7s4 blah7
8 p8s4 blah8
9 p9s4 blah9
>
this is what I'm trying to achieve:
> new_test_growth_series
read_day series_id mean_od sd_od n_in_stat description
1 0 p1s1 0.6 0.10 8 blah1
2 3 p1s1 0.9 0.20 8 blah1
3 9 p1s1 1.3 0.20 8 blah1
4 0 p1s2 0.3 0.10 8 blah2
5 3 p1s2 0.6 0.10 7 blah2
6 9 p1s2 1.0 0.30 5 blah2
7 0 p3s4 0.2 0.04 8 blah3
8 2 p3s4 0.5 0.10 7 blah3
9 8 p3s4 1.2 0.30 2 blah3
You can just do this:
merge(test_growth_series_LUT, test_growth_series)
It will automatically match the names. If you need to specify the column, you do it like this:
merge(test_growth_series_LUT, test_growth_series, by = "series_id")
Or this way if you need to specify on both sides (only needed if they have different names that you want to match on):
merge(test_growth_series_LUT, test_growth_series, by.x = "series_id", by.y = "series_id")
I recommend looking at the examples (and walking through them) by going to the help for merge (?merge
) or by calling example("merge", "base")
(less useful that actually walking through it yourself.
Two notes:
- You would never need to use the intersect function here. Use
c()
to specify multiple column names explicitly. Or use theall
,all.x
, andall.y
parameters to specify what kind of join you want. - You would use quotes to specify a column name in most cases unless you have attached the data. Otherwise it will complain about not being able to locate the name. In particular, the name needs to be in the search path when you aren't using quotes.
The error I'm getting is "Error in as.vector(y) : object 'series_id' not found"
A column in your data.frame can be referred to like this: test_growth_series$series_id
, which returns the vector of series_id's. Doing the intersect is unnecessary, but would be correctly written like this:
intersect(test_growth_series$series_id, test_growth_series_LUT$series_id)
To be slightly more correct, you probably want to do a left join by using all.x=TRUE
. This covers you in case a series_id from test_growth_series doesn't appear in your look up table. Without it, you might end up missing rows in your result.
merge(test_growth_series, test_growth_series_LUT, by = "series_id", all.x=TRUE)
This topic is also discussed in How to join data frames in R (inner, outer, left, right)?
精彩评论