As an R user, I am now learning Stata using this resource, and I am puzzled by the merge command.
In R, I don't have to worry about merging data incorrectly, because merge combines everything anyway. I don't need to worry about whether the common columns contain duplicates, because the matching rows of the Y data frame are merged to each of the duplicated rows in the X data frame (using all=FALSE in merge).
But in Stata, I need to remove the duplicate rows from X before proceeding to merge.
Does Stata assume that, in order for merge to proceed, the common column in the master table must be unique?
The answer to your question is No. I will try to explain why.
The link you mention covers only one type of merge that is possible with Stata, namely the one-to-many merge.
merge 1:m varlist using filename
Other types of merge are possible:
One-to-one merge on specified key variables
merge 1:1 varlist using filename
Many-to-one merge on specified key variables
merge m:1 varlist using filename
Many-to-many merge on specified key variables
merge m:m varlist using filename
One-to-one merge by observation
merge 1:1 _n using filename
Details, explanations and examples can be found in help merge.
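To make the point about the question concrete, here is a minimal sketch of a many-to-one merge in which the key repeats in the master dataset. The variable names id, x and y and the tiny datasets are made up for illustration; the only requirement for m:1 is that the key is unique in the using dataset.

```stata
* Using dataset: id is unique here (hypothetical data)
clear
input id y
1 100
2 200
end
tempfile ydata
save `ydata'

* Master dataset: id 1 appears twice, and no rows need to be dropped
clear
input id x
1 10
1 20
2 30
end

* m:1 = many observations per id in master, one per id in using
merge m:1 id using `ydata'
list
```

Both master rows with id 1 should pick up the same y value, so the duplicates in the master stay in place.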
If you do not know if observations are unique in a dataset, you can do the following check:
bysort idvar: gen N = _N
ta N
If you find values of N that are greater than 1, you know that observations are not unique with respect to idvar.
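Stata also ships commands for this check. The following is only a sketch on made-up data (the variable names id, x and dup are assumptions, not anything from the question), showing duplicates report, duplicates tag and isid.

```stata
* Hypothetical data in which id 2 appears twice
clear
input id x
1 10
2 20
2 25
3 30
end

* Summarise how many copies of each id value exist
duplicates report id

* Tag observations: dup counts the surplus copies, so 0 means unique
duplicates tag id, generate(dup)
list if dup > 0

* isid exits with an error when id does not uniquely identify observations;
* capture stores the return code in _rc instead of stopping the do-file
capture isid id
display _rc
```

A nonzero return code from isid tells you that a 1:1 or 1:m merge on id would be refused for this dataset as master.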
The 1:1, 1:m, m:1 and m:m forms shown above are in fact the new syntax of the merge command, which was introduced with Stata 11. Before Stata 11, the merge command was a bit simpler. You simply had to sort your data, and then you could do:
merge varlist using filename
By the way, you can still use this old syntax in Stata 11 or higher.
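joinby is the command that corresponds to the R command merge. By default it keeps only matched observations, like all=FALSE in R; joinby with the unmatched(both) option also keeps unmatched observations from both datasets, like all=TRUE.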
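In particular, merge m:m DOES NOT do a many-to-many merge (i.e. a full join that forms every pairwise combination within a key), contrary to what the documentation implies. Within each key it pairs observations in order: first with first, second with second, and so on.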
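The difference is easy to see on a toy example. This is a sketch with made-up data and variable names (id, x, y), not anything from the question, where id 1 appears twice in both datasets.

```stata
* Using dataset Y: two rows for id 1 (hypothetical data)
clear
input id y
1 100
1 200
2 300
end
tempfile ydata
save `ydata'

* Master dataset X: two rows for id 1
clear
input id x
1 10
1 20
3 30
end

* merge m:m pairs rows in order within id, giving 2 rows for id 1
preserve
merge m:m id using `ydata'
count if id == 1
restore

* joinby forms all pairwise combinations within id, giving 4 rows for id 1
joinby id using `ydata', unmatched(both)
count if id == 1
```

The joinby result is what R's merge would give for the same two data frames with all=TRUE.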