
merge command comparison between R and Stata

Developer https://www.devze.com 2023-04-02 20:35 Source: Internet

As an R user, I am now learning Stata using this resource, and I am puzzled by the merge command.

In R, I don't have to worry about merging data wrongly, because it merges everything anyway. I don't need to worry about whether the common columns contain duplicates, because the Y data frame is merged to each of the duplicated rows in the X data frame (using all=FALSE in merge).

But in Stata, I need to remove the duplicate rows from X before proceeding with the merge.

Does Stata assume that, for a merge to proceed, the common column in the master table must be unique?


The answer to your question is No. I will try to explain why.

The link you mention covers only one type of merge that is possible with Stata, namely the one-to-many merge.

merge 1:m varlist using filename

Other types of merge are possible:

One-to-one merge on specified key variables

merge 1:1 varlist using filename

Many-to-one merge on specified key variables

merge m:1 varlist using filename

Many-to-many merge on specified key variables

merge m:m varlist using filename

One-to-one merge by observation

merge 1:1 _n using filename

Details, explanations and examples can be found in help merge.
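As an illustration of the case in the question (duplicated keys in the master data), here is a minimal sketch of a many-to-one merge. The variable names id, x and name and the two tiny datasets are made up for the example; run it as a single do-file so the tempfile macro stays defined.

```stata
* Using data: one row per id (hypothetical lookup table)
clear
input id str5 name
1 "one"
2 "two"
end
tempfile lookup
save `lookup'

* Master data: id repeats, so the master side is the "many" side
clear
input id x
1 10
1 11
2 20
3 30
end

* m:1 because id is duplicated in the master but unique in the using data
merge m:1 id using `lookup'

* _merge records whether each observation came from master, using, or both
tabulate _merge
list, sepby(id)
```

No duplicates had to be dropped from the master data: each master row with id 1 receives the same name from the lookup table, and id 3, which has no match, is kept with _merge equal to 1.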

If you do not know if observations are unique in a dataset, you can do the following check:

bysort idvar: gen N = _N

ta N

If you find values of N that are greater than 1, you know that observations are not unique with respect to idvar.
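The same check can also be done without creating a new variable, using the built-in isid and duplicates commands. The sketch below uses the auto dataset that ships with Stata purely as an example; substitute your own key variable for make and foreign.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* make is unique per car, so isid finishes silently
isid make

* foreign repeats, so isid would stop with an error; capture traps it
capture isid foreign
display "return code for foreign: " _rc

* Summarize how many observations share each value of foreign
duplicates report foreign
```

A nonzero return code after capture isid means the variable does not uniquely identify the observations.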

The 1:1, 1:m, m:1 and m:m forms shown above are in fact the new syntax of the merge command, introduced with Stata 11. Before Stata 11, the merge command was a bit simpler. You sorted both datasets by the key variables, and then you could do:

merge varlist using filename

By the way, you can still use this old syntax in Stata 11 or higher.


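joinby is the command that corresponds to the R command merge. With unmatched(both) it also keeps unmatched observations from both datasets, like all=TRUE in R; without that option it keeps only matched observations, like all=FALSE.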

In particular, merge m:m DOES NOT do a many-to-many merge (i.e. form every pairwise combination of matching rows), contrary to what the documentation implies.
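The difference is easy to see on a small made-up example in which the key is duplicated on both sides. The datasets and the variable names id, x and y below are hypothetical; run the block as a single do-file.

```stata
* Using data: id 1 appears three times
clear
input id y
1 1
1 2
1 3
end
tempfile usingdata
save `usingdata'

* Master data: id 1 appears twice
clear
input id x
1 1
1 2
end
tempfile masterdata
save `masterdata'

* merge m:m pairs rows by position within id: expect 3 rows, not 6
merge m:m id using `usingdata'
count
list

* joinby forms every pairwise combination within id: expect 2 x 3 = 6 rows
use `masterdata', clear
joinby id using `usingdata', unmatched(both)
count
list
```

With merge m:m, the last master row for id 1 is simply reused for the third using row, which is rarely what is wanted. joinby gives the six combinations that R's merge would produce.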
