开发者

Merging datasets based on 2 variables in SAS

开发者 https://www.devze.com 2022-12-23 02:34 出处:网络
I\'m working with different databases. All of them contain information about 1000+ companies. A company is defined by its ticker code (the short version of the name (i.e. Ford as F) usually seen on st

I'm working with different databases. All of them contain information about 1000+ companies. A company is defined by its ticker code (the short version of the name (i.e. Ford as F) usually seen on stock quotation boards).

Aside from the ticker code to merge on I also have to merge on the time. I used month as a count variable throughout my time series. The final purpose is to have a regression in the kind of

Y(jt) = c + X(jt) +X1(jt) etc with j = company (ticker) and t = time (month).

So imagine I have 2 databases, one of which is the base database with variables such as Tickers, months, betas of a company (risk measure) etc. and a second database which has an extra variable (let's say market capitalisation).

What I want to do then is to merge these 2 databases based on the ticker and the month.

Example: Base database:

Ticker ____ Month ____ Betas

AA ____ 4 ____ 1.2

BB ____ 8 ____ 1.18

Second database:

Ticker ____ Month ____ M开发者_如何转开发CAP

AA ____ 4 ____ 8542

BB ____ 6 ____ 1245

Then after merge I would like to have something like this:

Ticker ____ Month ____ Betas ____ MCAP

AA ____ 4 ____ 1.2 ____ 8542

So all observations that do not match BOTH the date and ticker have to be dropped. I'm sure this is possible, just can't find the right type of code.

PS: I'm guessing the underscores have something to do with font layout but both the bold as italic is supposed to be normal :)


Agree with Jonathan... after sorting both datasets independently by ticker and time, the data step of merging is what I would use..... little modification

data want; 
   merge base(in = b) mcap(in = m); 
   by ticker time;
   if m & b; 
run;

Records that don't have common ticker and time in both datasets would be dropped automatically..


Calling the two datasets base and mcap, and assuming that they have both been sorted by ticker and month, you can do it this way:

data want;
  merge base(in = b)
        mcap(in = m);
  if m & b;
run;

The subsetting if will not accept any row that does not match in bath datasets.


Ok so it appears you can just do it very easily by:

proc sort data=work;
by ticker month;
run;
proc sort data=wsize;
by ticker month;
run;
data test;
merge work(in=a) wsize(in=b);
by ticker month;
frommerg=a;
fromwtvol=b;
run;
data test;
set test;
if frommerg=0 then delete;
run;
data test;
set test;
if fromwtvol = 0 then delete;
run;
data test;
set test;
drop frommerg fromwtvol;
run;

That's the code I used, I tried this before posting because I didn't want to look like a leecher but it so happens that the 2 databases i tried had nothing in common (what are the odds with 70.000 observations :D), I retried it and it works (for now!)

Thanks anyway!


proc sort data=database1;
by ticker month;
run;

proc sort data=database2;
by ticker month;
run;
data gh;
merge database1(in=a) database2(in=b);
by ticker month;
if a and b;
run;
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号