I have a dataset as follows and need to retrieve two things: 1) for each date, the sum of VALUE from (date-3) to (date-1), and 2) whether, during the 5 days, there are two or more days where VALUE is 0. I think PROC SQL should be used, but I'm not sure how to implement this.
Input dataset:
ID DATE VALUE
1 20110101 0
1 20110102 0
1 20110103 1
1 20110104 2
2 20110101 1
2 20110102 2
2 20110103 3
2 20110104 4
The output should be 1) 1 (0+0+1) for ID 1, 20110104 and 6 (1+2+3) for ID 2, 20110104, and 2) a mark for ID 1, 20110104, since there are 2 days with a value of 0 during the 3-day window.
Any help is greatly appreciated!
Both problems can be solved with a similar SQL query. Your second question is a bit confusing, because you mention a 5-day period in one place and a 3-day window in another. I used the same 3-day window (date-3 through date-1) for both queries, so modify the start and end dates in the join condition if you need a different window.
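One thing to check first: both queries subtract from DATE, which only steps back by calendar days if DATE is stored as a SAS date value rather than as the plain number 20110101. Here is a minimal sketch of loading the sample data that way, assuming the table is named _input as in the queries below. The informat and format choices are my assumption, not something stated in the question.

```sas
/* Sketch: load the sample rows with DATE as a real SAS date value */
data _input;
    /* yymmdd8. reads 20110101 as the date 01JAN2011 */
    input id date :yymmdd8. value;
    /* display the date in the same yyyymmdd layout as the question */
    format date yymmddn8.;
    datalines;
1 20110101 0
1 20110102 0
1 20110103 1
1 20110104 2
2 20110101 1
2 20110102 2
2 20110103 3
2 20110104 4
;
run;
```

If DATE is already in a table as the number 20110101, something like input(put(date, 8.), yymmdd8.) should convert it. Without the conversion, date-4 gives wrong results whenever the window crosses a month boundary.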
1)
proc sql;
  /* for each row t1, sum VALUE over the same ID's rows dated date-3 .. date-1 */
  select t1.id, t1.date, sum(t2.value) as totalvalue
  from _input t1
  left join _input t2
    on t1.date-4 lt t2.date
    and t1.date gt t2.date
    and t1.id = t2.id
  group by t1.id, t1.date;
quit;
2)
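proc sql;
  /* keep the rows whose preceding 3-day window holds two or more zero values */
  select t1.id, t1.date
  from _input t1
  left join _input t2
    on t1.date-4 lt t2.date
    and t1.date gt t2.date
    and t1.id = t2.id
    and t2.value = 0
  group by t1.id, t1.date
  /* count(t2.id) counts matched rows only; an unmatched left-join row adds nothing */
  having count(t2.id) ge 2
  ;
quit;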
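Since the two queries share the same self-join, they could probably also be combined into one pass. This is only a sketch under the same assumptions (table _input, DATE stored as a SAS date, window of date-3 through date-1). The column names zero_days and mark are made up for illustration.

```sas
proc sql;
    select t1.id,
           t1.date,
           /* total of VALUE over the three preceding days */
           sum(t2.value) as totalvalue,
           /* a comparison evaluates to 1 or 0 in SAS, so summing it counts the zero days */
           sum(t2.value = 0) as zero_days,
           /* 1 when at least two days in the window have VALUE = 0, else 0 */
           (sum(t2.value = 0) ge 2) as mark
    from _input t1
    left join _input t2
        on  t2.id = t1.id
        and t2.date between t1.date - 3 and t1.date - 1
    group by t1.id, t1.date;
quit;
```

Unlike query 2), this returns every ID and date with a 0/1 flag instead of only the marked rows. Rows with no earlier data in the window get a missing totalvalue and a mark of 0.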
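Here is an alternative that uses only a data step. I'm assuming that you don't want sums and marks for ranges of fewer than three records, so the data step explicitly leaves them undefined (missing). Note that LAG looks at the previous three records, not at calendar days, so this assumes one record per ID per day with no gaps.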
proc sort data=sample;
by id date;
run;
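data result(drop=k count);
  retain count;
  set sample;
  by id;
  /* count = number of earlier records already read for this id */
  if first.id then count=0;
  sum=lag1(value) + lag2(value) + lag3(value);
  /* fewer than three earlier records: the lags reach into the previous id */
  if count<3 then sum=.;
  /* k = how many of the three previous values are 0 */
  k=0;
  if lag1(value)=0 then k=k+1;
  if lag2(value)=0 then k=k+1;
  if lag3(value)=0 then k=k+1;
  /* only mark rows that have a full three-record window within the same id */
  if count ge 3 and k ge 2 then mark=1;
  count=count+1;
run;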
proc print data=result;
run;