I have two data sets in which some records are relevant to each other.
E.g.
Dataset1
Var1
abcde
bad man
big bang
strange
everyday
exactly
Dataset2
var1
abc
cde
bad
bad man a
stranger
Now I want to compare those records using a loop logic, and here is my code.
%let id1=%sysfunc(open(dataset2,in)); %let colterm=%sysfunc(varnum(&id1,var1)); %do %while(%sysfunc(fetch(&id1)) eq 0); %let vterm=%sysfunc(getvarc(&id1,&colterm));
data dataset1;
set dataset1;
if index(strip(var1),strip("&vterm"))>0 or index(strip("&vterm"),strip(var1))>0 then do;/*when one contains the other*/
match="Fuzzy";
cnt=cnt+1;
end;
run;
%end;
proc sql noprint;
select max(cnt) into:maxnum/*to get max cnt*/
from dataset1;
quit;
Now dataset1 looks like below
Var1 cnt match
abcde 2 Fuzzy
bad man 2 Fuzzy
big bang 0
strange 1 Fuzzy
everyday 0
exactly 0
I want to merge those relevant records in dataset2 into dataset1, and the new dataset1 should look like below
Var1 cnt match FM_dataset2_1 FM_dataset2_2
abcde 2 Fuzzy abc cde
bad man 2 Fuzzy bad bad man a
big bang 0
strange 1 Fuzzy stranger
everyday 0开发者_运维百科
exactly 0
As you can see the new variables FM_dataset2_1 and FM_dataset2_2 are auto-reassigned ones based one counter, cnt. But I just couldn't think out a proper way of realizing this step using SAS code.
Further more, I need to output the dataset into an xml file. And the result should look like below
<text>abcde</text>
<match>Fuzzy</match>
<matchitem>abc</matchitem>
<matchitem>tecde</matchitem>
The problem, as with the issue above, is also about how to determine the number of matchitem element and write into the file. In xml map file, I can determine the position as follows
<COLUMN name="FM_dataset2_1">
<PATH syntax="XPath">/../matchitem[position()=**1**]</PATH>
...
<COLUMN name="FM_dataset2_2">
<PATH syntax="XPath">/../matchitem[position()=**2**]</PATH>
But this has to be done mannually case by case. Is it possible to customize map file based on cnt counter(maxnum) automatically?
Can anybody suggest?
I'm sure there is more efficient code than the following, but I tried to stay with your line of thought. I am not familiar with working with the XML engine, so I'll leave that part to someone else. Otherwise, if you need to create it manually then you were on the right track creating the MAXNUM macro variable, then you can use it in a loop.
%let id1=%sysfunc(open(dataset2,in));
%let colterm=%sysfunc(varnum(&id1,var1));
%do %while(%sysfunc(fetch(&id1)) eq 0);
%let vterm=%sysfunc(getvarc(&id1,&colterm));
data dataset1;
set dataset1;
format vterm $20.;
if match eq "Fuzzy" then output;
if index(strip(var1),strip("&vterm"))>0 or index(strip("&vterm"),strip(var1))>0 then do;
cnt=sum(cnt,1);
match="Fuzzy";
vterm = "&vterm";
output;
end;
else do;
cnt=sum(cnt,0);
output;
end;
run;
proc sort data=dataset1;
by var1 match vterm descending cnt;
proc sort data=dataset1 nodupkey;
by var1 match vterm;
run;
%end;
proc sql;
create table maxcnt as
select
var1,
match,
max(cnt) as cnt
from dataset1
group by 1,2
;
quit;
run;
proc transpose data=dataset1 out=dataset1(drop=FM_dataset2_0 _name_) prefix=FM_dataset2_;
by var1 match;
id cnt;
var vterm;
run;
data dataset1;
merge dataset1 maxcnt;
by var1 match;
run;
%let id2=%sysfunc(close(&id1)); /*closes out dataset2 in case you need it later */
精彩评论