I am creating a SAS table in which one of the fields has to hold a huge string.
Following is my table (TABLE name = MKPLOGS4):
OBS RNID DESCTEXT
--------- -----------
1 123 This is some text which is part of the record. I want this to appear
2 123 concatenated kinda like concat_group() from MYSQL but for some reason
3 123 SAS does not have such functionality. Now I am having trouble with
4 123 String concatenation.
5 124 Hi there old friend of mine, hope you are enjoying the weather
6 124 Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
and I have to get a Table similar to this (table name = MKPLOGSA):
OBS RNID DESCTEXT
--------- -----------
1 123 This is some text which is part of the record. I want this to appear concatenated kinda like concat_group() from MYSQL but for some reason SAS does not have such functionality. Now I am having trouble with String concatenation.
2 124 Hi there old friend of mine, hope you are enjoying the weather Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
So, after trying unsuccessfully with SQL, I came up with the following SAS code (please note I am very new at SAS):
DATA MKPLOGSA (DROP = DTEMP DTEXT);
SET MKPLOGS4;
BY RNID;
RETAIN DTEXT;
IF FIRST.RNID THEN
DO;
DTEXT = DESCTEXT;
DELETE;
END;
ELSE IF LAST.RNID THEN
DO;
DTEMP = CATX(' ',DTEXT,DESCTEXT);
DESCTEXT = DTEMP;
END;
ELSE
DO;
DTEMP = CATX(' ',DTEXT,DESCTEXT);
DTEXT = DTEMP;
DELETE;
END;
The SAS log is producing this warning message:
WARNING: IN A CALL TO THE CATX FUNCTION, THE BUFFER ALLOCATED
FOR THE RESULT WAS NOT LONG ENOUGH TO CONTAIN THE CONCATENATION
OF ALL THE ARGUMENTS. THE CORRECT RESULT WOULD CONTAIN 229 CHARACTERS,
BUT THE ACTUAL RESULT MAY EITHER BE TRUNCATED TO 200 CHARACTER(S) OR
BE COMPLETELY BLANK, DEPENDING ON THE CALLING ENVIRONMENT. THE
FOLLOWING NOTE INDICATES THE LEFT-MOST ARGUMENT THAT CAUSED TRUNCATION.
Followed by the message (for the SAS data step I posted here):
NOTE: ARGUMENT 3 TO FUNCTION CATX AT LINE 100 COLUMN 15 IS INVALID.
Please note that in my sample data table (MKPLOGS4), each line of text in the field DESCTEXT can be up to 116 characters long, and there is no limit on how many lines of description text a record ID can have.
The output I am getting has only the last line of description:
OBS RNID DESCTEXT
---- --------
1 123 String concatenation.
2 124 Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
I have the following questions:
- Is there something wrong with my code?
- Is there a limit to SAS string concatenation? Can I override it? If yes, please provide code.
If you have a suggestion, I would really appreciate it if you could post your version of the code. This is not school work/homework.
Since SAS stores character data as blank-padded fixed length strings, it is usually not a good idea to store a large amount of text in the dataset. However, if you must, then you can create a character type variable with a length of up to 32767 characters. If you don't mind doing some extra I/O, here is an easy way.
/* test data -- same id repeated over multiple observations i.e., in a "long-format" */
data one;
input rnid desctext & :$200.;
cards;
123 This is some text which is part of the record. I want this to appear
123 concatenated kinda like concat_group() from MYSQL but for some reason
123 SAS does not have such functionality. Now I am having trouble with
123 String concatenation.
124 Hi there old friend of mine, hope you are enjoying the weather
124 Are you sure this is not your jacket, okay then. Will give charity.
;
run;
/* re-shape from the long to the wide-format. assumes that data are sorted by rnid. */
proc transpose data=one out=two;
by rnid;
var desctext;
run;
/* concatenate col1, col2, ... vars into single desctext */
data two;
length rnid 8 desctext $1000;
keep rnid desctext;
set two;
desctext = catx(' ', of col:);
run;
The documentation for the CATX function specifies that, in a DATA step, the variable receiving the result is given a length of only 200 characters unless you have already declared a length for it.
All you need to do is add a LENGTH or ATTRIB statement for that variable before it is first used in your DATA step.
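To see the default in isolation, here is a minimal sketch (untested here; the variable names a, b, short_txt and long_txt are made up for illustration). It builds two 150-character values and concatenates them twice, once into an undeclared variable and once into a variable whose length was declared first. The undeclared case should produce the same kind of CATX buffer warning shown in the question.

```sas
/* Sketch: a variable receiving a CATX result gets length 200
   unless its length was declared earlier in the step. */
data _null_;
  length long_txt $500;            /* declared before first use */
  a = repeat('x', 149);            /* 150 x characters */
  b = repeat('y', 149);            /* 150 y characters */
  short_txt = catx(' ', a, b);     /* undeclared: length 200, so 301 characters do not fit */
  long_txt  = catx(' ', a, b);     /* declared: room for all 301 characters */
  len_short = lengthn(short_txt);
  len_long  = lengthn(long_txt);
  put len_short= len_long=;        /* compare the two lengths in the log */
run;
```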
Here is how I would have coded it (untested):
data mkplogsa (rename=(dtext=desctext));
length dtext $32767 ;
set mkplogs4;
by rnid;
retain dtext;
if first.rnid then do;
dtext = "";
end;
dtext = catx(' ',dtext,desctext);
if last.rnid then do;
output;
end;
keep rnid dtext;
run;
Note that 32767 is the largest string size for a character value in a SAS dataset. If your string is larger than that you're out of luck.
Cheers Rob
Thanks guys, I was able to solve this problem by using PROC TRANSPOSE and then using concatenation. Here is the code:
/*
THIS TRANSPOSE STEP TAKES THE MKPLOGS4 TABLE AND
CREATES A NEW TEMPORARY TABLE CALLED MKPLOGSA. SINCE
THE DESCRIPTION TEXT IS STORED IN MULTIPLE LINES (OBSERVATIONS)
IN THE ITEXT FILE, IN ORDER TO COMBINE THEM TO A SINGLE ROW,
WE USE TRANSPOSE. HOWEVER, AFTER THIS STEP, THE DESCRIPTION TEXT
SPREAD OVER MULTIPLE LINES ALTHOUGH ON SAME ROW (OBSERVATION)
ARE STILL SEPARATED INTO MULTIPLE COLUMNS (ON THE SAME ROW)
ALL PREFIXED IN THIS CASE BY 'DESCTEXT'. WE DROP THE AUTO-CREATED
COLUMN _NAME_
*/
PROC TRANSPOSE DATA = MKPLOGS4 OUT = MKPLOGSA (DROP = _NAME_)
PREFIX = DESCTEXT;
VAR DESCTEXT;
BY PLOG;
RUN;
/*
THIS DATA STEP CREATES A NEW TABLE CALLED MKPLOGSB WHICH
TAKES ALL THE SEPARATED DESCRIPTION TEXT COLUMNS AND
CONCATENATES THEM INTO A SINGLE COLUMN - LONG_DESCRIPTION.
*/
DATA MKPLOGSB (DROP = DESCTEXT:);
SET MKPLOGSA;
/* CONCATENATED DESC. TEXT SET TO MAX. 27000 CHARS. */
LENGTH LONG_DESCRIPTION $27000;
LONG_DESCRIPTION = CATX(' ',OF DESCTEXT:);
RUN;
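Before settling on a fixed length such as $27000, it may be worth checking how long the concatenated text can actually get. The following is a rough sketch rather than part of the solution above: it assumes the MKPLOGS4 table and the RNID and DESCTEXT columns from the question (substitute your own BY variable), and needed_len is a made-up column name. It adds up the text lengths per ID plus one separator blank between consecutive lines, which gives an upper bound on what CATX will produce.

```sas
/* Sketch: estimate the length each concatenated description needs. */
proc sql;
  select rnid,
         /* total text length plus one blank between consecutive lines */
         sum(lengthn(desctext)) + count(*) - 1 as needed_len
    from mkplogs4
   group by rnid
   order by needed_len desc;   /* longest description first */
quit;
```

If the largest value exceeds the declared length, the result will not fit; if it exceeds 32767, it cannot be stored in a single character variable at all.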