Eason,
Since nobody has responded, I presume that everyone is thinking that you simply ought to ask for a better dataset. I agree. However, if your data has NO missing values, and ALWAYS follows the pattern shown in your example, then the following just might work. The code is definitely NOT guaranteed and, without question, is far from optimal. Hopefully, though, it will give you an idea of how to solve your problem:

data _null_;
  file 'c:\have.txt';
  input;
  put @1 _infile_;
  cards;
1|2|3|4 4
 4
|5|6 6
2|2|3|4 4 4|5|6
 6
3|2|3|4 4 4|5|6 6
;

data _null_;
  array hold(6) $;
  array lengths(6) (1 1 1 5 1 3);
  infile 'c:\have.txt' missover;
  file 'c:\have_modified.txt';
  i=0;
  j=0;
  do until (i eq 6);
    i+1;
    j+1;
    if scan(_infile_,1,'|') eq '' or scan(_infile_,j,'|') eq '' then do;
      input;
      j=1;
    end;
    hold(i)=catt(hold(i),scan(_infile_,j,'|'));
    x=length(hold(i));
    if length(hold(i)) lt lengths(i) then do;
      i=i-1;
      j=0;
      _infile_='';
    end;
  end;
  do i=1 to 6;
    put hold(i) @;
    call missing(hold(i));
    if i lt 6 then put '|' @;
    else put;
  end;
  _infile_='';
  i=0;
  j=0;
run;

HTH,
Art
--------
On Mon, 22 Jun 2009 21:11:48 -0700, Eason Chu <Shen-Jian.Zhu@SC.COM> wrote:

>Hi, all SAS-Ls
>
>I have made a Macro to manipulate _infile_ variable from input buffer,
>which intends to read in a raw data that is separated by a specific
>delimiter but a record of it may be broken into lines due to line feed
>or carriage return characters contained in a field value.
>Data sample as below:
>
> 1|2|3|4 4
> 4
> |5|6 6
> 2|2|3|4 4 4|5|6
> 6
> 3|2|3|4 4 4|5|6 6
>
>It should appear like below in table,
>
> 1|2|3|4 4(LF or CR) 4(LF or CR)|5|6 6
> 2|2|3|4 4 4|5|6(LF or CR) 6
> 3|2|3|4 4 4|5|6 6
>
>but the LF or CR cause the raw data broken into lines.
>The Macro I made below is to solve this situation.
>
> %Macro BLRDR(dlm,dlm_n,span_cut);
>   format tmp_infile_line $32767.;
>   informat tmp_infile_line $32767.;
>   retain tmp_infile_line;
>   input @;
>   do while (&dlm_n - count(trimn(tmp_infile_line),"&dlm") >= &span_cut.);
>     tmp_infile_line = trimn(tmp_infile_line)||_infile_;
>     input;
>     input @;
>     if &dlm_n - count(trimn(tmp_infile_line)||_infile_,"&dlm") < &span_cut. then do;
>       input @@;
>       _infile_ = tmp_infile_line;
>       tmp_infile_line = "";
>     end;
>   end;
>   *drop tmp_infile_line;
> %mend;
>
>&dlm: specify the delimiter;
>&dlm_n: indicate the delimiter number in one complete record
>&span_cut: set a broken line without &dlm (like " 6" between line4 and
>line6 in the data sample) as partial value of last field of last
>record or first field of next record. 0 as last field of last record;1
>as belonging to first field of next record. For example, when
>&span_cut = 0 then line4-6 records will be read as
> 2|2|3|4 4 4|5|6 6
> 3|2|3|4 4 4|5|6 6
>when &span_cut = 1 then line4-6 records will be read as
> 2|2|3|4 4 4|5|6
> 63|2|3|4 4 4|5|6 6
>
>If it worked as what I expect, many of my broken raw data would be
>read in correctly. However the result run out does not appear like
>that.
>I put this Macro into raw data reading in code.
>
> Data rst.test;
>   infile "C:\My SAS\SD.txt" dlm="|" dsd;
>   length n m l o p q $10.;
>   %BLRDR(|,5,1);
>   input n m l o p q;
>   put tmp_infile_line= n= m= l= o= p= q=;
> Run;
>
>Logs as below after run,
>
>61 Data rst.test;
>62 infile "C:\My SAS\SD.txt" dlm="|" dsd;
>63 length n m l o p q $10.;
>64 %BLRDR(|,5,1);
>MLOGIC(BLRDR): Beginning execution.
>MLOGIC(BLRDR): Parameter DLM has value |
>MLOGIC(BLRDR): Parameter DLM_N has value 5
>MLOGIC(BLRDR): Parameter SPAN_CUT has value 1
>MPRINT(BLRDR): format tmp_infile_line $32767.;
>MPRINT(BLRDR): informat tmp_infile_line $32767.;
>MPRINT(BLRDR): retain tmp_infile_line;
>MPRINT(BLRDR): input @;
>SYMBOLGEN: Macro variable DLM_N resolves to 5
>SYMBOLGEN: Macro variable DLM resolves to |
>SYMBOLGEN: Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR): do while (5 - count(trimn(tmp_infile_line),"|") >= 1);
>MPRINT(BLRDR): tmp_infile_line = trimn(tmp_infile_line)||_infile_;
>MPRINT(BLRDR): input;
>MPRINT(BLRDR): input @;
>SYMBOLGEN: Macro variable DLM_N resolves to 5
>SYMBOLGEN: Macro variable DLM resolves to |
>SYMBOLGEN: Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR): if 5 - count(trimn(tmp_infile_line)||_infile_,"|") < 1 then do;
>MPRINT(BLRDR): input @@;
>MPRINT(BLRDR): _infile_ = tmp_infile_line;
>MPRINT(BLRDR): tmp_infile_line = "";
>MPRINT(BLRDR): end;
>MPRINT(BLRDR): end;
>MPRINT(BLRDR): *drop tmp_infile_line;
>MLOGIC(BLRDR): Ending execution.
>66 input n m l o p q;
>69 put _infile_ tmp_infile_line= n= m= l= o= p= q=;
>70 Run;
>
>NOTE: The infile "C:\My SAS\SD.txt" is:
>      File Name=C:\My SAS\SD.txt,
>      RECFM=V,LRECL=256
>
>NOTE: 6 records were read from the infile "C:\My SAS\SD.txt".
>      The minimum record length was 2.
>      The maximum record length was 17.
>NOTE: The data set RST.TEST has 0 observations and 7 variables.
>
>
>NO obs was read in! At first I guess that _infile_ turned missing just
>before input statement. So I inserted some put statement into this
>Macro and data steps.
>Codes inserted as below,
>
> %Macro BLRDR(dlm,dlm_n,span_cut);
>   format tmp_infile_line $32767.;
>   informat tmp_infile_line $32767.;
>   retain tmp_infile_line;
>   input @;
>   put _infile_;
>   do while (&dlm_n - count(trimn(tmp_infile_line),"&dlm") >= &span_cut.);
>     tmp_infile_line = trimn(tmp_infile_line)||_infile_;
>     input;
>     input @;
>     put _infile_;
>     if &dlm_n - count(trimn(tmp_infile_line)||_infile_,"&dlm") < &span_cut. then do;
>       input @@;
>       put _infile_;
>       _infile_ = tmp_infile_line;
>       put _infile_;
>       tmp_infile_line = "";
>     end;
>   end;
>   *drop tmp_infile_line;
> %mend;
>
> Data rst.test;
>   infile "C:\My SAS\SD.txt" dlm="|" dsd;
>   length n m l o p q $10.;
>   %BLRDR(|,5,1);
>   put "######";
>   input n m l o p q;
>   put "######";
>   put _infile_;
>   put _infile_ tmp_infile_line= n= m= l= o= p= q=;
> Run;
>
>And here the logs,
>
>61 Data rst.test;
>62 infile "C:\My SAS\SD.txt" dlm="|" dsd;
>63 length n m l o p q $10.;
>64 %BLRDR(|,5,1);
>MLOGIC(BLRDR): Beginning execution.
>MLOGIC(BLRDR): Parameter DLM has value |
>MLOGIC(BLRDR): Parameter DLM_N has value 5
>MLOGIC(BLRDR): Parameter SPAN_CUT has value 1
>MPRINT(BLRDR): format tmp_infile_line $32767.;
>MPRINT(BLRDR): informat tmp_infile_line $32767.;
>MPRINT(BLRDR): retain tmp_infile_line;
>MPRINT(BLRDR): input @;
>MPRINT(BLRDR): put _infile_;
>SYMBOLGEN: Macro variable DLM_N resolves to 5
>SYMBOLGEN: Macro variable DLM resolves to |
>SYMBOLGEN: Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR): do while (5 - count(trimn(tmp_infile_line),"|") >= 1);
>MPRINT(BLRDR): tmp_infile_line = trimn(tmp_infile_line)||_infile_;
>MPRINT(BLRDR): input;
>MPRINT(BLRDR): input @;
>MPRINT(BLRDR): put _infile_;
>SYMBOLGEN: Macro variable DLM_N resolves to 5
>SYMBOLGEN: Macro variable DLM resolves to |
>SYMBOLGEN: Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR): if 5 - count(trimn(tmp_infile_line)||_infile_,"|") < 1 then do;
>MPRINT(BLRDR): input @@;
>MPRINT(BLRDR): put _infile_;
>MPRINT(BLRDR): _infile_ = tmp_infile_line;
>MPRINT(BLRDR): put _infile_;
>MPRINT(BLRDR): tmp_infile_line = "";
>MPRINT(BLRDR): end;
>MPRINT(BLRDR): end;
>MPRINT(BLRDR): *drop tmp_infile_line;
>MLOGIC(BLRDR): Ending execution.
>65 put "######";
>66 input n m l o p q;
>67 put "######";
>68 put _infile_;
>69 put _infile_ tmp_infile_line= n= m= l= o= p= q=;
>70 Run;
>
>NOTE: The infile "C:\My SAS\SD.txt" is:
>      File Name=C:\My SAS\SD.txt,
>      RECFM=V,LRECL=256
>
>1|2|3|4 4
> 4
>|5|6 6
>|5|6 6
>1|2|3|4 4
>4
>
>
>2|2|3|4 4 4|5|6
>2|2|3|4 4 4|5|6
>1|2|3|4 4
>4
>
>
> 6
>3|2|3|4 4 4|5|6 6
>3|2|3|4 4 4|5|6 6
>1|2|3|4 4 4
>6
>NOTE: 6 records were read from the infile "C:\My SAS\SD.txt".
>      The minimum record length was 2.
>      The maximum record length was 17.
>NOTE: The data set RST.TEST has 0 observations and 7 variables.
>
>
>
>We can see that no put into log after the Macro execution, which seems
>the statements after the Macro doesn't work and no error msg here. It
>confused me a lot. Is there anyone knowing why?