|
|||
|
Sig,
Interesting idea. I took it a bit further going on to code: data out_hv / view = out_hv ; dcl hash q (hashexp:16, ordered:'y') ; dcl hiter qi ('q') ; q.definekey ('id', 'yr') ; q.definedata ('id', 'yr', 'date') ; q.definedone () ; do until ( eof ) ; set have end = eof ; yr = year (date) ; q.replace () ; end ; do _iorc_ = qi.next() by 0 while ( _iorc_ = 0 ) ; output ; _iorc_ = qi.next() ; end ; stop ; run ; proc sql _method ; create table out as select t1.* from have (sortedby = id date) as t1 inner join out_hv (sortedby = id yr ) as t2 on t1.ID = t2.ID and t1.date = t2.date ; quit ; This compares OK within the margin of SAS log stats' time measurement error with the writing of OUT_H using the .OUTPUT() method and consuming OUT_H by SQL. Though the Data step alone with .OUTPUT() get the job done, it is nice to know that there is no problem generating a Data step view even if the Data step contains a hash object. Kind regards ---------------- Paul M. Dorfman Jacksonville, FL ---------------- > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Sigurd Hermansen > Sent: Wednesday, July 13, 2005 6:00 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: Views and passes (was RE: Output last record of > fantom by group) > > Roy (and others entangled in this thread): > So far as I know a view reads and writes data just like a > Data step or query, but only after being invoked by another > Data step or query. I've expanded Paul's examples to > illustrate the time required for each process (in real time seconds): > > Data step 1 Data step 2 SQL Query > sort 8:14 8:14 > hash order 12.50 > select 21.40 13.53 8.00 > ------- -------- ------- > 29.54 21.67 20.50 > > I've fattened up the rows of data by adding a long text > variable to each row. In practical situations, we tend to > select more than key values. > > A sort orders the dataset prior to the Data steps that rely > on ordering properties. A slightly modified hash routine > selects and orders a list of ID,date pairs that the SQL query > uses to select required records. In general the more rows, > the faster the hash routine will work relative to a physical > sort. The Data step that groups by date clearly takes > substantially longer than either the Data step or SQL query > that groups by year. > > I would expect to see some variation in the difference > between the Data step 2 and SQL query times. Taking time > required for data ordering into account, a Data step > processing a Data step view and the SQL query work equally fast. > > For this type of data subsetting problem, the idea of using a > hash selector to select a subset of key values, then using > the subset of key values to select rows from a dataset, has > interesting implications for SAS SQL query optimization. I > wonder about hashing pointers to observations in SAS datasets .... > > I've posted relevant sections of the log below: > > 294 data have ; > 295 do date = -1e4 to 1e4 by 1 ; > 296 do id = 1 to 1e2 ; > 297 stuff=repeat('X',999); > 298 output ; > 299 end ; > 300 end ; > 301 run ; > > INFO: Character variables have defaulted to a length of 200 > at the places given by: > (Line) Column). Truncation may result.> 297:12 stuff > NOTE: The data set WORK.HAVE has 2000100 observations and 3 variables. > NOTE: DATA statement used (Total process time): > real time 54.25 seconds > cpu time 13.67 seconds > > > 302 proc sort data=have; > 303 by id date; > 304 run; > > NOTE: There were 2000100 observations read from the data set > WORK.HAVE. > NOTE: SAS sort was used. > NOTE: The data set WORK.HAVE has 2000100 observations and 3 variables. > NOTE: PROCEDURE SORT used (Total process time): > real time 8:14.01 > cpu time 46.65 seconds > > > 305 data out_g ; > 306 format date year4. ; > 307 set have ; > 308 by id date groupformat ; > 309 if first.date ; > 310 run ; > > NOTE: There were 2000100 observations read from the data set > WORK.HAVE. > NOTE: The data set WORK.OUT_G has 5600 observations and 3 variables. > NOTE: DATA statement used (Total process time): > real time 21.40 seconds > cpu time 5.73 seconds > > > 311 > 312 data _null_ ; > 313 dcl hash q (hashexp:16, ordered:'y') ; > 314 q.definekey ('id', 'yr') ; > 315 q.definedata ('id', 'yr', 'date') ; > 316 q.definedone () ; > 317 do until ( eof ) ; > 318 set have end = eof ; > 319 yr = year (date) ; > 320 q.replace () ; > 321 end ; > 322 q.output (dataset: 'out_h') ; > 323 stop ; > 324 run ; > > NOTE: The data set WORK.OUT_H has 5600 observations and 3 variables. > NOTE: There were 2000100 observations read from the data set > WORK.HAVE. > NOTE: DATA statement used (Total process time): > real time 12.50 seconds > cpu time 5.76 seconds > > > 325 proc print data=out_h (obs=10); > 326 run; > > NOTE: There were 10 observations read from the data set WORK.OUT_H. > NOTE: PROCEDURE PRINT used (Total process time): > real time 0.28 seconds > cpu time 0.03 seconds > > > 327 > 328 proc sql ; > 329 create view with_year as > 330 select id > 331 , year (date) as yr > 332 from have > 333 quit ; > NOTE: SQL view WORK.WITH_YEAR has been defined. > 334 > NOTE: PROCEDURE SQL used (Total process time): > real time 0.34 seconds > cpu time 0.00 seconds > > > 335 data out_v ; > 336 set with_year ; > 337 by id yr ; > 338 if last.yr ; > 339 run ; > > NOTE: There were 2000100 observations read from the data set > WORK.HAVE. > NOTE: There were 2000100 observations read from the data set > WORK.WITH_YEAR. > NOTE: The data set WORK.OUT_V has 5600 observations and 2 variables. > NOTE: DATA statement used (Total process time): > real time 13.53 seconds > cpu time 8.43 seconds > > > 340 > 341 proc sql ; > 342 create table out as > 343 select t1.* > 344 from have as t1 inner join out_h as t2 > 345 on t1.ID=t2.ID and t1.date=t2.date > 346 ; > NOTE: Table WORK.OUT created, with 5600 rows and 3 columns. > > 347 quit ; > NOTE: PROCEDURE SQL used (Total process time): > real time 8.00 seconds > cpu time 3.43 seconds > > > > -----Original Message----- > From: owner-sas-l@listserv.uga.edu > [mailto wner-sas-l@listserv.uga.edu]> On Behalf Of Pardee, Roy > Sent: Tuesday, July 12, 2005 2:20 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Views and passes (was RE: Output last record of fantom by > group) > > > So a dstep view doesn't get 'realized' when it's used, so you > really do save a pass through the data? > > My expectation would be that a SQL view *would* get realized, > so the savings would be illusory. Anybody know for sure? > <snip> > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Dale McLerran > Sent: Tuesday, July 12, 2005 11:09 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: Output last record of fantom by group > > > Michael, > > A data step view is perfect for this sort of problem. You > describe year as a fantom (sic) variable. Well, a view > creates phantom > (virtual) data. > > > /* Add year to view. Note that view does not execute until it */ > /* is invoked in the next data step. The view is virtual data */ > data addyear / view=addyear; > set have; > year = year(date); > run; > > > /* Set view by permno and year. Year is created as the data */ > /* in have are read by the view but before the data are made */ > /* available to the PDV of the current data step. Thus, we */ > /* can set the data by year. */ > data want; > set addyear; > by permno year; > if last.year; > run; > > > Dale > > > --- Michael Murff <mjm33@MSM1.BYU.EDU> wrote: > > > Hi SAS-L, > > > > > > > > I want the last record of each id/year as shown below; year > is a sort > > of fantom by-group since it only exists within another > variable, date. > > > The dataset "have" is 6 gigs+ so I would prefer to only > pass through > > the data once. The dataset "have" is sorted by permno and date. > > Obviously, I could create a year variable with one pass and > then, in a > > subsequent step, use a subsetting if statement to > conditionally output > > last.year, but given the size of have this is undesirable. > Any ideas? > > > > > > > > TIA, > > > > > > > > Michael Murff > > > > Provo, UT > > > > > > > > > > > > > > > > > > > > data have; > > > > input id $ date date9. type $; > > > > format date date9.; > > > > cards; > > > > a 01jan2001 F > > > > a 02jan2001 M > > > > a 03jan2001 L > > > > a 01jan2002 F > > > > a 02jan2002 L > > > > b 01jan2001 F > > > > b 02jan2001 M > > > > b 03jan2001 L > > > > b 01jan2002 F > > > > b 02jan2002 L > > > > b 03jan2003 F > > > > b 04jan2003 M > > > > b 05jan2003 M > > > > b 06jan2003 M > > > > b 07jan2003 L > > > > ; > > > > run; > > > > > > > > > > > > data want; > > > > input id $ date date9. type $; > > > > format date date9.; > > > > cards; > > > > a 03jan2001 L > > > > a 02jan2002 L > > > > b 03jan2001 L > > > > b 02jan2002 L > > > > b 07jan2003 L > > > > ; > > > > run; > > > > > --------------------------------------- > Dale McLerran > Fred Hutchinson Cancer Research Center > mailto: dmclerra@NO_SPAMfhcrc.org > Ph: (206) 667-2926 > Fax: (206) 667-5977 > --------------------------------------- > > > > __________________________________________________ __ > Sell on Yahoo! Auctions - no fees. Bid on great items. > yahoo.com/" target="_blank">http://auctions.yahoo.com/ > > -----Original Message----- > From: owner-sas-l@listserv.uga.edu > [mailto wner-sas-l@listserv.uga.edu]> On Behalf Of Pardee, Roy > Sent: Tuesday, July 12, 2005 2:20 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Views and passes (was RE: Output last record of fantom by > group) > > > So a dstep view doesn't get 'realized' when it's used, so you > really do save a pass through the data? > > My expectation would be that a SQL view *would* get realized, > so the savings would be illusory. Anybody know for sure? > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Dale McLerran > Sent: Tuesday, July 12, 2005 11:09 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: Output last record of fantom by group > > > Michael, > > A data step view is perfect for this sort of problem. You > describe year as a fantom (sic) variable. Well, a view > creates phantom > (virtual) data. > > > /* Add year to view. Note that view does not execute until it */ > /* is invoked in the next data step. The view is virtual data */ > data addyear / view=addyear; > set have; > year = year(date); > run; > > > /* Set view by permno and year. Year is created as the data */ > /* in have are read by the view but before the data are made */ > /* available to the PDV of the current data step. Thus, we */ > /* can set the data by year. */ > data want; > set addyear; > by permno year; > if last.year; > run; > > > Dale > > > --- Michael Murff <mjm33@MSM1.BYU.EDU> wrote: > > > Hi SAS-L, > > > > > > > > I want the last record of each id/year as shown below; year > is a sort > > of fantom by-group since it only exists within another > variable, date. > > > The dataset "have" is 6 gigs+ so I would prefer to only > pass through > > the data once. The dataset "have" is sorted by permno and date. > > Obviously, I could create a year variable with one pass and > then, in a > > subsequent step, use a subsetting if statement to > conditionally output > > last.year, but given the size of have this is undesirable. > Any ideas? > > > > > > > > TIA, > > > > > > > > Michael Murff > > > > Provo, UT > > > > > > > > > > > > > > > > > > > > data have; > > > > input id $ date date9. type $; > > > > format date date9.; > > > > cards; > > > > a 01jan2001 F > > > > a 02jan2001 M > > > > a 03jan2001 L > > > > a 01jan2002 F > > > > a 02jan2002 L > > > > b 01jan2001 F > > > > b 02jan2001 M > > > > b 03jan2001 L > > > > b 01jan2002 F > > > > b 02jan2002 L > > > > b 03jan2003 F > > > > b 04jan2003 M > > > > b 05jan2003 M > > > > b 06jan2003 M > > > > b 07jan2003 L > > > > ; > > > > run; > > > > > > > > > > > > data want; > > > > input id $ date date9. type $; > > > > format date date9.; > > > > cards; > > > > a 03jan2001 L > > > > a 02jan2002 L > > > > b 03jan2001 L > > > > b 02jan2002 L > > > > b 07jan2003 L > > > > ; > > > > run; > > > > > --------------------------------------- > Dale McLerran > Fred Hutchinson Cancer Research Center > mailto: dmclerra@NO_SPAMfhcrc.org > Ph: (206) 667-2926 > Fax: (206) 667-5977 > --------------------------------------- > > > > __________________________________________________ __ > Sell on Yahoo! Auctions - no fees. Bid on great items. > yahoo.com/" target="_blank">http://auctions.yahoo.com/ > |
|
|
||||
|
||||
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Re: Views and passes (was RE: Output last record of fantom by | Jack Hamilton | Newsgroup comp.soft-sys.sas | 0 | 07-15-2005 05:43 AM |
| Re: Views and passes (was RE: Output last record of fantom by | Sigurd Hermansen | Newsgroup comp.soft-sys.sas | 0 | 07-13-2005 10:00 PM |
| Re: Views and passes (was RE: Output last record of fantom by | nospam@HOWLES.COM (Howard Schreier | Newsgroup comp.soft-sys.sas | 0 | 07-12-2005 11:10 PM |
| Re: Views and passes (was RE: Output last record of fantom by | Dale McLerran | Newsgroup comp.soft-sys.sas | 0 | 07-12-2005 10:20 PM |
| Views and passes (was RE: Output last record of fantom by group) | Pardee, Roy | Newsgroup comp.soft-sys.sas | 0 | 07-12-2005 06:19 PM |