Go Back   Rhinocerus > Newsgroup > Newsgroup comp.soft-sys.sas

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 01-06-2005, 11:51 PM
Sigurd Hermansen
Guest
 
Posts: n/a
Default Re: Big Table / Little Table Update

Steve:
I've been on the sidelines working on pressing tasks. Ordinarily I would
have jumped in an suggested a SQL solution....

Not this time.... I would go with a hash index, or even a bit array if you
have enough free memory to hold the ID's of the patients in each update:

- put the ID's for an update in a hash index / bit array as prescribed in
the Dorfman Letters
to SAS-L (using a temporary array in a Data step);
- UPDATE the flag in the master file IF the ID in a row gets a hit on the
index/array.
- append the update dataset to the master dataset.

It should take less than a minute to build the index. Scanning a 42M row
dataset should take no more than a few minutes elapsed time on a reasonably
quick machine. The index/array search should take a negligible amount of
time. The whole process should scale up linearly.

I don't particularly like the design of your database. I prefer to see
separate tables for patient data and for events related to patients. As the
database stands, the flag will make it easy to select the last update to the
database for a patient.
Sig
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of James,
Steve
Sent: Tuesday, January 04, 2005 3:10 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Big Table / Little Table Update


Mike,

The master data set has about 42 million records in it. I don't know how
representative the numbers from this run are since it's just test code and
the transaction table is actually created from the master (see log below).

150 records per patient sounds high to me. As I mentioned before I don't
know the data that well. I'll have to find out more. I'm trying to run the
program again and make some changes to bufsize and ibufsize that Michael
Raithel suggested. It'll be interesting to see if it makes a difference.

Steve



8 data Master (index=(key)) Trans ;
9 set j.vaDetailsPre04;
10 if datepart(storedate) ge mdy(12,31,2004) then output Trans
;
11 else output Master ;
12 run;

NOTE: There were 42179057 observations read from the data set
J.VADETAILSPRE04.
NOTE: The data set WORK.MASTER has 42156526 observations and 29 variables.
NOTE: The data set WORK.TRANS has 22531 observations and 29 variables.
NOTE: DATA statement used (Total process time):
real time 1:06:05.36
cpu time 17:24.85




-----Original Message-----
From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com]
Sent: Monday, January 03, 2005 5:39 PM
To: James, Steve; SAS-L@LISTSERV.UGA.EDU
Subject: RE: Big Table / Little Table Update

Steve,

Glad you're making (some) progress! A few thoughts / suggestions /
questions follow.

While you're still testing things out, I might use a separate PROC SQL step
for each PROC SQL statement. The extra overhead should be minimal, and that
would give you a better idea of how each statement is performing time-wise.
You can remove the extra PROC SQLs when / if you put the code into
production.

If I'm understanding the log, you have 22531 transactions for 5323 patients.
Your final DATA step reports that 780,472 records were updated, which
implies that your master file averages around 150 records per patient --
does that sound correct?

Do you know about how many patients, and records, are in your master? It
would be interesting to know approximately what percentage get updated each
day.

Mike Rhoads
Westat
RhoadsM1@Westat.com

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of James,
Steve
Sent: Monday, January 03, 2005 5:00 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Big Table / Little Table Update


Thanks for the helpful tips last week about my problem concerning updating a
big table with duplicate key values from a much smaller transaction table.
Several people gave me some good advice as well as code. I'm testing a
solution from Mike Rhoads (see log below) and I was able to get the CPU time
down to under 4 minutes (from 42 minutes). However the clock time remains
quite long. For the run today it was over 2 hours just for the update.
When I ran it last week (after most people had left for the holiday), it ran
in about 45 minutes of clock time with about the same CPU time as before. I
can't understand why there's such a discrepancy: 4 minutes vs 2 hours? Is
it I/O?



How can I convince my co-worker that while his solution takes 42 minutes of
CPU time but roughly the same amount of real time that he should switch
methods to the one taking less CPU time?



We're running on a Sunfire 6800 w/ 8 processors running SAS 9.1 under
Solaris 64 bit.





Steve James

Centers for Disease Control and Prevention

sjames@cdc.gov





21
/*----------------------------------------------------------------------
---

22 Append Trans data to Master.

23
------------------------------------------------------------------------
-*/

24 proc sql;

25 insert into Master select * from Trans

26 ;

NOTE: 22531 rows were inserted into WORK.MASTER.



27

28
/*----------------------------------------------------------------------
---

29 Create a table containing the Keys that are in Master that
also are in

30 Trans. Since you have already appended Trans to Master,
there should be

31 no IDs in Trans that are not in Master.

32
------------------------------------------------------------------------
-*/

33 create table KeysInTrans as

34 select key, max(storedate) as date

35 from Trans

36 group by key

37 ;

NOTE: Table WORK.KEYSINTRANS created, with 5323 rows and 2 columns.



38

39
/*----------------------------------------------------------------------
--

40 Get the most recent visit date for each Key.

41
------------------------------------------------------------------------
--*/

42 CREATE TABLE MostRecentVisit AS

43 SELECT T.Key, MAX(StoreDate) AS Date

44 FROM KeysInTrans AS T, Master as M

45 where T.Key = M.Key

46 GROUP BY T.Key ;

NOTE: Table WORK.MOSTRECENTVISIT created, with 5323 rows and 2 columns.



47

48 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 1:39:35.63

cpu time 4:29.72





49

50
/*----------------------------------------------------------------------
---

51 Now read in record in Trans and update the corresponding
records in Master.

52
------------------------------------------------------------------------
-*/




3 The SAS
System 11:10 Monday, January 3, 2005



53

54 data Master ;

55 set MostRecentVisit end=lastob ;

56 do until (_iorc_ = %sysrc(_dsenom));

57 modify Master key=key ;

58 select (_iorc_);

59 when (%sysrc(_sok)) do;

60 if storedate eq date

61 then cdc_avrflag=1 ;

62 else cdc_avrflag=0 ;

63 replace Master ;

64 records_updated + 1 ;

65 end; /* end of where sysrc=_sok */

66 when (%sysrc(_dsenom)) do;

67 _error_ = 0;

68 end ;

69 otherwise;

70 end; /* end of select (_iorc_) statement */

71 end; /* end of do until () statement */

72 if lastob then put 'the number of records updated were: '
records_updated comma9. ;

73 run;



the number of records updated were: 780,472

NOTE: There were 5323 observations read from the data set
WORK.MOSTRECENTVISIT.

NOTE: The data set WORK.MASTER has been updated. There were 780472
observations rewritten, 0 observations added and 0 observations

deleted.

NOTE: DATA statement used (Total process time):

real time 2:21:58.26

cpu time 3:52.51
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Update a table on itself Sigurd Hermansen Newsgroup comp.soft-sys.sas 0 12-11-2006 02:10 PM
Re: Big Table / Little Table Update Chang Y. Chung Newsgroup comp.soft-sys.sas 0 12-29-2004 04:13 AM
Re: Big Table / Little Table Update nospam@HOWLES.COM (Howard Schreier Newsgroup comp.soft-sys.sas 0 12-29-2004 01:38 AM
Re: Big Table / Little Table Update James, Steve Newsgroup comp.soft-sys.sas 0 12-28-2004 10:15 PM
Re: Big Table / Little Table Update Mike Rhoads Newsgroup comp.soft-sys.sas 0 12-28-2004 07:40 PM



All times are GMT. The time now is 05:34 AM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.