Newsgroup comp.soft-sys.sas

Paul M. Dorfman, 10-14-2005, 01:54 AM
Re: sas Performance Enhancement

Tir,

I did not know the MEMLIB option existed on the LIBNAME statement! This shows
SAS-L is THE place to learn! And even the rather quick revelation that it works
only under Windows failed to substantially curtail my enthusiasm in the light of
your testing. Heck, maybe they will implement it on more heavyweight platforms
some time soon. I wonder whether this SAS/Windows feature is similar in
performance to a RAM disk - I have heard rave reviews from some good folks who
have tried the latter.

On a different note brought up by David Cassell, methinks compression of
memory-based data sets may work counter to the purpose of improving performance,
although it clearly increases the amount of data one can make hover in memory.
Sorting performance can be amply improved by merely setting MEMSIZE to
something generous since, from what I have observed under AIX lately, that
seems to make SAS sort change its bookkeeping strategy and keep more temporary
data in memory and less on disk.
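A minimal sketch of raising the memory limits, assuming a SAS 9 session; the 2G and 1G values are illustrative, not recommendations. MEMSIZE can only be set when SAS starts (command line or configuration file), whereas SORTSIZE, which limits the memory available to PROC SORT, can also be changed with an OPTIONS statement:

```sas
/* At invocation or in the configuration file (e.g. sasv9.cfg);  */
/* MEMSIZE cannot be changed from within a running session:       */
/*    -memsize 2G                                                 */

/* SORTSIZE can be changed in-session; keep it below MEMSIZE.     */
options sortsize=1G;

/* Confirm the values the session is actually using. */
proc options option=memsize;
run;
proc options option=sortsize;
run;
```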

I think the area where MEMLIB could be most beneficial is lookup into
comparatively small (i.e. under a couple of gigs) indexed data sets. This could
greatly simplify a number of processes typical of modern ETL implementations.
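A minimal sketch of such a lookup, assuming Windows (where MEMLIB is available) and hypothetical names: a disk library PERM holding a lookup table LOOKUP, a transaction file WORK.TRANS, and a key variable ACCT_ID. The index is built as the table is copied into memory, and each transaction then fetches its match with KEY=:

```sas
/* Windows only: MEMLIB holds this library's members in RAM. */
libname inmem "C:\TEMP\tmemory" memlib;
libname perm  "D:\data";                 /* hypothetical disk library */

/* Copy the (hypothetical) lookup table into memory, indexing it */
/* on the hypothetical key variable ACCT_ID as it is written.    */
data inmem.lookup (index=(acct_id));
  set perm.lookup;
run;

/* For each transaction, fetch the matching lookup row by key.   */
data work.matched;
  set work.trans;                        /* must contain ACCT_ID   */
  set inmem.lookup key=acct_id / unique;
  if _iorc_ = %sysrc(_sok) then output;  /* key found: keep it     */
  else _error_ = 0;                      /* not found: no log dump */
run;
```

The /UNIQUE option restarts each index search from the top, which is what a one-row-per-key lookup table needs when transaction keys arrive in arbitrary order.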

Kind regards
-------------------
Paul M. Dorfman
Jacksonville, FL
-------------------

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Patnaik, Tirthankar
> Sent: Thursday, October 13, 2005 5:58 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: sas Performance Enhancement
>
> It seems that using memory to reduce I/O is an important
> option to consider for improving SAS performance, and
> SASFILE is one way to make more use of RAM. IMHO, two other
> methods deserve mention in this regard too, viz.,
> memory-based libraries and compression. Memory-based
> libraries have an added advantage over SASFILE: they allow
> files in memory to be modified. This is crucial, since an
> in-place PROC SORT is ruled out for a file opened with a
> SASFILE statement. With more than 1 GB of RAM, one can
> easily assign the default workspace to RAM, be diligent
> about using it, and at the end move the final data set to
> the hard disk with PROC COPY and its MOVE option.
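A minimal sketch of that workflow, assuming Windows and hypothetical names (libref PERM, path D:\project\data, final data set FINAL; the SASHELP.CLASS step only stands in for real intermediate work). The USER= system option sends one-level data set names to INMEM instead of WORK, and PROC COPY with MOVE deletes each selected member from INMEM once it has been copied:

```sas
libname inmem "C:\TEMP\tmemory" memlib;   /* Windows only           */
libname perm  "D:\project\data";          /* hypothetical disk path */

/* One-level names such as FINAL now resolve to INMEM, not WORK. */
options user=inmem;

/* Intermediate work, kept in memory (illustrative step). */
data final;
  set sashelp.class;
  bmi = weight / height**2 * 703;
run;

/* Copy FINAL to disk and delete it from INMEM. */
proc copy in=inmem out=perm move;
  select final;
run;

/* Send one-level names back to WORK. */
options user=work;
```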
>
> Here's how one can create a memory-based library in Win2k:
>
> options nocenter msglevel=i fullstimer source source2 mprint
>         compress=no;
> /* Use the RAM as the default workspace */
> libname inmem "C:\TEMP\tmemory" memlib;
> NOTE: Libref INMEM was successfully assigned as follows:
> Engine: V9
> Physical Name: C:\TEMP\tmemory
>
> I've tried to do some analysis to illustrate this point. The
> technical aspects of the analysis are best left to experts
> like Paul Dorfman--whatever little I know would not be
> sufficient to add anything significant.
>
> Create the test file first. Uncompressed, it is about 400 MB
> (1,000,000 observations x 50 numeric variables x 8 bytes).
>
> /* Sorting a large file. */
>
> %let N = 1000000;
> %let K = 50;
>
> data _test_;
>    array norm norm1-norm&k.;
>    do i = 1 to &N.;
>       do j = 1 to &k.;
>          norm{j} = ceil(ranuni(12345)*100);
>       end;
>       output;
>    end;
>    drop i j;
> run;
>
> NOTE: The data set WORK._TEST_ has 1000000 observations and
> 50 variables.
> NOTE: DATA statement used (Total process time):
> real time 1:18.97
> user cpu time 37.40 seconds
> system cpu time 2.37 seconds
> Memory 174k
>
>
> /* Creating the same file in memory */
>
> data inmem._test_;
>    array norm norm1-norm&k.;
>    do i = 1 to &N.;
>       do j = 1 to &k.;
>          norm{j} = ceil(ranuni(12345)*100);
>       end;
>       output;
>    end;
>    drop i j;
> run;
>
> NOTE: The data set INMEM._TEST_ has 1000000 observations and
> 50 variables.
> NOTE: DATA statement used (Total process time):
> real time 39.11 seconds
> user cpu time 37.16 seconds
> system cpu time 0.42 seconds
> Memory 393487k
>
> Notice that the real time for file creation in a
> memory-based library is roughly halved (1:19 to 39 seconds)
> while user CPU time stays about the same, since there's
> little or no disk I/O. Paul, am I right here?
>
> Memory used goes up significantly, as expected.
>
> We then sort this file (two copies) in the default workspace
> and in the memory-based library:
>
> /* Sorting the same file in the workspace. */
> proc sort data=work._test_;
>    by norm1;
> run;
>
> NOTE: There were 1000000 observations read from the data set
> WORK._TEST_.
> NOTE: SAS sort was used.
> NOTE: The data set WORK._TEST_ has 1000000 observations and
> 50 variables.
> NOTE: PROCEDURE SORT used (Total process time):
> real time 4:16.89
> user cpu time 5.94 seconds
> system cpu time 1:01.25
> Memory 66104k
>
>
> /* Sorting in inmem. */
> proc sort data=inmem._test_;
>    by norm1;
> run;
>
> NOTE: There were 1000000 observations read from the data set
> INMEM._TEST_.
> NOTE: SAS sort was used.
> NOTE: The data set INMEM._TEST_ has 1000000 observations and
> 50 variables.
> NOTE: PROCEDURE SORT used (Total process time):
> real time 2:31.14
> user cpu time 7.03 seconds
> system cpu time 26.34 seconds
> Memory 459421k
>
> Sort time has come down from over 4 minutes (4:16) to about
> 2.5 minutes (2:31).
>
> Next, we check whether compression makes a difference to
> these timings. Memory-based libraries allow compression too.
> Also, AFAIK, binary compression works better for numeric data.
>
> /* Trying the same with compress=binary */
>
> data _test2_(compress=binary);
>    array norm norm1-norm&k.;
>    do i = 1 to &N.;
>       do j = 1 to &k.;
>          norm{j} = ceil(ranuni(12345)*100);
>       end;
>       output;
>    end;
>    drop i j;
> run;
>
> NOTE: The data set WORK._TEST2_ has 1000000 observations and
> 50 variables.
> NOTE: Compressing data set WORK._TEST2_ decreased size by
> 54.29 percent.
> Compressed is 11429 pages; un-compressed would require
> 25001 pages.
> NOTE: DATA statement used (Total process time):
> real time 1:21.01
> user cpu time 57.36 seconds
> system cpu time 1.20 seconds
> Memory 175k
>
>
> /* Creating the same file in memory */
>
> /* Clean the RAM first. */
> proc delete data=inmem._test_;
> run;
>
> NOTE: Deleting INMEM._TEST_ (memtype=DATA).
> NOTE: PROCEDURE DELETE used (Total process time):
> real time 0.25 seconds
> user cpu time 0.03 seconds
> system cpu time 0.06 seconds
> Memory 13k
>
>
> data inmem._test2_(compress=binary);
>    array norm norm1-norm&k.;
>    do i = 1 to &N.;
>       do j = 1 to &k.;
>          norm{j} = ceil(ranuni(12345)*100);
>       end;
>       output;
>    end;
>    drop i j;
> run;
>
> NOTE: The data set INMEM._TEST2_ has 1000000 observations and
> 50 variables.
> NOTE: Compressing data set INMEM._TEST2_ decreased size by
> 54.29 percent.
> Compressed is 11429 pages; un-compressed would require
> 25001 pages.
> NOTE: DATA statement used (Total process time):
> real time 58.31 seconds
> user cpu time 57.45 seconds
> system cpu time 0.16 seconds
> Memory 180444k
>
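> We see that the time for file creation does not improve with
> compression: on disk it stays at about 1:20, and in memory it
> actually rises from 39 to 58 seconds. This is probably due to
> the extra CPU cycles needed for compression (user CPU time
> goes up from about 37 to 57 seconds).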
>
> We next see the effect on proc sort:
>
> /* Sorting the same file in the workspace. */
> proc sort data=work._test2_ out=work._test2_(compress=binary);
>    by norm1;
> run;
>
> NOTE: There were 1000000 observations read from the data set
> WORK._TEST2_.
> NOTE: SAS sort was used.
> NOTE: The data set WORK._TEST2_ has 1000000 observations and
> 50 variables.
> NOTE: Compressing data set WORK._TEST2_ decreased size by
> 54.28 percent.
> Compressed is 11431 pages; un-compressed would require
> 25001 pages.
> NOTE: PROCEDURE SORT used (Total process time):
> real time 2:28.41
> user cpu time 33.37 seconds
> system cpu time 6.73 seconds
> Memory 66107k
>
> Sorting a compressed file in the default (HD-based) workspace
> has brought the sort time down from 4 minutes 16 seconds to
> about 2 minutes 28 seconds. Quite an improvement, and this is
> especially important for laptops, which have pretty slow disk
> I/O (I believe the original poster's machine was a laptop). I
> have a ThinkPad myself, and I realize how memory-based
> techniques can help you each day. They really work.
>
> /* Sorting in inmem. */
> proc sort data=inmem._test2_ out=inmem._test2_(compress=binary);
>    by norm1;
> run;
>
> NOTE: There were 1000000 observations read from the data set
> INMEM._TEST2_.
> NOTE: SAS sort was used.
> NOTE: The data set INMEM._TEST2_ has 1000000 observations and
> 50 variables.
> NOTE: Compressing data set INMEM._TEST2_ decreased size by
> 54.28 percent.
> Compressed is 11431 pages; un-compressed would require
> 25001 pages.
> NOTE: PROCEDURE SORT used (Total process time):
> real time 1:21.44
> user cpu time 33.91 seconds
> system cpu time 3.49 seconds
> Memory 246378k
>
> And the sort time comes down even further when we do this in
> RAM: from 2:31 to 1:21.
>
> So from 4:16 in the worst case, to about 1:21 in the best
> case, it's quite a change.
>
> Lastly, I'd say that one should keep all files in a
> compressed state. At the risk of diluting the flow of the
> mail, I'd like to put in a snippet of log I always use to
> illustrate the effects of compress=binary on essentially numeric data:
>
> ---------------------------------------------------
> data ukctfnw.ita_rev
>      ukctfnw.ita_trn;
>    set ukctfnw.ita_tot;
>    if revflag = 1 then output ukctfnw.ita_rev;
>    else output ukctfnw.ita_trn;
> run;
>
> NOTE: There were 210000 observations read from the data set
> UKCTFNW.ITA_TOT.
> NOTE: The data set UKCTFNW.ITA_REV has 102483 observations
> and 929 variables.
> NOTE: The data set UKCTFNW.ITA_TRN has 107517 observations
> and 929 variables.
> NOTE: DATA statement used (Total process time):
> real time 5:37.31
> cpu time 32.32 seconds
>
>
> data ukctfnw.ita_rev(compress=binary)
>      ukctfnw.ita_trn(compress=binary);
>    set ukctfnw.ita_tot;
>    if revflag = 1 then output ukctfnw.ita_rev;
>    else output ukctfnw.ita_trn;
> run;
>
> NOTE: There were 210000 observations read from the data set
> UKCTFNW.ITA_TOT.
> NOTE: The data set UKCTFNW.ITA_REV has 102483 observations
> and 929 variables.
> NOTE: Compressing data set UKCTFNW.ITA_REV decreased size by
> 73.96 percent.
> Compressed is 13346 pages; un-compressed would require
> 51249 pages.
> NOTE: The data set UKCTFNW.ITA_TRN has 107517 observations
> and 929 variables.
> NOTE: Compressing data set UKCTFNW.ITA_TRN decreased size by
> 82.79 percent.
> Compressed is 9251 pages; un-compressed would require
> 53766 pages.
> NOTE: DATA statement used (Total process time):
> real time 1:51.05
> cpu time 1:00.29
> ---------------------------------------------------
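A minimal sketch of making compression the session default instead of repeating the data set option, assuming the statement goes in an autoexec file or at the top of a program; given the CPU cost seen in the timings above, a per-data-set override stays available:

```sas
/* New data sets are compressed unless a step says otherwise.   */
/* BINARY (RDC) generally suits numeric data; CHAR (RLE) suits  */
/* data sets dominated by long, repetitive character variables. */
options compress=binary;

/* A data set option still overrides the session default. */
data work.small_lookup (compress=no);
  set sashelp.class;
run;

/* PROC CONTENTS shows the Compressed attribute of the data set. */
proc contents data=work.small_lookup;
run;
```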
>
>
>
> HTH and best,
> -Tir
>
>
> Tirthankar Patnaik
> India Analytics Center
> Citibank, N. A.
> +91-44-5228 6385
> +91-98410 69545
>
>
>
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
> > David L Cassell
> > Sent: Thursday, October 13, 2005 10:21 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: sas Performance Enhancement
> >
> >
> > paul.dorfman@FCSO.COM expertly replied:
> > >Since its inception, SASFILE has been benchmarked by at least one
> > >person I know of. Originally (and understandably) I was highly
> > >enthused when SASFILE appeared in V8, but minimal testing quickly
> > >showed that, with notable exceptions (see below), it has been
> > >something of a disappointment, mainly because, following the common
> > >wisdom that "memory is 100 times faster than disk", I had
> > >anticipated much more sizeable performance improvements.
> > >
> > >[really useful documentation of times elided by a lazy slob]
> > >
> > >Not to say that the prebuffered file yields no improvement in reading
> > >performance, but in real time, it is definitely not 3 or even 2
> > >orders of magnitude. Heck, not even 1.
> > >
> > >The most sizeable improvement observed in the speed of direct reads
> > >is undoubtedly owing to the fact that with SASFILE, all file pages
> > >are already preloaded into the buffer. Without SASFILE, if an
> > >observation is requested and it is not in the currently buffered
> > >page, the current page must be unbuffered and the page containing
> > >the observation must be buffered in; when the file is prebuffered,
> > >this obviously is not necessary and does not happen. The observed
> > >performance differences are mainly owing to the fact that the same
> > >pages get buffered repeatedly. The proof is in the sequential read,
> > >where each page is buffered only once, and hence the overall speed
> > >difference is negligible. By the same token, if pages are read in
> > >order, for instance, as
> > >
> > >   do key = 1 to n by 10 ;
> > >      set halfgig key = key nobs = n ;
> > >   end ;
> > >   do ptr = 1 to n by 10 ;
> > >      set halfgig point = ptr nobs = n ;
> > >   end ;
> > >
> > >the SASFILE performance improvements related to the indexed/random
> > >reads dwindle to the almost why-bother level.
> >
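A minimal sketch of the random-access case where, per the explanation above, prebuffering pays off most, assuming a data set named HALFGIG in WORK as in the quoted snippet; the 100,000 reads and the seed are illustrative:

```sas
/* Read the whole file into memory buffers before the step runs. */
sasfile work.halfgig load;

data _null_;
  /* Random direct reads: without SASFILE, pages may be buffered  */
  /* in and out repeatedly; with it, every page is already there. */
  do i = 1 to 100000;
    ptr = ceil(ranuni(4321) * n);      /* random observation number */
    set work.halfgig point=ptr nobs=n;
  end;
  stop;                                /* required with POINT=      */
run;

/* Release the buffers. */
sasfile work.halfgig close;
```

Running the same step without the two SASFILE statements, under FULLSTIMER, gives the comparison the quoted benchmark describes.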
> > Dale McLerran and I have found (Dale deserves most of the credit
> > here) that SASFILE makes a big difference in time when using PROC
> > SURVEYSELECT to do bootstrapping. That is:
> >
> >
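> > sasfile targetpopulationfile open;
> >
> > proc surveyselect data=targetpopulationfile out=bootstrapfile
> >      method=urs
> >      seed=4954734
> >      outhits
> >      reps=1000;
> > run;
> >
> > /* Release the memory once the resampling is done. */
> > sasfile targetpopulationfile close;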
> >
> >
> > It appears that the proc doesn't cache the data set beforehand, so
> > this process saves a lot of I/O time.
> >
> > So I think that the use of SASFILE needs to be restrained. It
> > doesn't solve all problems, and it can cause headaches if the file
> > is too big. But some cases, like your last example and the above,
> > show that there can be some merit in its use.
> >
> > David
> > --
> > David L. Cassell
> > mathematical statistician
> > Design Pathways
> > 3115 NW Norwood Pl.
> > Corvallis OR 97330
> >