Go Back   Rhinocerus > Newsgroup > Newsgroup comp.soft-sys.sas

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 08-14-2007, 09:10 PM
Sigurd Hermansen
Guest
 
Posts: n/a
Default Re: A better JOIN method

David et al:
I've speculated on several occasions about the inability of the SAS SQL
compiler to implement a left join using a hash index. Ken's demonstrated
in this thread a method that I've used repeatedly in one version or
another to save considerable amounts of time.

A SQL 'optimizer' that constructs an execution plan does not typically
test every alternative. Instead, it tests for conditions that would
either support or fail to support a few major alternatives. It seems to
me that the optimizer could easily check to see that a processor has
enough memory available to support a hash index on the smaller table,
that the projection doesn't cross page boundaries, and that data volume
justifies indexing. A left join that will support a hash index would, I
believe, always work faster given datasets of minimal numbers of
topples.

Perhaps some of the very large number of platforms that have a version
of V6.7+ SAS don't handle hash objects gracefully, if at all, and the
SAS SQL developers don't choose to highlight that fact. I don't recall a
case where the SAS Ballot has drifted into the method and plan
implementation of SAS SQL. Perhaps it should.
S

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailtowner-sas-l@listserv.uga.edu]
On Behalf Of David Johnson
Sent: Tuesday, August 14, 2007 12:03 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: RE: A better JOIN method


I understood your thought, and unless I am missing something, that is
what concerned me. In the phase where the SQL Procedure defines its
actions, the actions we see specified from the _Method parameter, we
should be seeing the planned actions of the procedure. We see the same
processes defined in a different way when the Tree option is used to
draw an hierarchical chart of the actions that will be undertaken.

I would expect that in defining those actions, the procedure will have
tested the need for each branch and leaf of the method tree. If we have
a suitable index in place on table AABA, then when the SQXSRC leaf is
attached to the tree for the table, there is no intermediate sort action
defined. This is something I have seen on a number of occasions and
reassures me that the SQL plan is built with awareness of the data
sources and the data target. So it seems anomalous to me that the plan,
which can add or remove an action based on the presence of a usable
index on a source table, could not also see that the table has a single
row, and would not therefore require sorting. Granted, my comment is
solely constrained to a SAS source table, since foreign source tables do
not usually surface record counts with their header information.

I've read comments by our resident SQL guru before about the decision to
employ a hash vs. employing a sort which reassures me that the planner
is applying constraints based upon its awareness of the size of the
input tables. If the planner does not recognise single row tables, and
we can demonstrate that in some way, then it seems to me we have a
candidate for the SAS Ballot.

Kind regards

David


-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of Jack
Hamilton
Sent: Tuesday, 14 August 2007 2:31 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: A better JOIN method


I wasn't thinking that sqxsort was listed but not called; I was thinking
it might be called but not actually perform a sort. It might (and, I
claim,
should) look at the input data set, see that there's only one record,
and do a copy instead. sqxsort might also do some other setup for the
next phase of the join.

I don't know why sqxsort is used instead of a hash. Perhaps that's on
the list of optimizations they haven't gotten to yet (and it wouldn't be
in my Top Five List of SQL Enhancements).


....... Original Message .......
On Mon, 13 Aug 2007 14:19:09 +1000 "David Johnson" <d@DKVJ.BIZ> wrote:
>That's an interesting perspective. I expected that if I called for the


>_Method in a SQL Procedure call that I would be told the truth and if
>SqxSort was a called method, then I would expect it had been employed.
>
>I don't like the idea that I may have an inefficient SQL Procedure that


>I was trying to improve, that told me certain methods were used, that I


>may spend some time trying to tune, only to find they weren't all used
>after all.
>
>I think I'd like a definitive answer on that, or at least a repeatable
>example of a case where a method is listed but wasn't used. I am still


>using the (now very dated) TS note by Paul Kent that looks at the
>methods for SQL procedure calls. In the absence of a Sig I can tap on
>the shoulder or telephone when I get stuck, the method and stimer
>outputs are important to me.
>
>Kind regards
>
>David
>
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
>Jack Hamilton
>Sent: Monday, 13 August 2007 1:13 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: A better JOIN method
>
>
>On Sun, 12 Aug 2007 22:28:11 -0400, "Ken Borowiak"
><EvilPettingZoo97@AOL.COM> said:
>
>> Regarding the join stategy for Left/Right joins, I have concocted
>> many examples and have not once observed a hash join employed. Even
>> for the single observation table involved in a left join you
>> suggested in your reply, the results from _method suggest a Join
>> Merge. It even sorts the one observation table!

>
>I don't think it's safe to assume that SAS will be suboptimal in the
>future just because it's suboptimal now. Or that some apparently small


>change won't result in a different join method being used.
>
>And for that matter, it's not safe to assume that an actual sort is
>performed even if the sqxsort method is called. I might be wrong, but
>I'd like to think that the internal SAS sort routine (or the external
>sort routine if one is called) is smart enough not to perform a sort on


>a one record file. It knows or can know how many records there are in
>a SAS data set.
>
>
>
>On Sun, 12 Aug 2007 22:28:11 -0400, "Ken Borowiak"
><EvilPettingZoo97@AOL.COM> said:
>> On Sun, 12 Aug 2007 13:00:12 -0700, Jack Hamilton
>> <jfh@STANFORDALUMNI.ORG>
>> wrote:
>>
>> >I don't think we know for sure that a LEFT JOIN will require SAS to
>> >sort one or both of the underlying data sets. The SQL optimizer
>> >might decide, for instance, to build an internal hash table, or to
>> >do a sequential scan (consider the case where one table has only one


>> >observation - no need to sort).
>> >
>> >The hash object can indeed be very fast, but doesn't work when the
>> >hash won't fit into memory. That is, it scales very well up to a
>> >certain point, and then fails abruptly. I think that's the biggest
>> >single weakness of the current implementation of hash objects (it's
>> >also something that SAS could decide to fix in a future release with


>> >no necessary changes to syntax or semantics).
>> >
>> >
>> >
>> >--
>> >Jack Hamilton

>>
>>
>> Jack,
>>
>> You are spot on noting the limitation of using a hash, that
>> sufficient amount of memory is available.
>>
>> Regarding the join stategy for Left/Right joins, I have concocted
>> many examples and have not once observed a hash join employed. Even
>> for the single observation table involved in a left join you
>> suggested in your reply, the results from _method suggest a Join
>> Merge. It even sorts the one observation table!
>>
>> 47 data DataIn3 ;
>> 48 set DataIn2( obs=1 ) ;
>> 49 run ;
>>
>> NOTE: There were 1 observations read from the data set WORK.DATAIN2.
>> NOTE: The data set WORK.DATAIN3 has 1 observations and 1 variables.
>> NOTE: DATA statement used (Total process time):
>> real time 0.01 seconds
>> cpu time 0.01 seconds
>>
>>
>> 50
>> 51 proc sql _method ;
>> 52 create table abc as
>> 53 select T1.*
>> 54 from DataIn as T1
>> 55 left join
>> 56 DataIn3 as T3
>> 57 on T1.Id=T3.Id
>> 58 where T3.Id is missing and
>> 59 T1.Wk=32 ;
>>
>> NOTE: SQL execution methods chosen are:
>>
>> sqxcrta
>> sqxfil
>> sqxjm
>> sqxsort
>> sqxsrc( WORK.DATAIN3(alias = T3) )
>> sqxsort
>> sqxsrc( WORK.DATAIN(alias = T1) )
>> NOTE: Table WORK.ABC created, with 200000 rows and 2 columns.
>>
>> 60 quit ;
>> NOTE: PROCEDURE SQL used (Total process time):
>> real time 6.04 seconds
>> cpu time 1.84 seconds
>>
>>
>> I would be most interested in seeing a PROC SQL example where a
>> Left/Right join employs a Hash join.
>>
>> Regards,
>> Ken

>--
>Jack Hamilton
>Sacramento, California
>jfh@alumni.stanford.org

___
Jack Hamilton
Sacramento, California
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

Reply

Popular Tags in the Forum
join, method

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: A better JOIN method Paul Dorfman Newsgroup comp.soft-sys.sas 0 08-14-2007 04:46 AM
Re: A better JOIN method David Johnson Newsgroup comp.soft-sys.sas 0 08-13-2007 04:19 AM
Re: A better JOIN method Duell, Bob Newsgroup comp.soft-sys.sas 0 08-10-2007 10:12 PM
Re: An Entirely General Method for Loading Parameters Stored in a Michael Johnston Newsgroup comp.soft-sys.sas 0 05-02-2006 02:09 PM
Re: An Entirely General Method for Loading Parameters Stored in a Gregg P. Snell Newsgroup comp.soft-sys.sas 0 05-01-2006 09:52 PM



All times are GMT. The time now is 03:23 AM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.