|
|||
|
I used to think the cost of an NLJOIN between an outer and an inner
subquery should be at least card(outer) * cost(inner), but the optimizer of DB2/NT64 9.7.5 at standard settings surprises me with the following: 2.60477e+008 NLJOIN ( 2) 350361 42426.5 /----+-----\ 3.86338e+006 67.422 TBSCAN IXSCAN ( 3) ( 6) 35725.5 25.7914 15576.4 2 According to my understanding, the resulting cost(2) (= cost of node 2) should be at least card(3) * cost(6), which is 3.86338e+006 * 25.7914 = 99.6e6; that is, 100 million instead of the measly 350361 it comes up with. The actual performance of the plan is as terrible as it looks (I aborted it after several hours) and confirms that 100 million is closer to the truth than 350k. Better plans would presumably be available, but do not get chosen as long as this plan is calculated to be so cheap. I have seen such strange cost estimates before in the context of more complex queries, but this is the first time I have seen it in a simple two-table join. Can anyone explain what is wrong here? Regards, Oliver Schoett |
|
|
||||
|
||||
|
|
|
|||
|
On 06/20/2012 07:01 PM, Oliver Schoett wrote:
> I used to think the cost of an NLJOIN between an outer and an inner subquery should be at least > > card(outer) * cost(inner), > > but the optimizer of DB2/NT64 9.7.5 at standard settings surprises me with the following: > > 2.60477e+008 > NLJOIN > ( 2) > 350361 > 42426.5 > /----+-----\ > 3.86338e+006 67.422 > TBSCAN IXSCAN > ( 3) ( 6) > 35725.5 25.7914 > 15576.4 2 > > According to my understanding, the resulting cost(2) (= cost of node 2) should be at least card(3) * cost(6), which is 3.86338e+006 * 25.7914 = 99.6e6; that is, 100 million instead of the measly 350361 it comes up with. > > The actual performance of the plan is as terrible as it looks (I aborted it after several hours) and confirms that 100 million is closer to the truth than 350k. Better plans would presumably be available, but do not get chosen as long as this plan is > calculated to be so cheap. > > I have seen such strange cost estimates before in the context of more complex queries, but this is the first time I have seen it in a simple two-table join. > > Can anyone explain what is wrong here? > > Regards, > > Oliver Schoett If the plan estimate does not agree with reality, I'd first check if statistcs are OK. /dg |
|
|||
|
danfan46 wrote:
> If the plan estimate does not agree with reality, I'd first check if > statistcs are OK. > /dg Good point - I had only inspected the statistics visually. So I created them afresh with RUNSTATS ON TABLE ... on all columns WITH DISTRIBUTION AND DETAILED INDEXES ALL ALLOW WRITE ACCESS on both the tables involved. The access plan has not changed: 2.60477e+008 NLJOIN ( 2) 350361 42426.5 /----+-----\ 3.86338e+006 67.422 TBSCAN IXSCAN ( 3) ( 6) 35725.5 25.7914 15576.4 2 Interestingly, adding "for read only optimize for 1 row" at the end of the select statement changes not the NLJOIN but its cost estimate: 2.60477e+008 NLJOIN ( 2) 757934 67353.3 /-----+------\ 3.86338e+006 67.422 TBSCAN IXSCAN ( 3) ( 4) 63038 25.7914 34066 2 The cost change of node(3) is OK, as a different index is used to obtain the set of join keys (wider, hence requiring more I/O, but not requiring a sort, hence delivering the first row more quickly). This should increase the total cost of the NLJOIN accordingly (by about 30K), as the result set of node(3) is the same as in the original plan (it is the result of a full index scan as there are no conditions except the join conditions in the query). However, the cost estimate of the NLJOIN is now ~760k instead of the correct ~100 million, so just as wrong, only different. This does not make any sense to me whatsoever. Regards, Oliver Schoett |
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|