Go Back   Rhinocerus > Newsgroup > Newsgroup comp.databases.* > Newsgroup comp.databases.ms-sqlserver

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 04-27-2009, 05:07 PM
Eric
Guest
 
Posts: n/a
Default Query Performance

Hi,

I'm a bit green in trying to optimize query performance, so please
bear with me.

I've got a query built in a package that has run well over the past 3
years. It involves a large table with a left outer join to a
dimension table. The dimension table had a compound primary key made
up of field1 and field2. There was no primary key on my "left" fact
table. As I said, the query in this condition has executed well every
day for several years.

Recently, we had to add a field to the dimension table and make it
part of the primary key. So, now, the dimension table has a 3-field
compound primary key made up of field1, field2 and field3. In my
query, however, I'm still only joining to the dimension table on
field1 and field2. This has dramatically deteriorated the performance
of the query.

I'm not very good at interpreting query execution plans at this time,
though I'm trying to learn.

My question is: why would adding an additional field to the primary
key in my dimension table so deteriorate the performance of the query,
even when I am not including the newly added field in the query or as
part of the join? My problem is, I really can't include the newly
added field as part of the join, since it does not exist in my fact
table.

Sorry this is all a bit abstract. In general, I'm just trying to
figure out if primary key fields should always be part of joins as
part of a well written query.

Thanks for any help anyone can provide.

Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 04-27-2009, 08:23 PM
Erland Sommarskog
Guest
 
Posts: n/a
Default Re: Query Performance

Eric (evajen01@yahoo.com) writes:
> I'm a bit green in trying to optimize query performance, so please
> bear with me.
>
> I've got a query built in a package that has run well over the past 3
> years. It involves a large table with a left outer join to a
> dimension table. The dimension table had a compound primary key made
> up of field1 and field2. There was no primary key on my "left" fact
> table. As I said, the query in this condition has executed well every
> day for several years.
>
> Recently, we had to add a field to the dimension table and make it
> part of the primary key. So, now, the dimension table has a 3-field
> compound primary key made up of field1, field2 and field3. In my
> query, however, I'm still only joining to the dimension table on
> field1 and field2. This has dramatically deteriorated the performance
> of the query.
>
> I'm not very good at interpreting query execution plans at this time,
> though I'm trying to learn.
>
> My question is: why would adding an additional field to the primary
> key in my dimension table so deteriorate the performance of the query,
> even when I am not including the newly added field in the query or as
> part of the join? My problem is, I really can't include the newly
> added field as part of the join, since it does not exist in my fact
> table.
>
> Sorry this is all a bit abstract. In general, I'm just trying to
> figure out if primary key fields should always be part of joins as
> part of a well written query.


Performance tuning is rarely trivial, nor is understanding scenarios
like this. And it certainly does not help when one does not see the
query. I can think of a more than one possible reason, but I don't
feel like wasting time of typing guess that may be entirely off-the-mark.

I would suggest that you post the actual query, as well as the CREATE
TABLE and CREATE INDEX statemetns for the table. If you don't want
to publish intellectual property, put a script with it all in a editor
window, and use a find/replace function to replace names with something
that does not reveal the business. Just be careful that the same name
is replaced consistently.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #3 (permalink)  
Old 04-27-2009, 09:21 PM
Eric
Guest
 
Posts: n/a
Default Re: Query Performance

Certainly, I will post the query and the CREATE table script. They
are quite lengthly, so I avoided posting them originally.

The query is listed first, which is basically the same as the query
that we've been using successfully for years, except for the line in
the WHERE clause of SOURCE_ID = 1 (SOURCE_ID is the field that we've
added to the dimension table as part of its primary key). The CREATE
table script for the dimension table follows the query. I apologize
for all of the superfluous information included, but I thought it
would be best to post the query in its entirety. After the create
table script, I've posted a work-around query as our solution, which
seems to execute o.k., but I don't understand why we would have to use
this type of a workaround, where we are basically creating arbitrary
data that exists only to perform a join. Thanks for considering.

Query:
SELECT
1 AS DATA_TYPE_ID,
1 AS DATA_LOAD_ID,
53 AS DATA_LOAD_ADJ_ID,
'CERN' AS SETID,
A.BUSINESS_UNIT_GL AS LEGAL_ENTITY,
CASE WHEN V.CDN IS NOT NULL THEN
V.CDN
ELSE
T.CDN
END AS CDN,
CASE WHEN A.JOURNAL_ID <> ' ' THEN
A.JOURNAL_ID
ELSE
'0'
END AS JOURNAL_ID,
V.CDN AS JOURNAL_DATE_CDN,
CASE WHEN C.CONTRACT_NUM <> ' ' THEN
C.CONTRACT_NUM
ELSE
'0'
END AS CONTRACT_NUM,
C.CONTRACT_LINE_NUM,
C.INVOICE,
W.ORDER_NO,
CASE WHEN H.CUST_ID IS NOT NULL AND H.CUST_ID <> 0 THEN
H.CUST_ID
ELSE CASE WHEN U.CUST_ID <> 0 THEN
U.CUST_ID
ELSE
'0'
END END
AS CUST_ID,
A.DEPTID,
CASE WHEN Q.BUS_MODEL_CD IS NOT NULL THEN -- from contract line
Q.BUS_MODEL_CD
ELSE CASE WHEN B.BUS_MODEL_CD IS NOT NULL THEN -- from account
B.BUS_MODEL_CD
ELSE
'0000'
END END
AS BUS_MODEL_CD,
CASE WHEN C.PROJECT_ID <> ' ' THEN
C.PROJECT_ID
ELSE
'00000000'
END AS PROJECT_ID,
A.ACCOUNT,
CASE WHEN Q.PRODUCT_ID IS NOT NULL THEN
Q.PRODUCT_ID
ELSE CASE WHEN H.PRODUCT_ID IS NOT NULL THEN
H.PRODUCT_ID
ELSE
'00000000'
END END
AS PRODUCT_ID,
'Billing Detail' AS DESCR,
A.RESOURCE_TYPE,
A.RESOURCE_CATEGORY,
A.RESOURCE_SUB_CAT,
-- CASE WHEN C.PROJECT_ID IS NOT NULL THEN
-- CASE WHEN UPPER(L.PROJ_CAT_DESCR) LIKE '%CLASSIC%' THEN
CASE WHEN L.ZZ_PROJ_CAT_ID IN ('5070','5080','5090') THEN
'C'
ELSE
'M'
END
-- ELSE
-- 'M'
-- END
AS SW_VERSION_CD,
BB.BASE_CURRENCY AS CURRENCY_CD,
SUM(A.MONETARY_AMOUNT * -1) AS BASE_AMOUNT,
CASE WHEN D.RATE IS NOT NULL THEN
SUM(A.MONETARY_AMOUNT * -1 * D.RATE)
ELSE
SUM(A.MONETARY_AMOUNT * -1 * DD.RATE)
END
AS USD_AMOUNT
FROM S1_PS_BI_ACCT_ENTRY A
LEFT OUTER JOIN D_ACCOUNT_SS B ON A.ACCOUNT = B.ACCOUNT
LEFT OUTER JOIN D_S1_PS_BUS_UNIT_TBL_GL BB ON A.BUSINESS_UNIT_GL =
BB.BUSINESS_UNIT
LEFT OUTER JOIN S1_PS_BI_LINE C ON A.BUSINESS_UNIT = C.BUSINESS_UNIT
AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM AND
A.INVOICE = C.INVOICE
LEFT OUTER JOIN D_CURRENCY_CONVERSION_AVG_VW D ON A.ACCOUNTING_PERIOD
= D.MONTH_ID AND
A.FISCAL_YEAR = D.YEAR_ID AND BB.BASE_CURRENCY = D.FROM_CUR
LEFT OUTER JOIN D_CURRENCY_CONVERSION_CRRNT_VW DD ON BB.BASE_CURRENCY
= DD.FROM_CUR
--LEFT OUTER JOIN D_DEPT_SS_PRE_CONVERT F ON A.DEPTID = F.DEPTID
LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND
C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
LEFT OUTER JOIN D_PROJECT_SS L ON C.PROJECT_ID = L.PROJECT_ID
LEFT OUTER JOIN D_PRODUCT_SS Q ON C.PRODUCT_ID = Q.PRODUCT_ID
LEFT OUTER JOIN D_DATE T ON A.ACCOUNTING_DT = T.DATE_ID
LEFT OUTER JOIN D_INVOICE_SS U ON A.BUSINESS_UNIT = U.LEGAL_ENTITY AND
A.INVOICE = U.INVOICE
LEFT OUTER JOIN D_DATE V ON A.JOURNAL_DATE = V.DATE_ID
LEFT OUTER JOIN D_ORDER_SS W ON C.CONTRACT_NUM = W.CONTRACT_NUM AND
C.CONTRACT_LINE_NUM = W.CONTRACT_LINE_NUM
WHERE
B.ACCT_LEVEL2_CD = 'CURYRERN' AND
B.ACCT_LEVEL3_CD IN ('OPERREVS','COSTREVN','NNOPREX1')
AND H.SOURCE_ID = '1'
GROUP BY
A.BUSINESS_UNIT_GL,
V.CDN,
T.CDN,
A.JOURNAL_ID,
C.CONTRACT_NUM,
C.CONTRACT_LINE_NUM,
C.INVOICE,
W.ORDER_NO,
H.CUST_ID,
U.CUST_ID,
A.DEPTID,
Q.BUS_MODEL_CD,
B.BUS_MODEL_CD,
C.PROJECT_ID,
A.ACCOUNT,
Q.PRODUCT_ID,
H.PRODUCT_ID,
A.RESOURCE_TYPE,
A.RESOURCE_CATEGORY,
A.RESOURCE_SUB_CAT,
L.ZZ_PROJ_CAT_ID,
BB.BASE_CURRENCY,
D.RATE


D_CONTRACT_SS dimension table create:

USE [LIMESTONE]
GO
/****** Object: Table [dbo].[D_CONTRACT_SS] Script Date:
04/27/2009 15:54:42 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[D_CONTRACT_SS](
[CONTRACT_NUM] [varchar](25) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_CONTRACT_NUM] DEFAULT (0),
[CONTRACT_LINE_NUM] [smallint] NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_CONTRACT_LINE_NUM] DEFAULT (0),
[SOURCE_ID] [smallint] NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_SOURCE_ID] DEFAULT (1),
[PRICING_STRUCTURE] [varchar](4) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_PRICING_STRUCTURE] DEFAULT ('NS'),
[BILL_PLAN_STRUCTURE] [varchar](15) NULL,
[CUST_ID] [varchar](8) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_CUST_ID] DEFAULT (0),
[OPTY_ID] [varchar](25) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_OPTY_ID] DEFAULT (0),
[CA_STATUS] [varchar](24) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_CA_STATUS] DEFAULT ('NOT SPECIFIED'),
[CONTRACT_TYPE] [varchar](15) NOT NULL CONSTRAINT
[DF_D_CONTRACT_CONTRACT_TYPE] DEFAULT ('NOT SPECIFIED'),
[PRODUCT_ID] [varchar](18) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_PRODUCT_ID] DEFAULT ('00000000'),
[ACCT_PLAN_ID] [varchar](5) NULL,
[BILL_PLAN_ID] [varchar](5) NULL,
[PROJECT_ID] [varchar](10) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_PROJECT_ID] DEFAULT ('00000000'),
[CONTRACT_SIGN_CDN] [int] NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_CONTRACT_SIGN_CDN] DEFAULT (0),
[CONTRACT_BOOK_CDN] [int] NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_CONTRACT_BOOK_CDN] DEFAULT (0),
[END_TERM_DT] [datetime] NULL,
[DESCR] [varchar](30) NULL,
[CURRENCY_CD] [varchar](5) NULL,
[REV] [money] NULL,
[COST] [money] NULL,
[LIMIT_AMOUNT] [numeric](18, 0) NULL,
[START_CDN] [int] NULL,
[END_CDN] [int] NULL,
[LAST_REV_CDN] [int] NULL,
[LASTUPDDTTM] [datetime] NULL,
[REV_ACCOUNT] [varchar](10) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_REV_ACCOUNT] DEFAULT (0),
[COST_ACCOUNT] [varchar](10) NOT NULL CONSTRAINT
[DF_D_CONTRACT_SS_COST_ACCOUNT] DEFAULT (0),
[ZZ_BILL_PN_REF] [varchar](10) NULL,
[ZZ_CONTRACT_REF] [varchar](25) NULL,
[ZZ_REV_PN_ONLY_FLG] [varchar](1) NULL,
[UPDT_DT_TM] [smalldatetime] NULL CONSTRAINT
[DF_D_CONTRACT_UPDT_DT_TM] DEFAULT (getdate()),
[UPDT_PRSNL] [char](10) NULL CONSTRAINT [DF_D_CONTRACT_UPDT_PRSNL]
DEFAULT ('SYSTM'),
CONSTRAINT [PK_D_CONTRACT_SS] PRIMARY KEY CLUSTERED
(
[CONTRACT_NUM] ASC,
[CONTRACT_LINE_NUM] ASC,
[SOURCE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF


Current work-around query:

SELECT
1 AS DATA_TYPE_ID,
1 AS DATA_LOAD_ID,
53 AS DATA_LOAD_ADJ_ID,
'CERN' AS SETID,
A.BUSINESS_UNIT_GL AS LEGAL_ENTITY,
CASE WHEN V.CDN IS NOT NULL THEN
V.CDN
ELSE
T.CDN
END AS CDN,
CASE WHEN A.JOURNAL_ID <> ' ' THEN
A.JOURNAL_ID
ELSE
'0'
END AS JOURNAL_ID,
V.CDN AS JOURNAL_DATE_CDN,
CASE WHEN C.CONTRACT_NUM <> ' ' THEN
C.CONTRACT_NUM
ELSE
'0'
END AS CONTRACT_NUM,
C.CONTRACT_LINE_NUM,
C.INVOICE,
W.ORDER_NO,
CASE WHEN H.CUST_ID IS NOT NULL AND H.CUST_ID <> 0 THEN
H.CUST_ID
ELSE CASE WHEN U.CUST_ID <> 0 THEN
U.CUST_ID
ELSE
'0'
END END
AS CUST_ID,
A.DEPTID,
CASE WHEN Q.BUS_MODEL_CD IS NOT NULL THEN -- from contract line
Q.BUS_MODEL_CD
ELSE CASE WHEN B.BUS_MODEL_CD IS NOT NULL THEN -- from account
B.BUS_MODEL_CD
ELSE
'0000'
END END
AS BUS_MODEL_CD,
CASE WHEN C.PROJECT_ID <> ' ' THEN
C.PROJECT_ID
ELSE
'00000000'
END AS PROJECT_ID,
A.ACCOUNT,
CASE WHEN Q.PRODUCT_ID IS NOT NULL THEN
Q.PRODUCT_ID
ELSE CASE WHEN H.PRODUCT_ID IS NOT NULL THEN
H.PRODUCT_ID
ELSE
'00000000'
END END
AS PRODUCT_ID,
'Billing Detail' AS DESCR,
A.RESOURCE_TYPE,
A.RESOURCE_CATEGORY,
A.RESOURCE_SUB_CAT,
-- CASE WHEN C.PROJECT_ID IS NOT NULL THEN
-- CASE WHEN UPPER(L.PROJ_CAT_DESCR) LIKE '%CLASSIC%' THEN
CASE WHEN L.ZZ_PROJ_CAT_ID IN ('5070','5080','5090') THEN
'C'
ELSE
'M'
END
-- ELSE
-- 'M'
-- END
AS SW_VERSION_CD,
BB.BASE_CURRENCY AS CURRENCY_CD,
SUM(A.MONETARY_AMOUNT * -1) AS BASE_AMOUNT,
CASE WHEN D.RATE IS NOT NULL THEN
SUM(A.MONETARY_AMOUNT * -1 * D.RATE)
ELSE
SUM(A.MONETARY_AMOUNT * -1 * DD.RATE)
END
AS USD_AMOUNT
FROM
(select '1' as source_id, * from S1_PS_BI_ACCT_ENTRY) A
LEFT OUTER JOIN D_ACCOUNT_SS B ON A.ACCOUNT = B.ACCOUNT
LEFT OUTER JOIN D_S1_PS_BUS_UNIT_TBL_GL BB ON A.BUSINESS_UNIT_GL =
BB.BUSINESS_UNIT
LEFT OUTER JOIN (select '1' as source_id, * from S1_PS_BI_LINE) C ON
A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM
AND
A.INVOICE = C.INVOICE and a.source_id = c.source_id
LEFT OUTER JOIN D_CURRENCY_CONVERSION_AVG_VW D ON A.ACCOUNTING_PERIOD
= D.MONTH_ID AND
A.FISCAL_YEAR = D.YEAR_ID AND BB.BASE_CURRENCY = D.FROM_CUR
LEFT OUTER JOIN D_CURRENCY_CONVERSION_CRRNT_VW DD ON BB.BASE_CURRENCY
= DD.FROM_CUR
--LEFT OUTER JOIN D_DEPT_SS_PRE_CONVERT F ON A.DEPTID = F.DEPTID
LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND
C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
and c.source_id = h.source_id
LEFT OUTER JOIN D_PROJECT_SS L ON C.PROJECT_ID = L.PROJECT_ID
LEFT OUTER JOIN D_PRODUCT_SS Q ON C.PRODUCT_ID = Q.PRODUCT_ID
LEFT OUTER JOIN D_DATE T ON A.ACCOUNTING_DT = T.DATE_ID
LEFT OUTER JOIN D_INVOICE_SS U ON A.BUSINESS_UNIT = U.LEGAL_ENTITY AND
A.INVOICE = U.INVOICE
LEFT OUTER JOIN D_DATE V ON A.JOURNAL_DATE = V.DATE_ID
LEFT OUTER JOIN D_ORDER_SS W ON C.CONTRACT_NUM = W.CONTRACT_NUM AND
C.CONTRACT_LINE_NUM = W.CONTRACT_LINE_NUM
WHERE
B.ACCT_LEVEL2_CD = 'CURYRERN' AND
B.ACCT_LEVEL3_CD IN ('OPERREVS','COSTREVN','NNOPREX1')
GROUP BY
A.BUSINESS_UNIT_GL,
V.CDN,
T.CDN,
A.JOURNAL_ID,
C.CONTRACT_NUM,
C.CONTRACT_LINE_NUM,
C.INVOICE,
W.ORDER_NO,
H.CUST_ID,
U.CUST_ID,
A.DEPTID,
Q.BUS_MODEL_CD,
B.BUS_MODEL_CD,
C.PROJECT_ID,
A.ACCOUNT,
Q.PRODUCT_ID,
H.PRODUCT_ID,
A.RESOURCE_TYPE,
A.RESOURCE_CATEGORY,
A.RESOURCE_SUB_CAT,
L.ZZ_PROJ_CAT_ID,
BB.BASE_CURRENCY,
D.RATE



Reply With Quote
  #4 (permalink)  
Old 04-27-2009, 09:42 PM
Ed Murphy
Guest
 
Posts: n/a
Default Re: Query Performance

Eric wrote:

> The query is listed first, which is basically the same as the query
> that we've been using successfully for years, except for the line in
> the WHERE clause of SOURCE_ID = 1 (SOURCE_ID is the field that we've
> added to the dimension table as part of its primary key).


Putting it in WHERE defeats the left outer join to D_CONTRACT_SS. You
should move it to the join instead, i.e.

LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND
C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM AND H.SOURCE_ID = 1

which I suspect will also fix the speed issue.
Reply With Quote
  #5 (permalink)  
Old 04-27-2009, 10:12 PM
Erland Sommarskog
Guest
 
Posts: n/a
Default Re: Query Performance

Eric (evajen01@yahoo.com) writes:
> Certainly, I will post the query and the CREATE table script. They
> are quite lengthly, so I avoided posting them originally.
>
> The query is listed first, which is basically the same as the query
> that we've been using successfully for years, except for the line in
> the WHERE clause of SOURCE_ID = 1 (SOURCE_ID is the field that we've
> added to the dimension table as part of its primary key). The CREATE
> table script for the dimension table follows the query. I apologize
> for all of the superfluous information included, but I thought it
> would be best to post the query in its entirety. After the create
> table script, I've posted a work-around query as our solution, which
> seems to execute o.k., but I don't understand why we would have to use
> this type of a workaround, where we are basically creating arbitrary
> data that exists only to perform a join. Thanks for considering.


From your original post, it sounded like it was a two-table query.
See there a good reason to post the entire query!

Now, your "workaround" is not only a workaround, it is also a correct
version of the query. The slow version is very likely to be incorrect,
as Ed pointed out.

If you say:

SELECT ...
FROM A
LEFT JOIN B ON A.col = B.col

The result set will include all rows in A since this is an outer join.
But if you know add a WHERE condition:

WHERE B.col = 1

You know filter so that you include only rows where B.col = 1. This means
that if a row in A does not have a matching row in B, that row is no longer
in the result set. You outer join has effectively become an inner join.

Thus:

> LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND
> C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
>...
> WHERE
> B.ACCT_LEVEL2_CD = 'CURYRERN' AND
> B.ACCT_LEVEL3_CD IN ('OPERREVS','COSTREVN','NNOPREX1')
> AND H.SOURCE_ID = '1'


should probably be:

LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM
AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
AND H.SOURCE_ID = 1

which also what you have in your "workaround" in a somewhat more
contrived way.

Besides, I notice that also the B table is left-joined in the query,
but that is effectively an inner join given the WHERE clause.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #6 (permalink)  
Old 04-28-2009, 11:15 AM
Sybaseguru
Guest
 
Posts: n/a
Default Re: Query Performance

Eric wrote:
> Recently, we had to add a field to the dimension table and make it
> part of the primary key. So, now, the dimension table has a 3-field
> compound primary key made up of field1, field2 and field3. In my
> query, however, I'm still only joining to the dimension table on
> field1 and field2. This has dramatically deteriorated the performance
> of the query.
>
> My question is: why would adding an additional field to the primary
> key in my dimension table so deteriorate the performance of the query,
> even when I am not including the newly added field in the query or as
> part of the join? My problem is, I really can't include the newly
> added field as part of the join, since it does not exist in my fact
> table.


Adding a field to an index means that the index is larger - especially if
this were a char field. This in turn could change the optimisers selection
of "best" plan.

Reply With Quote
  #7 (permalink)  
Old 04-28-2009, 09:56 PM
Eric
Guest
 
Posts: n/a
Default Re: Query Performance

Thanks to everyone for the feedback. I get what you are saying about
adding the filter to the WHERE clause negating the LEFT JOIN.

What I can't understand is why the 'contrived' workaround that I am
using runs faster than simply adding the H.SOURCE_ID = 1 to the join,
as:

LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM
AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
AND H.SOURCE_ID = 1

The workaround that I am using seems like it would be more
inefficient, since I basically am creating a source id in a subquery
for tables where it does not exist for the sole purpose of creating
the join, as:

FROM
(select '1' as source_id, * from S1_PS_BI_ACCT_ENTRY) A
LEFT OUTER JOIN (select '1' as source_id, * from S1_PS_BI_LINE) C ON
A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM
AND A.INVOICE = C.INVOICE and a.source_id = c.source_id
LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM
AND
C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
and c.source_id = h.source_id

I can't figure out why this runs, while adding the SOURCE_ID = 1 to
the JOIN makes the query hang. I guess I should be glad that I can
get it to run this way, but I wish I understood better why it works
one way, and not the other.

Thanks!
Reply With Quote
  #8 (permalink)  
Old 04-29-2009, 07:39 AM
Erland Sommarskog
Guest
 
Posts: n/a
Default Re: Query Performance

Eric (evajen01@yahoo.com) writes:
> Thanks to everyone for the feedback. I get what you are saying about
> adding the filter to the WHERE clause negating the LEFT JOIN.
>
> What I can't understand is why the 'contrived' workaround that I am
> using runs faster than simply adding the H.SOURCE_ID = 1 to the join,
> as:
>
> LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM
> AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
> AND H.SOURCE_ID = 1
>
> The workaround that I am using seems like it would be more
> inefficient, since I basically am creating a source id in a subquery
> for tables where it does not exist for the sole purpose of creating
> the join, as:
>
> FROM
> (select '1' as source_id, * from S1_PS_BI_ACCT_ENTRY) A
> LEFT OUTER JOIN (select '1' as source_id, * from S1_PS_BI_LINE) C ON
> A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM
> AND A.INVOICE = C.INVOICE and a.source_id = c.source_id
> LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM
> AND
> C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM
> and c.source_id = h.source_id
>
> I can't figure out why this runs, while adding the SOURCE_ID = 1 to
> the JOIN makes the query hang. I guess I should be glad that I can
> get it to run this way, but I wish I understood better why it works
> one way, and not the other.


Logically they are the same, at least as far as I can see, but maybe the
optimizer disagrees. Or it just did just see trhough it.

Keep in mind that the default behaviour is that any query of this complexity
is slow. The optimizer then works by from *statistics* *sampled* about the
data distribution in the tables and *estimates* what is the best way to
execute the query. The optimizer does not consider all possible query plans,
because that would take too long time, but stops when it has found a plan
which seems good enough. It spoils you by getting things right most the
time, but when it doesn't, you get hurt.

To be able to tell why one query works well or another, I would need to see
the different query plans with estimates and all, and it would still be
a difficult job.

Consider yourself lucky that you found a workaround by being a bit quirky.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Reply With Quote
 
Reply

Popular Tags in the Forum
performance, query

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 12:01 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.