|
|||
|
Hi,
I'm a bit green in trying to optimize query performance, so please bear with me. I've got a query built in a package that has run well over the past 3 years. It involves a large table with a left outer join to a dimension table. The dimension table had a compound primary key made up of field1 and field2. There was no primary key on my "left" fact table. As I said, the query in this condition has executed well every day for several years. Recently, we had to add a field to the dimension table and make it part of the primary key. So, now, the dimension table has a 3-field compound primary key made up of field1, field2 and field3. In my query, however, I'm still only joining to the dimension table on field1 and field2. This has dramatically deteriorated the performance of the query. I'm not very good at interpreting query execution plans at this time, though I'm trying to learn. My question is: why would adding an additional field to the primary key in my dimension table so deteriorate the performance of the query, even when I am not including the newly added field in the query or as part of the join? My problem is, I really can't include the newly added field as part of the join, since it does not exist in my fact table. Sorry this is all a bit abstract. In general, I'm just trying to figure out if primary key fields should always be part of joins as part of a well written query. Thanks for any help anyone can provide. |
|
|
||||
|
||||
|
|
|
|||
|
Eric (evajen01@yahoo.com) writes:
> I'm a bit green in trying to optimize query performance, so please > bear with me. > > I've got a query built in a package that has run well over the past 3 > years. It involves a large table with a left outer join to a > dimension table. The dimension table had a compound primary key made > up of field1 and field2. There was no primary key on my "left" fact > table. As I said, the query in this condition has executed well every > day for several years. > > Recently, we had to add a field to the dimension table and make it > part of the primary key. So, now, the dimension table has a 3-field > compound primary key made up of field1, field2 and field3. In my > query, however, I'm still only joining to the dimension table on > field1 and field2. This has dramatically deteriorated the performance > of the query. > > I'm not very good at interpreting query execution plans at this time, > though I'm trying to learn. > > My question is: why would adding an additional field to the primary > key in my dimension table so deteriorate the performance of the query, > even when I am not including the newly added field in the query or as > part of the join? My problem is, I really can't include the newly > added field as part of the join, since it does not exist in my fact > table. > > Sorry this is all a bit abstract. In general, I'm just trying to > figure out if primary key fields should always be part of joins as > part of a well written query. Performance tuning is rarely trivial, nor is understanding scenarios like this. And it certainly does not help when one does not see the query. I can think of a more than one possible reason, but I don't feel like wasting time of typing guess that may be entirely off-the-mark. I would suggest that you post the actual query, as well as the CREATE TABLE and CREATE INDEX statemetns for the table. If you don't want to publish intellectual property, put a script with it all in a editor window, and use a find/replace function to replace names with something that does not reveal the business. Just be careful that the same name is replaced consistently. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|||
|
Certainly, I will post the query and the CREATE table script. They
are quite lengthly, so I avoided posting them originally. The query is listed first, which is basically the same as the query that we've been using successfully for years, except for the line in the WHERE clause of SOURCE_ID = 1 (SOURCE_ID is the field that we've added to the dimension table as part of its primary key). The CREATE table script for the dimension table follows the query. I apologize for all of the superfluous information included, but I thought it would be best to post the query in its entirety. After the create table script, I've posted a work-around query as our solution, which seems to execute o.k., but I don't understand why we would have to use this type of a workaround, where we are basically creating arbitrary data that exists only to perform a join. Thanks for considering. Query: SELECT 1 AS DATA_TYPE_ID, 1 AS DATA_LOAD_ID, 53 AS DATA_LOAD_ADJ_ID, 'CERN' AS SETID, A.BUSINESS_UNIT_GL AS LEGAL_ENTITY, CASE WHEN V.CDN IS NOT NULL THEN V.CDN ELSE T.CDN END AS CDN, CASE WHEN A.JOURNAL_ID <> ' ' THEN A.JOURNAL_ID ELSE '0' END AS JOURNAL_ID, V.CDN AS JOURNAL_DATE_CDN, CASE WHEN C.CONTRACT_NUM <> ' ' THEN C.CONTRACT_NUM ELSE '0' END AS CONTRACT_NUM, C.CONTRACT_LINE_NUM, C.INVOICE, W.ORDER_NO, CASE WHEN H.CUST_ID IS NOT NULL AND H.CUST_ID <> 0 THEN H.CUST_ID ELSE CASE WHEN U.CUST_ID <> 0 THEN U.CUST_ID ELSE '0' END END AS CUST_ID, A.DEPTID, CASE WHEN Q.BUS_MODEL_CD IS NOT NULL THEN -- from contract line Q.BUS_MODEL_CD ELSE CASE WHEN B.BUS_MODEL_CD IS NOT NULL THEN -- from account B.BUS_MODEL_CD ELSE '0000' END END AS BUS_MODEL_CD, CASE WHEN C.PROJECT_ID <> ' ' THEN C.PROJECT_ID ELSE '00000000' END AS PROJECT_ID, A.ACCOUNT, CASE WHEN Q.PRODUCT_ID IS NOT NULL THEN Q.PRODUCT_ID ELSE CASE WHEN H.PRODUCT_ID IS NOT NULL THEN H.PRODUCT_ID ELSE '00000000' END END AS PRODUCT_ID, 'Billing Detail' AS DESCR, A.RESOURCE_TYPE, A.RESOURCE_CATEGORY, A.RESOURCE_SUB_CAT, -- CASE WHEN C.PROJECT_ID IS NOT NULL THEN -- CASE WHEN UPPER(L.PROJ_CAT_DESCR) LIKE '%CLASSIC%' THEN CASE WHEN L.ZZ_PROJ_CAT_ID IN ('5070','5080','5090') THEN 'C' ELSE 'M' END -- ELSE -- 'M' -- END AS SW_VERSION_CD, BB.BASE_CURRENCY AS CURRENCY_CD, SUM(A.MONETARY_AMOUNT * -1) AS BASE_AMOUNT, CASE WHEN D.RATE IS NOT NULL THEN SUM(A.MONETARY_AMOUNT * -1 * D.RATE) ELSE SUM(A.MONETARY_AMOUNT * -1 * DD.RATE) END AS USD_AMOUNT FROM S1_PS_BI_ACCT_ENTRY A LEFT OUTER JOIN D_ACCOUNT_SS B ON A.ACCOUNT = B.ACCOUNT LEFT OUTER JOIN D_S1_PS_BUS_UNIT_TBL_GL BB ON A.BUSINESS_UNIT_GL = BB.BUSINESS_UNIT LEFT OUTER JOIN S1_PS_BI_LINE C ON A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM AND A.INVOICE = C.INVOICE LEFT OUTER JOIN D_CURRENCY_CONVERSION_AVG_VW D ON A.ACCOUNTING_PERIOD = D.MONTH_ID AND A.FISCAL_YEAR = D.YEAR_ID AND BB.BASE_CURRENCY = D.FROM_CUR LEFT OUTER JOIN D_CURRENCY_CONVERSION_CRRNT_VW DD ON BB.BASE_CURRENCY = DD.FROM_CUR --LEFT OUTER JOIN D_DEPT_SS_PRE_CONVERT F ON A.DEPTID = F.DEPTID LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM LEFT OUTER JOIN D_PROJECT_SS L ON C.PROJECT_ID = L.PROJECT_ID LEFT OUTER JOIN D_PRODUCT_SS Q ON C.PRODUCT_ID = Q.PRODUCT_ID LEFT OUTER JOIN D_DATE T ON A.ACCOUNTING_DT = T.DATE_ID LEFT OUTER JOIN D_INVOICE_SS U ON A.BUSINESS_UNIT = U.LEGAL_ENTITY AND A.INVOICE = U.INVOICE LEFT OUTER JOIN D_DATE V ON A.JOURNAL_DATE = V.DATE_ID LEFT OUTER JOIN D_ORDER_SS W ON C.CONTRACT_NUM = W.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = W.CONTRACT_LINE_NUM WHERE B.ACCT_LEVEL2_CD = 'CURYRERN' AND B.ACCT_LEVEL3_CD IN ('OPERREVS','COSTREVN','NNOPREX1') AND H.SOURCE_ID = '1' GROUP BY A.BUSINESS_UNIT_GL, V.CDN, T.CDN, A.JOURNAL_ID, C.CONTRACT_NUM, C.CONTRACT_LINE_NUM, C.INVOICE, W.ORDER_NO, H.CUST_ID, U.CUST_ID, A.DEPTID, Q.BUS_MODEL_CD, B.BUS_MODEL_CD, C.PROJECT_ID, A.ACCOUNT, Q.PRODUCT_ID, H.PRODUCT_ID, A.RESOURCE_TYPE, A.RESOURCE_CATEGORY, A.RESOURCE_SUB_CAT, L.ZZ_PROJ_CAT_ID, BB.BASE_CURRENCY, D.RATE D_CONTRACT_SS dimension table create: USE [LIMESTONE] GO /****** Object: Table [dbo].[D_CONTRACT_SS] Script Date: 04/27/2009 15:54:42 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO CREATE TABLE [dbo].[D_CONTRACT_SS]( [CONTRACT_NUM] [varchar](25) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_CONTRACT_NUM] DEFAULT (0), [CONTRACT_LINE_NUM] [smallint] NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_CONTRACT_LINE_NUM] DEFAULT (0), [SOURCE_ID] [smallint] NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_SOURCE_ID] DEFAULT (1), [PRICING_STRUCTURE] [varchar](4) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_PRICING_STRUCTURE] DEFAULT ('NS'), [BILL_PLAN_STRUCTURE] [varchar](15) NULL, [CUST_ID] [varchar](8) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_CUST_ID] DEFAULT (0), [OPTY_ID] [varchar](25) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_OPTY_ID] DEFAULT (0), [CA_STATUS] [varchar](24) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_CA_STATUS] DEFAULT ('NOT SPECIFIED'), [CONTRACT_TYPE] [varchar](15) NOT NULL CONSTRAINT [DF_D_CONTRACT_CONTRACT_TYPE] DEFAULT ('NOT SPECIFIED'), [PRODUCT_ID] [varchar](18) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_PRODUCT_ID] DEFAULT ('00000000'), [ACCT_PLAN_ID] [varchar](5) NULL, [BILL_PLAN_ID] [varchar](5) NULL, [PROJECT_ID] [varchar](10) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_PROJECT_ID] DEFAULT ('00000000'), [CONTRACT_SIGN_CDN] [int] NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_CONTRACT_SIGN_CDN] DEFAULT (0), [CONTRACT_BOOK_CDN] [int] NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_CONTRACT_BOOK_CDN] DEFAULT (0), [END_TERM_DT] [datetime] NULL, [DESCR] [varchar](30) NULL, [CURRENCY_CD] [varchar](5) NULL, [REV] [money] NULL, [COST] [money] NULL, [LIMIT_AMOUNT] [numeric](18, 0) NULL, [START_CDN] [int] NULL, [END_CDN] [int] NULL, [LAST_REV_CDN] [int] NULL, [LASTUPDDTTM] [datetime] NULL, [REV_ACCOUNT] [varchar](10) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_REV_ACCOUNT] DEFAULT (0), [COST_ACCOUNT] [varchar](10) NOT NULL CONSTRAINT [DF_D_CONTRACT_SS_COST_ACCOUNT] DEFAULT (0), [ZZ_BILL_PN_REF] [varchar](10) NULL, [ZZ_CONTRACT_REF] [varchar](25) NULL, [ZZ_REV_PN_ONLY_FLG] [varchar](1) NULL, [UPDT_DT_TM] [smalldatetime] NULL CONSTRAINT [DF_D_CONTRACT_UPDT_DT_TM] DEFAULT (getdate()), [UPDT_PRSNL] [char](10) NULL CONSTRAINT [DF_D_CONTRACT_UPDT_PRSNL] DEFAULT ('SYSTM'), CONSTRAINT [PK_D_CONTRACT_SS] PRIMARY KEY CLUSTERED ( [CONTRACT_NUM] ASC, [CONTRACT_LINE_NUM] ASC, [SOURCE_ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_PADDING OFF Current work-around query: SELECT 1 AS DATA_TYPE_ID, 1 AS DATA_LOAD_ID, 53 AS DATA_LOAD_ADJ_ID, 'CERN' AS SETID, A.BUSINESS_UNIT_GL AS LEGAL_ENTITY, CASE WHEN V.CDN IS NOT NULL THEN V.CDN ELSE T.CDN END AS CDN, CASE WHEN A.JOURNAL_ID <> ' ' THEN A.JOURNAL_ID ELSE '0' END AS JOURNAL_ID, V.CDN AS JOURNAL_DATE_CDN, CASE WHEN C.CONTRACT_NUM <> ' ' THEN C.CONTRACT_NUM ELSE '0' END AS CONTRACT_NUM, C.CONTRACT_LINE_NUM, C.INVOICE, W.ORDER_NO, CASE WHEN H.CUST_ID IS NOT NULL AND H.CUST_ID <> 0 THEN H.CUST_ID ELSE CASE WHEN U.CUST_ID <> 0 THEN U.CUST_ID ELSE '0' END END AS CUST_ID, A.DEPTID, CASE WHEN Q.BUS_MODEL_CD IS NOT NULL THEN -- from contract line Q.BUS_MODEL_CD ELSE CASE WHEN B.BUS_MODEL_CD IS NOT NULL THEN -- from account B.BUS_MODEL_CD ELSE '0000' END END AS BUS_MODEL_CD, CASE WHEN C.PROJECT_ID <> ' ' THEN C.PROJECT_ID ELSE '00000000' END AS PROJECT_ID, A.ACCOUNT, CASE WHEN Q.PRODUCT_ID IS NOT NULL THEN Q.PRODUCT_ID ELSE CASE WHEN H.PRODUCT_ID IS NOT NULL THEN H.PRODUCT_ID ELSE '00000000' END END AS PRODUCT_ID, 'Billing Detail' AS DESCR, A.RESOURCE_TYPE, A.RESOURCE_CATEGORY, A.RESOURCE_SUB_CAT, -- CASE WHEN C.PROJECT_ID IS NOT NULL THEN -- CASE WHEN UPPER(L.PROJ_CAT_DESCR) LIKE '%CLASSIC%' THEN CASE WHEN L.ZZ_PROJ_CAT_ID IN ('5070','5080','5090') THEN 'C' ELSE 'M' END -- ELSE -- 'M' -- END AS SW_VERSION_CD, BB.BASE_CURRENCY AS CURRENCY_CD, SUM(A.MONETARY_AMOUNT * -1) AS BASE_AMOUNT, CASE WHEN D.RATE IS NOT NULL THEN SUM(A.MONETARY_AMOUNT * -1 * D.RATE) ELSE SUM(A.MONETARY_AMOUNT * -1 * DD.RATE) END AS USD_AMOUNT FROM (select '1' as source_id, * from S1_PS_BI_ACCT_ENTRY) A LEFT OUTER JOIN D_ACCOUNT_SS B ON A.ACCOUNT = B.ACCOUNT LEFT OUTER JOIN D_S1_PS_BUS_UNIT_TBL_GL BB ON A.BUSINESS_UNIT_GL = BB.BUSINESS_UNIT LEFT OUTER JOIN (select '1' as source_id, * from S1_PS_BI_LINE) C ON A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM AND A.INVOICE = C.INVOICE and a.source_id = c.source_id LEFT OUTER JOIN D_CURRENCY_CONVERSION_AVG_VW D ON A.ACCOUNTING_PERIOD = D.MONTH_ID AND A.FISCAL_YEAR = D.YEAR_ID AND BB.BASE_CURRENCY = D.FROM_CUR LEFT OUTER JOIN D_CURRENCY_CONVERSION_CRRNT_VW DD ON BB.BASE_CURRENCY = DD.FROM_CUR --LEFT OUTER JOIN D_DEPT_SS_PRE_CONVERT F ON A.DEPTID = F.DEPTID LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM and c.source_id = h.source_id LEFT OUTER JOIN D_PROJECT_SS L ON C.PROJECT_ID = L.PROJECT_ID LEFT OUTER JOIN D_PRODUCT_SS Q ON C.PRODUCT_ID = Q.PRODUCT_ID LEFT OUTER JOIN D_DATE T ON A.ACCOUNTING_DT = T.DATE_ID LEFT OUTER JOIN D_INVOICE_SS U ON A.BUSINESS_UNIT = U.LEGAL_ENTITY AND A.INVOICE = U.INVOICE LEFT OUTER JOIN D_DATE V ON A.JOURNAL_DATE = V.DATE_ID LEFT OUTER JOIN D_ORDER_SS W ON C.CONTRACT_NUM = W.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = W.CONTRACT_LINE_NUM WHERE B.ACCT_LEVEL2_CD = 'CURYRERN' AND B.ACCT_LEVEL3_CD IN ('OPERREVS','COSTREVN','NNOPREX1') GROUP BY A.BUSINESS_UNIT_GL, V.CDN, T.CDN, A.JOURNAL_ID, C.CONTRACT_NUM, C.CONTRACT_LINE_NUM, C.INVOICE, W.ORDER_NO, H.CUST_ID, U.CUST_ID, A.DEPTID, Q.BUS_MODEL_CD, B.BUS_MODEL_CD, C.PROJECT_ID, A.ACCOUNT, Q.PRODUCT_ID, H.PRODUCT_ID, A.RESOURCE_TYPE, A.RESOURCE_CATEGORY, A.RESOURCE_SUB_CAT, L.ZZ_PROJ_CAT_ID, BB.BASE_CURRENCY, D.RATE |
|
|||
|
Eric wrote:
> The query is listed first, which is basically the same as the query > that we've been using successfully for years, except for the line in > the WHERE clause of SOURCE_ID = 1 (SOURCE_ID is the field that we've > added to the dimension table as part of its primary key). Putting it in WHERE defeats the left outer join to D_CONTRACT_SS. You should move it to the join instead, i.e. LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM AND H.SOURCE_ID = 1 which I suspect will also fix the speed issue. |
|
|||
|
Eric (evajen01@yahoo.com) writes:
> Certainly, I will post the query and the CREATE table script. They > are quite lengthly, so I avoided posting them originally. > > The query is listed first, which is basically the same as the query > that we've been using successfully for years, except for the line in > the WHERE clause of SOURCE_ID = 1 (SOURCE_ID is the field that we've > added to the dimension table as part of its primary key). The CREATE > table script for the dimension table follows the query. I apologize > for all of the superfluous information included, but I thought it > would be best to post the query in its entirety. After the create > table script, I've posted a work-around query as our solution, which > seems to execute o.k., but I don't understand why we would have to use > this type of a workaround, where we are basically creating arbitrary > data that exists only to perform a join. Thanks for considering. From your original post, it sounded like it was a two-table query. See there a good reason to post the entire query! Now, your "workaround" is not only a workaround, it is also a correct version of the query. The slow version is very likely to be incorrect, as Ed pointed out. If you say: SELECT ... FROM A LEFT JOIN B ON A.col = B.col The result set will include all rows in A since this is an outer join. But if you know add a WHERE condition: WHERE B.col = 1 You know filter so that you include only rows where B.col = 1. This means that if a row in A does not have a matching row in B, that row is no longer in the result set. You outer join has effectively become an inner join. Thus: > LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND > C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM >... > WHERE > B.ACCT_LEVEL2_CD = 'CURYRERN' AND > B.ACCT_LEVEL3_CD IN ('OPERREVS','COSTREVN','NNOPREX1') > AND H.SOURCE_ID = '1' should probably be: LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM AND H.SOURCE_ID = 1 which also what you have in your "workaround" in a somewhat more contrived way. Besides, I notice that also the B table is left-joined in the query, but that is effectively an inner join given the WHERE clause. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|||
|
Eric wrote:
> Recently, we had to add a field to the dimension table and make it > part of the primary key. So, now, the dimension table has a 3-field > compound primary key made up of field1, field2 and field3. In my > query, however, I'm still only joining to the dimension table on > field1 and field2. This has dramatically deteriorated the performance > of the query. > > My question is: why would adding an additional field to the primary > key in my dimension table so deteriorate the performance of the query, > even when I am not including the newly added field in the query or as > part of the join? My problem is, I really can't include the newly > added field as part of the join, since it does not exist in my fact > table. Adding a field to an index means that the index is larger - especially if this were a char field. This in turn could change the optimisers selection of "best" plan. |
|
|||
|
Thanks to everyone for the feedback. I get what you are saying about
adding the filter to the WHERE clause negating the LEFT JOIN. What I can't understand is why the 'contrived' workaround that I am using runs faster than simply adding the H.SOURCE_ID = 1 to the join, as: LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM AND H.SOURCE_ID = 1 The workaround that I am using seems like it would be more inefficient, since I basically am creating a source id in a subquery for tables where it does not exist for the sole purpose of creating the join, as: FROM (select '1' as source_id, * from S1_PS_BI_ACCT_ENTRY) A LEFT OUTER JOIN (select '1' as source_id, * from S1_PS_BI_LINE) C ON A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM AND A.INVOICE = C.INVOICE and a.source_id = c.source_id LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM and c.source_id = h.source_id I can't figure out why this runs, while adding the SOURCE_ID = 1 to the JOIN makes the query hang. I guess I should be glad that I can get it to run this way, but I wish I understood better why it works one way, and not the other. Thanks! |
|
|||
|
Eric (evajen01@yahoo.com) writes:
> Thanks to everyone for the feedback. I get what you are saying about > adding the filter to the WHERE clause negating the LEFT JOIN. > > What I can't understand is why the 'contrived' workaround that I am > using runs faster than simply adding the H.SOURCE_ID = 1 to the join, > as: > > LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM > AND C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM > AND H.SOURCE_ID = 1 > > The workaround that I am using seems like it would be more > inefficient, since I basically am creating a source id in a subquery > for tables where it does not exist for the sole purpose of creating > the join, as: > > FROM > (select '1' as source_id, * from S1_PS_BI_ACCT_ENTRY) A > LEFT OUTER JOIN (select '1' as source_id, * from S1_PS_BI_LINE) C ON > A.BUSINESS_UNIT = C.BUSINESS_UNIT AND A.LINE_SEQ_NUM = C.LINE_SEQ_NUM > AND A.INVOICE = C.INVOICE and a.source_id = c.source_id > LEFT OUTER JOIN D_CONTRACT_SS H ON C.CONTRACT_NUM = H.CONTRACT_NUM > AND > C.CONTRACT_LINE_NUM = H.CONTRACT_LINE_NUM > and c.source_id = h.source_id > > I can't figure out why this runs, while adding the SOURCE_ID = 1 to > the JOIN makes the query hang. I guess I should be glad that I can > get it to run this way, but I wish I understood better why it works > one way, and not the other. Logically they are the same, at least as far as I can see, but maybe the optimizer disagrees. Or it just did just see trhough it. Keep in mind that the default behaviour is that any query of this complexity is slow. The optimizer then works by from *statistics* *sampled* about the data distribution in the tables and *estimates* what is the best way to execute the query. The optimizer does not consider all possible query plans, because that would take too long time, but stops when it has found a plan which seems good enough. It spoils you by getting things right most the time, but when it doesn't, you get hurt. To be able to tell why one query works well or another, I would need to see the different query plans with estimates and all, and it would still be a difficult job. Consider yourself lucky that you found a workaround by being a bit quirky. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|
![]() |
| Popular Tags in the Forum |
| performance, query |
| Thread Tools | |
| Display Modes | |
|
|