Query running slow
I have captured the Actual plan in Plan Explorer (attached). In the plan, I see estimated rows are on higher side and RID look ups on Dim.Organization table. I need your help in finding out what's issue with this plan and ways to improve the execution times.
Query:
SELECT Dim.Organization.Product, Dim.Organization.Department, Dim.Organization.ProfitCenter, COUNT(DISTINCT CASE WHEN Dim.Submission.SubmissionCount =1 THEN CONVERT(NVARCHAR(100), CONVERT(NVARCHAR(100),Dim.Submission.SubmissionNumber) + '-' + CONVERT(NVARCHAR(100),Dim.Submission.CustomSubmissionVersionID) ) END), count(distinct Dim.Policy.PolicyNumber+Dim.Policy.PolicyDoc), Case When 'Ironshore Agency Share' ='Iron-Starr Agency Share' Then sum(Fact.PolicyTransaction.GrossWrittenPremiumRC) Else SUM((Fact.PolicyTransaction.GrossWrittenPremiumRC)*Dim.Policy.IronshoreAgencySharePct) End FROM Dim.Organization INNER JOIN Fact.PolicyTransaction ON (Fact.PolicyTransaction.OrganizationKey=Dim.Organization.OrganizationKey AND Fact.PolicyTransaction.EndDate > Convert(Date,'04/05/2015') AND Fact.PolicyTransaction.StartDate<=Convert(Date,'04/05/2015') AND Dim.Organization.StartDate<=Convert(Date,'04/05/2015') AND Dim.Organization.EndDate>Convert(Date,'04/05/2015') AND Fact.PolicyTransaction.SourceSystemID Not in (9)) INNER JOIN Dim.Submission ON (Fact.PolicyTransaction.SubmissionKey=Dim.Submission.SubmissionKey AND Fact.PolicyTransaction.EndDate > Convert(Date,'04/05/2015') AND Fact.PolicyTransaction.StartDate<=Convert(Date,'04/05/2015') AND Dim.Submission.EndDate > Convert(Date,'04/05/2015') AND Dim.Submission.StartDate <=Convert(Date,'04/05/2015') AND Fact.PolicyTransaction.SourceSystemID Not in (9)) INNER JOIN Dim.Policy ON (Dim.Policy.Policykey = Fact.PolicyTransaction.PolicyKey AND Dim.Policy.EndDate > Convert(Date,'04/05/2015') AND Dim.Policy.StartDate <=Convert(Date,'04/05/2015') AND Fact.PolicyTransaction.EndDate > Convert(Date,'04/05/2015') AND Fact.PolicyTransaction.StartDate<=Convert(Date,'04/05/2015') AND Dim.Policy.SourceSystemID Not in (9) AND Fact.PolicyTransaction.SourceSystemID Not in (9)) GROUP BY Dim.Organization.Product, Dim.Organization.Department, Dim.Organization.ProfitCenter
Anyway, there are two distinct problem areas in the submitted plan. First:
Nested Loops Join is an expensive strategy when the outer input is large. 71,764,343 rows definitely counts as large. 499,658,841 rows for the RID Lookup even more so.
To eliminate the RID Lookup, add columns StartDate and EndDate to a key of an index on the Organization table. Columns Product, Department, and ProfitCenter should be included in the index. The index also needs to have OrganizationKey. An example suitable index would be:
CREATE INDEX i -- Naming is up to you ON Dim.Organization ( OrganizationKey, EndDate, StartDate ) INCLUDE ( Product, Department, ProfitCenter );
This assumes EndDate will usually be more selective than StartDate. If that is not true for you, reverse the order of those columns. You may even need two indexes with start and end date keys in different orders for best performance overall. That is something only you can determine through analysis and testing.
The indexing on PolicyTransaction is more interesting. The potential key columns are SourceSystemID, StartDate, and EndDate, all of which are compared using inequalities. Given the very low selectivity of the values used in this query, it is a little surprising the optimizer chose a seek over a scan. Perhaps the table is very wide, or has a lot of unused space (a separate issue).
As a fact table, it is unusual to have indexes at all (aside from a clustered index, which is conspicuous by its absence). The current index on SourceSystemID is not selective, though it does at least cover the columns needed by the query. I hope that is by design, and the index does not includes all columns from the base table!. It seems possible that SourceSystemID (and possibly another column to make the index unique) should be the clustered index on this table, not a nonclustered index.
So, consider dropping the existing nonclustered on PolicyTransaction, and replacing it with a clustered index that will be useful to future queries against this table. For example:
-- Suggestion only DROP INDEX IDX_NC_SourceSystemID ON Fact.PolicyTransaction; CREATE UNIQUE CLUSTERED INDEX CX -- Name this! ON Fact.PolicyTransaction ( StartDate, EndDate, SourceSystemID, -- Other column(s) needed to make the index unique, if possible );
Again, the exact choice of key order depends on your knowledge of the data and querying patterns. SourceSystemID should probably not be the leading key if it is usually specified as an inequality, for instance.
The second problem area of the plan is:
This plan shape results from the use of multiple DISTINCT aggregates in the query. This can be tough to improve, especially within a generated-query template, but partial pre-aggregation can definitely help. See the article below for details:
You may get some relief by upgrading to SQL Server 2012 (or later) at some point:
Ultimately, though, this sort of query really needs the columnstore technology in SQL Server 2012 (and very preferably 2014) to really fly. Performance improvements of 50x or more are not uncommon, and this particular query should do better than that. That is for the future though, and columnstore does require Enterprise Edition. Nevertheless, I have to mention it because it is the most complete overall solution.
The existing plan shape (Eager Spools) is very wasteful, writing 71.3 million rows to a tempdb workfile before replaying those rows three times through three very similar aggregates. investigate the partial aggregation idea reduce the repeated and wasted effort here.
Yes it is a Generated query for a report from Business objects(SAP BO). This DB is a part of ETL work flow,where data gets loaded from Datawarehouse(GDW)to Datamart database and BO application will use it for report purpose once data is loaded.