DISTINCT clause degrading performance
select distinct a11.CustomerDivisionKey CustomerDivisionKey, a11.CustomerDivisionID CustomerDivisionID, a11.CustomerDivisionName CustomerDivisionName from Dim_CustomerDivision a11 join Fact_Project a12 on (a11.CustomerDivisionKey = a12.CustomerDivisionKey) join vwDataPermission_Project_Requsition a13 on (a12.ProjectKey = a13.SSOProjectKey) where (a11.CustomerDivisionKey not in (-9) and a13.UserName = Nfirstname.lastname@example.org') order by 1 asc
If I remove DISTINCT clause from the above script then it executes within 5 sec. But I cannot remove DISTINCT clause because this sql query is created by microstrategy tool. I also updated the statistics of these tables but that did not improve the performance at all.
So please let me know your thoughts as to how can I improve the performance without changing the sql script.
Please find the attachment for its actual execution plan.
Here are some generic thoughts.
First: How many rows did each of the two queries return? If the numbers are different then it shows that the DISTINCT is actuallly needed. If they are the same then it does not show they are not needed – just that it does not do anything for these specific parameters and on your specific data set on the staging server.
Second: DISTINCT will always lower performance because you force the optimizer to do the extra work to check for duplicated data. (Unless there are reasons based on constraints and/or query logic that the optimizer can use to interfere that duplicates are actually impossible). But from 5 seconds to 8 minutes indicates that for some reasont he entire execution plan was completely changed for this, because just adding the logic to remove duplicates to a 5 second execution should at worst double it or so.
Third: I do not know the schema of the two tables involved, nor the definition of the view so I have to do a bit of gambling here. But I guess that each row in Fact_Project joins to exactly one row in Dim_CustomerDivision, whereas each customer division can be related to multiple projects. I also guess that each project relates to multiple rows from the view, but just one of them can be for this particular UserName. The scenario where the query without DISTINCT would return duplicates is when this user has access to two or more projects within the same CustomerDivision.
Even though you join all tables, you only return data from the first. So one other version of the query you can try would be:
SELECT a11.CustomerDivisionKey, a11.CustomerDivisionID, a11.CustomerDivisionName FROM dbo.Dim_CustomerDivision AS a11 WHERE a11.CustomerDivisionKey <> -9 AND EXISTS (SELECT * FROM dbo.Fact_Project AS a12 INNER JOIN dbo.vwDataPermission_Project_Requsition AS a13 ON a13.SSOProjectKey = a12.ProjectKey WHERE a12.CustomerDivisionKey = a11.CustomerDivisionKey AND a13.UserName = Nemail@example.com') ORDER BY CustomerDivisionKey;
You'll need to test it. Check that I didn't mess up, and it returns the correct results. Also check performance – I HOPE it will be better but no guarantees.
If it does, then the problem is of course: how do you convince your query generating tool to generate a query in this form. That is one of mamy reasons why I dislike code that generates SQL. Some of those tools allow you to hand-craft code or to call stored procedures – perhaps that can be a solution for you?