Why doesnt SQL Server just add a predicate in this case?
OUL.OrgUnitId = OUL.SourceOrgUnitId
But the second one is way more expensive. I thought that a query plan of the first query could have been modified to simply include the predicate with the expression above on the one operator. But SQL Server is either costing it as too expensive, or not generating that plan as a candidate to consider.
I'm running SQL Server 2014, SP2. When I use the old cardinality estimator, I still see the issue.
https://gist.github.com/mjswart-d2l/6b59e0af73de06c562cb2e52b99fa65d
Your query has the following relevant equivalences:
- OUL.SourceOrgUnitId = OUL.OrgUnitId
- OUI.OrgUnitId = GI.GroupId
Applying those to the written 'join predicate' GI.GroupId = OUL.OrgUnitId, the written predicate becomes:
- GI.GroupId = OUL.SourceOrgUnitId; then
- OUI.OrgUnitId = OUL.SourceOrgUnitId
The final form OUI.OrgUnitId = OUL.SourceOrgUnitId relates the tables correctly, but there is no indexable expression predicate to form the basis of an indexed nested loops join into OUL (no index with SourceOrgUnitId as the leading key).
Given multiple (explicit or implicit) equivalences in a query, the optimizer will not explore all possible combinations against all indexes in order to make an indexed nested loops join possible.
As far as I am aware, this is deliberate: it avoids exploding compilation time and search space. Some queries (esp. with self-joins, overlapping predicates, imperfect indexing) may not always find the ideal index plan for all possible query specifications.
In your particular example, we can workaround the issue in more natural ways than given in the question. One is to write a problematic equivalence earlier:
DECLARE @UserId INT = 71710; DECLARE @SectionGroupTypeID INT = 819; SELECT OUL.UserId, OUL.OrgUnitId, OUL.SourceOrgUnitId FROM GROUPS GI JOIN ORG_USERS OUI ON OUI.OrgUnitId = GI.GroupId -- **Derived table with moved predicate** JOIN (SELECT * FROM ORG_USERS AS OU WHERE OU.SourceOrgUnitId = OU.OrgUnitId) OUL ON GI.GroupId = OUL.OrgUnitId WHERE GI.GroupTypeID = @SectionGroupTypeId AND OUI.UserId = @UserId OPTION (MAXDOP 1);
This is a possible practical workaround rather than a general recommendation since it does not rely on documented behaviour. The exact shape of the query tree (and so which predicates appear 'before' or 'after' others at simplification time) depends on numerous factors and implementation details that could change.
Simplified repro
CREATE TABLE #T1 (col1 integer PRIMARY KEY); CREATE TABLE #T2 (col1 integer PRIMARY KEY, col2 integer NOT NULL); INSERT #T1 (col1) VALUES (1); INSERT #T2 (col1, col2) SELECT SV.number, SV.number FROM master.dbo.spt_values AS SV WHERE SV.[type] = N'P'; SELECT * FROM #T1 AS T1 JOIN #T2 AS T2 ON T2.col1 = T1.col1 WHERE T2.col1 = T2.col2; WITH T2 AS ( SELECT * FROM #T2 AS T2 WHERE T2.col1 = T2.col2 ) SELECT * FROM #T1 AS T1 JOIN T2 ON T2.col1 = T1.col1; DROP TABLE #T1, #T2;
Given all the equivalences that the optimizer could evaluate, I would have hoped that it could have explored more. Like at least the equivalences defined by the joins, or at least the equivalences when there are less then 3 or 4 tables.
I think this stuff is fascinating. You know if you wanted to turn it into a post. You'd have at least one reader. Looks like you've already written most of it 🙂
Thanks again
OUL.OrgUnitId ^ OUL.SourceOrgUnitId = 0
Then I get the predicate in the operator and the "good" plan.