Why doesnt SQL Server just add a predicate in this case?

mjswart 2017-08-22 14:45:02

The attached query plan has two queries in it. The queries differ by one expression in the WHERE clause, The second one has this:

OUL.OrgUnitId = OUL.SourceOrgUnitId

But the second one is way more expensive. I thought that a query plan of the first query could have been modified to simply include the predicate with the expression above on the one operator. But SQL Server is either costing it as too expensive, or not generating that plan as a candidate to consider.

I'm running SQL Server 2014, SP2. When I use the old cardinality estimator, I still see the issue.

mjswart 2017-08-22 16:14:36
It's funny. If I change the where clause to
OUL.OrgUnitId ^ OUL.SourceOrgUnitId = 0
Then I get the predicate in the operator and the "good" plan.
mjswart 2017-08-22 17:24:27
Here's a reproduction of the issue (seen to reproduce the issue on sql server 2014 and sql server 2016)
https://gist.github.com/mjswart-d2l/6b59e0af73de06c562cb2e52b99fa65d
SQLkiwi 2017-08-23 11:32:01
For inner join, there is no logical distinction between a join predicate and a where clause predicate. Inner join starts life as a cross product with predicates applied.

Your query has the following relevant equivalences:

  1. OUL.SourceOrgUnitId = OUL.OrgUnitId
  2. OUI.OrgUnitId = GI.GroupId

Applying those to the written 'join predicate' GI.GroupId = OUL.OrgUnitId, the written predicate becomes:

  1. GI.GroupId = OUL.SourceOrgUnitId; then
  2. OUI.OrgUnitId = OUL.SourceOrgUnitId

The final form OUI.OrgUnitId = OUL.SourceOrgUnitId relates the tables correctly, but there is no indexable expression predicate to form the basis of an indexed nested loops join into OUL (no index with SourceOrgUnitId as the leading key).

Given multiple (explicit or implicit) equivalences in a query, the optimizer will not explore all possible combinations against all indexes in order to make an indexed nested loops join possible.

As far as I am aware, this is deliberate: it avoids exploding compilation time and search space. Some queries (esp. with self-joins, overlapping predicates, imperfect indexing) may not always find the ideal index plan for all possible query specifications.

In your particular example, we can workaround the issue in more natural ways than given in the question. One is to write a problematic equivalence earlier:

DECLARE @UserId INT = 71710;
DECLARE @SectionGroupTypeID INT = 819;
 
SELECT OUL.UserId, 
       OUL.OrgUnitId, 
       OUL.SourceOrgUnitId
  FROM GROUPS GI
  JOIN ORG_USERS OUI
       ON OUI.OrgUnitId = GI.GroupId
  -- **Derived table with moved predicate**
  JOIN (SELECT * FROM ORG_USERS AS OU WHERE OU.SourceOrgUnitId = OU.OrgUnitId) OUL 
       ON GI.GroupId = OUL.OrgUnitId
 WHERE GI.GroupTypeID = @SectionGroupTypeId
   AND OUI.UserId = @UserId
 OPTION (MAXDOP 1);

This is a possible practical workaround rather than a general recommendation since it does not rely on documented behaviour. The exact shape of the query tree (and so which predicates appear 'before' or 'after' others at simplification time) depends on numerous factors and implementation details that could change.


Simplified repro

CREATE TABLE #T1 (col1 integer PRIMARY KEY);
CREATE TABLE #T2 (col1 integer PRIMARY KEY, col2 integer NOT NULL);
 
INSERT #T1 (col1)
VALUES (1);
 
INSERT #T2 (col1, col2)
SELECT SV.number, SV.number 
FROM master.dbo.spt_values AS SV
WHERE SV.[type] = N'P';
 
SELECT
    *
FROM #T1 AS T1
JOIN #T2 AS T2
    ON T2.col1 = T1.col1
WHERE
    T2.col1 = T2.col2;
 
WITH T2 AS
(
    SELECT * 
    FROM #T2 AS T2
    WHERE T2.col1 = T2.col2
)
SELECT
    *
FROM #T1 AS T1
JOIN T2
    ON T2.col1 = T1.col1;
 
DROP TABLE #T1, #T2;
mjswart 2017-08-23 12:25:15
Thanks Paul,

Given all the equivalences that the optimizer could evaluate, I would have hoped that it could have explored more. Like at least the equivalences defined by the joins, or at least the equivalences when there are less then 3 or 4 tables.

I think this stuff is fascinating. You know if you wanted to turn it into a post. You'd have at least one reader. Looks like you've already written most of it 🙂

Thanks again