Why doesn't Query optimizer eliminate unneeded left join in view with Pivot?
Brad Schulz provides a case where a view on a pivot statement uses a left join. When querying the view, the join is redundant, but not eliminated by the query optimizer.
This is surprising. I would have guessed that SQL Server would remove the unnecessary join during the simplifications step.
LogOp_LeftOuterJoin LogOp_Get ... LogOp_Pivot ...
either can't be simplified or won't
- Join collapsing
Also, there are two main reasons for a join to be regarded as 'necessary':
- It projects new columns used by the result set; or
- It affects result set cardinality
Projecting new columns includes the case where an outer join introduces NULL-extended rows. Affecting result cardinality includes cases where rows are eliminated by an inner join (no matching row), or rows from one input are duplicated because multiple rows with the same join key exist in the other input.
In the example given, and after expanding the view definition, the query is:
SELECT v.customerid, v.customername FROM ( SELECT c.customerid, c.customername, HomePhone = pn.Home, WorkPhone = pn.Work, MobilePhone = pn.Mobile, FaxPhone = pn.Fax FROM dbo.customers AS c LEFT JOIN ( SELECT pvt.customerid, pvt.Home, pvt.Work, pvt.Mobile, pvt.Fax FROM dbo.phones AS p PIVOT ( MAX(p.phonenumber) FOR phonetype IN (Home, Work, Mobile, Fax) ) AS pvt ) AS pn ON pn.customerid = c.customerid ) AS v;
Looking at the two reasons a join can be regarded as necessary in turn, we see that point (1) does not apply, since we are projecting only columns from the customers table.
Therefore, point (2) must apply: the join affects the cardinality of the result set. Now, the use of LEFT JOIN means that customers without a matching entry in the phones table will not affect result cardinality (if no row exists, a single row will still be returned from join, NULL-extended for the missing data).
The only remaining possibility is that performing the join would result in more than one row in the result set per customer. Clearly this cannot be the case, however, since the expansion of the PIVOT includes a GROUP BY customerid clause.
And this is where the issue arises. When properties are derived for the LogOp_Pivot, the code does not correctly set the property that guarantees a maximum of one row per customerid. Without this property set, the various stages and rules that can remove a redundant join cannot be applied successfully.
If the query is changed to add a DISTINCT (the only change is in the first line):
SELECT DISTINCT v.customerid, v.customername FROM ( SELECT c.customerid, c.customername, HomePhone = pn.Home, WorkPhone = pn.Work, MobilePhone = pn.Mobile, FaxPhone = pn.Fax FROM dbo.customers AS c LEFT JOIN ( SELECT pvt.customerid, pvt.Home, pvt.Work, pvt.Mobile, pvt.Fax FROM dbo.phones AS p PIVOT ( MAX(p.phonenumber) FOR phonetype IN (Home, Work, Mobile, Fax) ) AS pvt ) AS pn ON pn.customerid = c.customerid ) AS v;
…(or an equivalent outer GROUP BY is added), we are saying we don't want to see any duplicated join input rows, so the join is optimized away by the RedundantLOJN simplification rule, despite the incorrect pivot property:
When the PIVOT is expressed using manual CASE, MAX, and GROUP BY, the properties are derived correctly and RedundantLOJN kicks in to remove the join during the simplification stage.