Why doesn't Query optimizer eliminate unneeded left join in view with Pivot?

mjswart 2013-05-28 14:50:17

Check out May 16th comments on blog post http://michaeljswart.com/2011/06/forget-about-pivot/

Brad Schulz provides a case where a view on a pivot statement uses a left join. When querying the view, the join is redundant, but not eliminated by the query optimizer.

This is surprising. I would have guessed that SQL Server would remove the unnecessary join during the simplifications step.

I guess:

LogOp_LeftOuterJoin
 
    LogOp_Get ...
 
    LogOp_Pivot ...

either can't be simplified or won't

mjswart 2013-05-30 19:59:32
Just playing around with SQLFiddle. Here is this example available in a handy SQLFiddle link: http://sqlfiddle.com/#!6/ae58b/2
SQLkiwi 2013-05-30 22:48:06
Here's another one showing another way to write the view with PIVOT so the correct property is set, and join elimination can occur. And yet another one that is arguably simpler and cooler because it doesn't use extra explicit aggregates. Note both approaches result in a trivial plan.
mjswart 2013-05-31 12:40:42
I should have learned by now, that you can't out-thorough the SQLKiwi.
SQLkiwi 2013-05-28 23:34:08
There are a number of stages at which the optimizer can remove unnecessary joins:

  1. Simplification
  2. Join collapsing
  3. Exploration

Also, there are two main reasons for a join to be regarded as 'necessary':

  1. It projects new columns used by the result set; or
  2. It affects result set cardinality

Projecting new columns includes the case where an outer join introduces NULL-extended rows. Affecting result cardinality includes cases where rows are eliminated by an inner join (no matching row), or rows from one input are duplicated because multiple rows with the same join key exist in the other input.

In the example given, and after expanding the view definition, the query is:

SELECT
    v.customerid,
    v.customername
FROM 
(
    SELECT
        c.customerid,
        c.customername,
        HomePhone = pn.Home,
        WorkPhone = pn.Work,
        MobilePhone = pn.Mobile,
        FaxPhone = pn.Fax
    FROM dbo.customers AS c
    LEFT JOIN 
    (
        SELECT 
            pvt.customerid,
            pvt.Home,
            pvt.Work,
            pvt.Mobile,
            pvt.Fax
        FROM dbo.phones AS p
        PIVOT 
        (
            MAX(p.phonenumber) 
            FOR phonetype IN (Home, Work, Mobile, Fax)
        ) AS pvt
    ) AS pn
        ON pn.customerid = c.customerid
) AS v;

Query plan

Looking at the two reasons a join can be regarded as necessary in turn, we see that point (1) does not apply, since we are projecting only columns from the customers table.

Therefore, point (2) must apply: the join affects the cardinality of the result set. Now, the use of LEFT JOIN means that customers without a matching entry in the phones table will not affect result cardinality (if no row exists, a single row will still be returned from join, NULL-extended for the missing data).

The only remaining possibility is that performing the join would result in more than one row in the result set per customer. Clearly this cannot be the case, however, since the expansion of the PIVOT includes a GROUP BY customerid clause.

And this is where the issue arises. When properties are derived for the LogOp_Pivot, the code does not correctly set the property that guarantees a maximum of one row per customerid. Without this property set, the various stages and rules that can remove a redundant join cannot be applied successfully.

If the query is changed to add a DISTINCT (the only change is in the first line):

SELECT DISTINCT
    v.customerid,
    v.customername
FROM 
(
    SELECT
        c.customerid,
        c.customername,
        HomePhone = pn.Home,
        WorkPhone = pn.Work,
        MobilePhone = pn.Mobile,
        FaxPhone = pn.Fax
    FROM dbo.customers AS c
    LEFT JOIN 
    (
        SELECT 
            pvt.customerid,
            pvt.Home,
            pvt.Work,
            pvt.Mobile,
            pvt.Fax
        FROM dbo.phones AS p
        PIVOT 
        (
            MAX(p.phonenumber) 
            FOR phonetype IN (Home, Work, Mobile, Fax)
        ) AS pvt
    ) AS pn
        ON pn.customerid = c.customerid
) AS v;

…(or an equivalent outer GROUP BY is added), we are saying we don't want to see any duplicated join input rows, so the join is optimized away by the RedundantLOJN simplification rule, despite the incorrect pivot property:

Query plan

When the PIVOT is expressed using manual CASE, MAX, and GROUP BY, the properties are derived correctly and RedundantLOJN kicks in to remove the join during the simplification stage.

mjswart 2013-05-29 12:51:55
Thanks so much Paul. That's great info!
Dan Holmes 2013-05-28 15:14:31
I can't see the SQL inside the view or the DDL but can SQL reliably determine the cardinality of the JOIN? If you join phones to customers and only need customers you would still get a potential one to many in the output. Perhaps the the PIVOT command isn't in the list of simplification rules for this type of join reduction.
mjswart 2013-05-28 15:25:08
Hi Dan, the full reproduction is described at the link I provided. (The attachment is here to provide the query plan).

I understand what you're saying. In order for a join to not matter it has to (a) not filter rows – guaranteed in a left join, (b) provide at most one matching row per lookup and (c) have an empty or unused output list.

In this case, I believe all three are true and SQL Server either doesn't believe it or doesn't recognize it.

And I think you're right. It certainly seems that Left Joins eliminations aren't considered when the inner input is a pivot.