Why doesn't Query optimizer eliminate unneeded left join in view with Pivot?

Check out May 16th comments on blog post http://michaeljswart.com/2011/06/forget-about-pivot/

Brad Schulz provides a case where a view on a pivot statement uses a left join. When querying the view, the join is redundant, but not eliminated by the query optimizer.

This is surprising. I would have guessed that SQL Server would remove the unnecessary join during the simplifications step.

I guess:

 LogOp_LeftOuterJoin
 
     LogOp_Get ...
         
     LogOp_Pivot ...

either can't be simplified or won't

avatar image By mjswart 146 asked May 28, 2013 at 02:50 PM
more ▼
(comments are locked)
avatar image mjswart May 30, 2013 at 07:59 PM

Just playing around with SQLFiddle. Here is this example available in a handy SQLFiddle link: http://sqlfiddle.com/#!6/ae58b/2

avatar image SQLkiwi ♦ May 30, 2013 at 10:48 PM

Here's another one showing another way to write the view with PIVOT so the correct property is set, and join elimination can occur. And yet another one that is arguably simpler and cooler because it doesn't use extra explicit aggregates. Note both approaches result in a trivial plan.

avatar image mjswart May 31, 2013 at 12:40 PM

I should have learned by now, that you can't out-thorough the SQLKiwi.

10|10000 characters needed characters left

2 answers: sort voted first

There are a number of stages at which the optimizer can remove unnecessary joins:

  1. Simplification

  2. Join collapsing

  3. Exploration

Also, there are two main reasons for a join to be regarded as 'necessary':

  1. It projects new columns used by the result set; or

  2. It affects result set cardinality

Projecting new columns includes the case where an outer join introduces NULL-extended rows. Affecting result cardinality includes cases where rows are eliminated by an inner join (no matching row), or rows from one input are duplicated because multiple rows with the same join key exist in the other input.

In the example given, and after expanding the view definition, the query is:

 SELECT
     v.customerid,
     v.customername
 FROM 
 (
     SELECT
         c.customerid,
         c.customername,
         HomePhone = pn.Home,
         WorkPhone = pn.Work,
         MobilePhone = pn.Mobile,
         FaxPhone = pn.Fax
     FROM dbo.customers AS c
     LEFT JOIN 
     (
         SELECT 
             pvt.customerid,
             pvt.Home,
             pvt.Work,
             pvt.Mobile,
             pvt.Fax
         FROM dbo.phones AS p
         PIVOT 
         (
             MAX(p.phonenumber) 
             FOR phonetype IN (Home, Work, Mobile, Fax)
         ) AS pvt
     ) AS pn
         ON pn.customerid = c.customerid
 ) AS v;

Query plan

Looking at the two reasons a join can be regarded as necessary in turn, we see that point (1) does not apply, since we are projecting only columns from the customers table.

Therefore, point (2) must apply: the join affects the cardinality of the result set. Now, the use of LEFT JOIN means that customers without a matching entry in the phones table will not affect result cardinality (if no row exists, a single row will still be returned from join, NULL-extended for the missing data).

The only remaining possibility is that performing the join would result in more than one row in the result set per customer. Clearly this cannot be the case, however, since the expansion of the PIVOT includes a GROUP BY customerid clause.

And this is where the issue arises. When properties are derived for the LogOp_Pivot, the code does not correctly set the property that guarantees a maximum of one row per customerid. Without this property set, the various stages and rules that can remove a redundant join cannot be applied successfully.

If the query is changed to add a DISTINCT (the only change is in the first line):

 SELECT DISTINCT
     v.customerid,
     v.customername
 FROM 
 (
     SELECT
         c.customerid,
         c.customername,
         HomePhone = pn.Home,
         WorkPhone = pn.Work,
         MobilePhone = pn.Mobile,
         FaxPhone = pn.Fax
     FROM dbo.customers AS c
     LEFT JOIN 
     (
         SELECT 
             pvt.customerid,
             pvt.Home,
             pvt.Work,
             pvt.Mobile,
             pvt.Fax
         FROM dbo.phones AS p
         PIVOT 
         (
             MAX(p.phonenumber) 
             FOR phonetype IN (Home, Work, Mobile, Fax)
         ) AS pvt
     ) AS pn
         ON pn.customerid = c.customerid
 ) AS v;

...(or an equivalent outer GROUP BY is added), we are saying we don't want to see any duplicated join input rows, so the join is optimized away by the RedundantLOJN simplification rule, despite the incorrect pivot property:

Query plan

When the PIVOT is expressed using manual CASE, MAX, and GROUP BY, the properties are derived correctly and RedundantLOJN kicks in to remove the join during the simplification stage.

sp.png (8.2 kB)
sp.png (21.0 kB)
avatar image By SQLkiwi ♦ 6.6k answered May 28, 2013 at 11:34 PM
more ▼
(comments are locked)
avatar image mjswart May 29, 2013 at 12:51 PM

Thanks so much Paul. That's great info!

10|10000 characters needed characters left

I can't see the SQL inside the view or the DDL but can SQL reliably determine the cardinality of the JOIN? If you join phones to customers and only need customers you would still get a potential one to many in the output. Perhaps the the PIVOT command isn't in the list of simplification rules for this type of join reduction.

avatar image By Dan Holmes 725 answered May 28, 2013 at 03:14 PM
more ▼
(comments are locked)
avatar image mjswart May 28, 2013 at 03:25 PM

Hi Dan, the full reproduction is described at the link I provided. (The attachment is here to provide the query plan).

I understand what you're saying. In order for a join to not matter it has to (a) not filter rows - guaranteed in a left join, (b) provide at most one matching row per lookup and (c) have an empty or unused output list.

In this case, I believe all three are true and SQL Server either doesn't believe it or doesn't recognize it.

And I think you're right. It certainly seems that Left Joins eliminations aren't considered when the inner input is a pivot.

10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question

Topics:

x631
x445
x2

asked: May 28, 2013 at 02:50 PM

Seen: 1792 times

Last Updated: May 31, 2013 at 12:40 PM