TOP used before the predicate – feels like a Sql bug but certainly i am missing something

Dan Holmes 2014-05-14 21:03:18

The SQL is pretty simple, i have a CTE with 3 unions. Two of the three have a TOP (1). The outside statement has the predicate. The query plan shows that the first statement in the CTE uses the predicate in the seeks. The second two process the TOP before the filter which reduces the row count to zero – which is wrong.

I checked BOL and it doesn't mention the processing order but everything i knew about logical processing says the predicates/joins are first then the projection and finally the TOP and order bys. Based on the link below and an image by Jet Wang in the thread. This seems to back up my thoughts. But certainly i am missing something. This seems too rudimentary for it to be a sql bug. thoughts?

http://social.msdn.microsoft.com/Forums/sqlserver/en-US/70efeffe-76b9-4b7e-b4a1-ba53f5d21916/order-of-execution-of-sql-queries

SQLkiwi 2014-05-14 21:21:04
You just have to remember that logical processing order also has scope. Each of your SELECTs has its own scope (i.e. a self-contained logical processing order of its own), e.g.:

SELECT TOP (1)
    'g' objecttype,
    v.vehicleid,
    v.runid,
    v.rundate, 
    v.id, 
    startdatetime,
    1--matches the CROSS JOIN for Pickup
FROM dbo.tblRunactual v  
WHERE v.verified = 1
ORDER BY v.StartDateTime

…is its own scope. The optimizer can move things around within a query (even across scopes) but it must retain the original semantics.

The semantics of your query is that the scope above is logically evaluated before the outer filter is applied. It may not be what you intended, but nevertheless it is correct in SQL.

Dan Holmes 2014-05-14 21:37:06
Why was the scope different with and without the TOP? And a rewritten version using GROUP BY does what i am after but i don't get the scope thing. How do i know what starts and ends it? The GROUP BY version here apparently has a characteristic that allows the outside filter to be injected. Is this something particular to the TOP statement?

WITH stops AS (
    SELECT 't' objecttype, v.vehicleid, v.runid, v.rundate, v.id Runactualid
        , CASE stoptype.ispu WHEN 1 THEN ta.startdatetime ELSE ta.enddatetime END stoptime, stoptype.ispu 
    FROM dbo.tblRunactual v --ON ga.id = ra.garageactualid
    INNER JOIN dbo.tbltripsactual ta ON ta.runactualid = v.id
    CROSS JOIN (SELECT 1 UNION ALL SELECT -1) stoptype(ispu)
    --only non-cancelled/non-noshow trips.  the name of the view is empty minutes remember.
    WHERE  ta.cancelled = 0 AND (ta.noshow = 0 OR (ta.noshow = 0 AND ispu = 1))
    UNION 
    SELECT  'g' objecttype, v.vehicleid, v.runid, v.rundate, v.id, MIN(startdatetime), 1--matches the CROSS JOIN for Pickup
    FROM dbo.tblRunactual v  
    WHERE v.verified = 1
    GROUP BY v.vehicleid, v.runid, v.rundate, v.id
    UNION 
    SELECT 'g' objecttype, v.vehicleid, v.runid, v.rundate, v.id, MAX(enddatetime), -1--matches the CROSS JOIN for dropoff
    FROM dbo.tblRunactual v
    WHERE v.verified = 1
    GROUP BY v.vehicleid, v.runid, v.rundate, v.id
)SELECT * FROM stops WHERE vehicleid = 409 AND rundate = 20140219  

SQLkiwi 2014-05-15 00:53:42
The logical query processing scopes don't change; the semantics do.

As I say in the answer, the optimizer cannot change what the query means, it can only try to find an efficient physical implementation given the result specification.

Pushing the filter down into the scope shown would change the meaning of the query. Finding the first row in StartDateTime order where verified = 1 is just not the same thing as finding the first row with the extra filter condition vehicleid = 409 AND rundate = 20140219.

Without the TOP, the optimizer has much more freedom because placement of the filter does not affect the semantics, only the performance. This reasoning is quite general (it has to be!) and certainly not limited to TOP.

SQLkiwi 2014-05-15 01:02:29
By way of further explanation, the semantics of your GROUP BY rewrite do not match those of the original query. Projecting columns from the single row that sorts first by StartDateTime is not the same thing as grouping by those projected columns and finding MIN(StartDateTime) over the whole set. The exception being if the projected columns have the same value in every row in the set. I don't know if that is true in your case, and neither does the optimizer.

BTW did you mean to change UNION ALL to UNION in the rewrite?

Dan Holmes 2014-05-15 01:27:18
In not understanding the semantics of the statement with TOP it turns out that the GROUP BY version is what i really meant. Ironically i had a comment in the code that said i could use a GROUP BY or a DISTINCT and i chose the DISTINCT one. Now after this thread i realize i picked the wrong one. If you hadn't connected the dots, these are the pieces of the plan i view i posted under the title "Filling in the gaps: there has to be a better way" a couple days ago. Thanks for the help or i would have considered this code correct and put it in my product.

UNION ALL to UNION
Yes. A subsequent CTE had a DISTINCT that i removed because i removed the ALL.

SQLkiwi 2014-05-15 02:00:45
Aha! All's well that ends well then 🙂

And yes, I did recall the code from the other question.