TOP used before the predicate - feels like a Sql bug but certainly i am missing something

The SQL is pretty simple, i have a CTE with 3 unions. Two of the three have a TOP (1). The outside statement has the predicate. The query plan shows that the first statement in the CTE uses the predicate in the seeks. The second two process the TOP before the filter which reduces the row count to zero - which is wrong.

I checked BOL and it doesn't mention the processing order but everything i knew about logical processing says the predicates/joins are first then the projection and finally the TOP and order bys. Based on the link below and an image by Jet Wang in the thread. This seems to back up my thoughts. But certainly i am missing something. This seems too rudimentary for it to be a sql bug. thoughts?

http://social.msdn.microsoft.com/Forums/sqlserver/en-US/70efeffe-76b9-4b7e-b4a1-ba53f5d21916/order-of-execution-of-sql-queries

Plan.pesession (6.8 kB)
avatar image By Dan Holmes 725 asked May 14, 2014 at 09:03 PM
more ▼
(comments are locked)
10|10000 characters needed characters left

1 answer: sort voted first

You just have to remember that logical processing order also has scope. Each of your SELECTs has its own scope (i.e. a self-contained logical processing order of its own), e.g.:

 SELECT TOP (1)
     'g' objecttype,
     v.vehicleid,
     v.runid,
     v.rundate, 
     v.id, 
     startdatetime,
     1--matches the CROSS JOIN for Pickup
 FROM dbo.tblRunactual v  
 WHERE v.verified = 1
 ORDER BY v.StartDateTime

...is its own scope. The optimizer can move things around within a query (even across scopes) but it must retain the original semantics.

The semantics of your query is that the scope above is logically evaluated before the outer filter is applied. It may not be what you intended, but nevertheless it is correct in SQL.

avatar image By SQLkiwi ♦ 6.6k answered May 14, 2014 at 09:21 PM
more ▼
(comments are locked)
avatar image Dan Holmes May 14, 2014 at 09:37 PM

Why was the scope different with and without the TOP? And a rewritten version using GROUP BY does what i am after but i don't get the scope thing. How do i know what starts and ends it? The GROUP BY version here apparently has a characteristic that allows the outside filter to be injected. Is this something particular to the TOP statement? WITH stops AS ( SELECT 't' objecttype, v.vehicleid, v.runid, v.rundate, v.id Runactualid , CASE stoptype.ispu WHEN 1 THEN ta.startdatetime ELSE ta.enddatetime END stoptime, stoptype.ispu FROM dbo.tblRunactual v --ON ga.id = ra.garageactualid INNER JOIN dbo.tbltripsactual ta ON ta.runactualid = v.id CROSS JOIN (SELECT 1 UNION ALL SELECT -1) stoptype(ispu) --only non-cancelled/non-noshow trips. the name of the view is empty minutes remember. WHERE ta.cancelled = 0 AND (ta.noshow = 0 OR (ta.noshow = 0 AND ispu = 1)) UNION SELECT 'g' objecttype, v.vehicleid, v.runid, v.rundate, v.id, MIN(startdatetime), 1--matches the CROSS JOIN for Pickup FROM dbo.tblRunactual v WHERE v.verified = 1 GROUP BY v.vehicleid, v.runid, v.rundate, v.id UNION SELECT 'g' objecttype, v.vehicleid, v.runid, v.rundate, v.id, MAX(enddatetime), -1--matches the CROSS JOIN for dropoff FROM dbo.tblRunactual v WHERE v.verified = 1 GROUP BY v.vehicleid, v.runid, v.rundate, v.id )SELECT * FROM stops WHERE vehicleid = 409 AND rundate = 20140219

avatar image SQLkiwi ♦ May 15, 2014 at 12:53 AM

The logical query processing scopes don't change; the semantics do.

As I say in the answer, the optimizer cannot change what the query means, it can only try to find an efficient physical implementation given the result specification.

Pushing the filter down into the scope shown would change the meaning of the query. Finding the first row in StartDateTime order where verified = 1 is just not the same thing as finding the first row with the extra filter condition vehicleid = 409 AND rundate = 20140219.

Without the TOP, the optimizer has much more freedom because placement of the filter does not affect the semantics, only the performance. This reasoning is quite general (it has to be!) and certainly not limited to TOP.

avatar image SQLkiwi ♦ May 15, 2014 at 01:02 AM

By way of further explanation, the semantics of your GROUP BY rewrite do not match those of the original query. Projecting columns from the single row that sorts first by StartDateTime is not the same thing as grouping by those projected columns and finding MIN(StartDateTime) over the whole set. The exception being if the projected columns have the same value in every row in the set. I don't know if that is true in your case, and neither does the optimizer.

BTW did you mean to change UNION ALL to UNION in the rewrite?

avatar image Dan Holmes May 15, 2014 at 01:27 AM

In not understanding the semantics of the statement with TOP it turns out that the GROUP BY version is what i really meant. Ironically i had a comment in the code that said i could use a GROUP BY or a DISTINCT and i chose the DISTINCT one. Now after this thread i realize i picked the wrong one. If you hadn't connected the dots, these are the pieces of the plan i view i posted under the title "Filling in the gaps: there has to be a better way" a couple days ago. Thanks for the help or i would have considered this code correct and put it in my product.

UNION ALL to UNION Yes. A subsequent CTE had a DISTINCT that i removed because i removed the ALL.

avatar image SQLkiwi ♦ May 15, 2014 at 02:00 AM

Aha! All's well that ends well then :)

And yes, I did recall the code from the other question.

10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.