How to get predicate push down in WITH statements

Tony Green 2017-12-28 15:44:54

The attached session has two queries that produce identical output. The second query has fewer reads and uses less memory. Why is query two more than twice as slow? Is there a better way to get the predicate pushed down, so the number of rows is reduced sooner, as in query one?

Hugo Kornelis 2017-12-29 19:22:29
Let's first answer the question you ask, and then move on to the question you should have asked.

The second query uses Hash Match and Merge Join operators to join six copies of the CTE, each with a specific filter on the Name column (DeviceType, Farm, Transmitter, DeviceName, Bed, and Unit). Each of these copies requires just a single pass over the DeviceInfoData table, so that's six passes in total, each using an index seek on an index that has Name as the leading column. The effect is that only the rows for the six selected names are read, and nothing else. That results in a low number of reads: 6 scans and 3140 logical reads. However, the Hash Match operators require memory, and the Merge Joins require sorted data, for which extra Sort operators are added that also require memory. The memory grant for this second query is almost 18 MB because of all these Hash Match and Sort operators.
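For readers who don't have the attached session, the second query presumably has roughly the shape sketched below. Only DeviceInfoData, Devices, DeviceSessions and the columns Name, Value, DeviceSessionId and TimestampUTC are named in this thread; the join columns, aliases, and the exact CTE definition are assumptions, not the actual query:

WITH LatestInfo AS
(   -- hypothetical CTE: most recent value per session and name
    SELECT  did.DeviceSessionId,
            did.Name,
            did.Value,
            ROW_NUMBER() OVER (PARTITION BY did.DeviceSessionId, did.Name
                               ORDER BY did.TimestampUTC DESC) AS rn
    FROM    dbo.DeviceInfoData AS did
)
SELECT  ds.DeviceSessionId,
        dt.Value AS DeviceType,
        fa.Value AS Farm,
        tr.Value AS Transmitter,
        dn.Value AS DeviceName,
        be.Value AS Bed,
        un.Value AS Unit
FROM      dbo.Devices        AS d
JOIN      dbo.DeviceSessions AS ds ON ds.DeviceId = d.DeviceId   -- assumed join column
LEFT JOIN LatestInfo AS dt ON dt.DeviceSessionId = ds.DeviceSessionId AND dt.rn = 1 AND dt.Name = N'DeviceType'
LEFT JOIN LatestInfo AS fa ON fa.DeviceSessionId = ds.DeviceSessionId AND fa.rn = 1 AND fa.Name = N'Farm'
LEFT JOIN LatestInfo AS tr ON tr.DeviceSessionId = ds.DeviceSessionId AND tr.rn = 1 AND tr.Name = N'Transmitter'
LEFT JOIN LatestInfo AS dn ON dn.DeviceSessionId = ds.DeviceSessionId AND dn.rn = 1 AND dn.Name = N'DeviceName'
LEFT JOIN LatestInfo AS be ON be.DeviceSessionId = ds.DeviceSessionId AND be.rn = 1 AND be.Name = N'Bed'
LEFT JOIN LatestInfo AS un ON un.DeviceSessionId = ds.DeviceSessionId AND un.rn = 1 AND un.Name = N'Unit';

Because the CTE is referenced six times, the plan contains six separate reads of DeviceInfoData, one per Name value.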

The first query uses six subqueries to retrieve the same data. The way these subqueries are written results in an execution plan that uses Nested Loops joins, whose inner branches execute once for each row from the main tables. The benefit is that these branches can use an index (the same one, actually) to seek on the combination of Name and DeviceSessionId, which means less data to read per execution. However, they do execute once for each row from the Devices and DeviceSessions tables: 261 in total. A single execution of the seek into DeviceInfoData is cheaper than in the other query, but because it returns data for a single device only, it has to be repeated: each of the six subqueries runs 261 times. That adds up to more I/O: 6320 logical reads (in 1566 scans: six times 261). That's why the total number of logical reads is higher. On the other hand, this plan uses no Hash Match operators and just a single Sort, so the memory grant is much lower – less than 5 MB.
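Again only as a sketch for readers without the attachment: reconstructing from the simplification suggested further down, each of the six subqueries presumably combines a ROW_NUMBER with a TOP (1), roughly like this (the constant 1 and every name not already mentioned in the thread are assumptions):

OUTER APPLY --[dbo].[v_DeviceSessionInfo]
                (   SELECT TOP (1)
                            x.Value
                    FROM    (   SELECT  did.Value,
                                        ROW_NUMBER() OVER (ORDER BY did.TimestampUTC DESC) AS rn
                                FROM    dbo.DeviceInfoData AS did
                                WHERE   did.Name            = N'DeviceType'
                                AND     did.DeviceSessionId = 1   -- presumably a correlation to the outer row in the real query
                            ) AS x
                    WHERE   x.rn = 1
                ) AS DeviceInfoDeviceType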

I'm not going to answer your question on "twice as slow". The timing is in milliseconds, and a difference of just 50 ms can be caused by too many external factors (a lock by another user, a scheduled virus scan starting and taking some resources, and so on). If you want a true measurement of which one runs fastest, then: (1) disable the Include Actual Execution Plan option and use the SSMS options to discard the results after execution (to eliminate the relatively slow rendering that SSMS does); (2) load one of the two queries in the query window and add GO 100 to make it run 100 times; (3) enable the Include Client Statistics option; and (4) run. Then do the same for the other query (from the same window, so just replace the query text). After that, look at the Client Statistics and compare the Total execution time. For queries this fast, that is in my experience the most reliable way to measure elapsed time. (However, it doesn't tell you anything about resource usage.)
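As a minimal sketch of that harness (the SELECT below is only a placeholder; substitute the query under test):

-- In SSMS: Query > Query Options > Results > Grid > "Discard results after execution",
-- Query > Include Client Statistics, and make sure Include Actual Execution Plan is off.
SELECT COUNT(*) FROM dbo.DeviceInfoData;   -- placeholder: put the query under test here
GO 100                                     -- runs the batch above 100 times

Then compare the Total execution time row in the Client Statistics tab for each of the two queries.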


Now that I have answered the question you asked, let's move to the question you should have asked. Which is: "In the first query I have six almost identical subqueries; why isn't the optimizer smart enough to collapse them into a single execution? And in the second query I even used a CTE that I referenced six times, and the optimizer still insists on six almost identical branches. Why is that, and how can I eliminate this duplicate work?"

The reason for this is that the optimizer is bad at recognizing identical subtrees (I use the word subtree because the query gets converted into an internal format, called a query tree, at the start of optimization). I have been told that even recognizing truly identical subtrees is harder than it seems, and in your case they are not even identical, because you are filtering on a different value in the Name column every time. The CTE does not help because, when the query gets converted into a tree, each reference to the CTE is expanded into its definition. So the optimizer sees the query as if you had used copy/paste to create six copies (each with the same minimal but relevant change).
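You can see this expansion in the plan of any query that references a CTE more than once; a generic illustration (not your query, just the columns named in this thread) is:

-- generic illustration: a CTE referenced twice shows up in the execution plan
-- as two separate reads of the base table, one per reference
WITH LatestInfo AS
(   SELECT  did.DeviceSessionId, did.Name, did.Value
    FROM    dbo.DeviceInfoData AS did
)
SELECT  a.Value, b.Value
FROM    LatestInfo AS a
JOIN    LatestInfo AS b ON  b.DeviceSessionId = a.DeviceSessionId
                        AND b.Name            = N'Farm'
WHERE   a.Name = N'DeviceType';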

If you want to prevent the optimizer from coming up with a plan that has six identical branches, your best bet is to first materialize the data in the CTE into a temporary table. However, I am not sure if this will actually help you. The index used is pretty effective at retrieving only the rows it actually needs. If the full table has more than just these six values for Name, then materializing the CTE into a temporary table will force SQL Server to process all rows. You can add WHERE Name IN (N'DeviceType', N'Farm', […]) to combat this, but even then it is still possible that the combined performance of the two steps will not be better than the original query; you'd have to test.
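A sketch of what that could look like (the six Name values are the ones listed earlier in this answer; the temp table name, index, and column list are assumptions):

-- materialize only the six relevant names once
SELECT  did.DeviceSessionId,
        did.Name,
        did.Value,
        ROW_NUMBER() OVER (PARTITION BY did.DeviceSessionId, did.Name
                           ORDER BY did.TimestampUTC DESC) AS rn
INTO    #LatestInfo
FROM    dbo.DeviceInfoData AS did
WHERE   did.Name IN (N'DeviceType', N'Farm', N'Transmitter', N'DeviceName', N'Bed', N'Unit');

CREATE CLUSTERED INDEX cx_LatestInfo ON #LatestInfo (DeviceSessionId, Name);

-- then reference #LatestInfo (WHERE rn = 1) instead of the CTE in the main query,
-- and test whether the combined cost of both steps beats the original.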

For the first query, I have a feeling that the subqueries are more complex than they need to be. The TOP (1) is redundant, given the row_number calculation and filter. However, I think you can get rid of the nested subquery and the row_number if you keep the TOP, as long as you add an appropriate ORDER BY. So, for instance, for one of the subqueries you would use:

OUTER APPLY --[dbo].[v_DeviceSessionInfo]
                (   SELECT TOP (1)
                            did.Value
                    FROM    dbo.DeviceInfoData AS did
                    WHERE   did.Name            = N'DeviceType'
                    AND     did.DeviceSessionId = 1
                    ORDER BY did.TimestampUTC DESC) AS DeviceInfoDeviceType

But you will need to test this to verify (a) that the results still come back unchanged and correct, and (b) whether this actually performs better. I cannot predict that with certainty based on the information I have.

Tony Green 2018-01-03 18:58:10
I did try simplifying the subqueries as you suggested, and the overall performance improved.

Hugo Kornelis 2018-01-03 21:08:03
I'm glad to hear that, Tony. Thanks for closing the loop!