Why does SQL Server do the two sorts in the attached query plan, I would have expected a Merge Join and a Stream Aggregate would have been enough and a lot faster.
The query is:
I have indexes on both the tables that are sorted by job_unid and includes intfpostdate.
Regards, Kim Hansen
By Kim Hansen 3 asked Oct 03, 2015 at 02:57 PM
I couldn't reproduce your scenario where an additional SORT was required for
(Note that I had to append
I tried I slightly more elaborate re-write of the query, attempting to get the grouping and aggregation done earlier:
This yielded a slightly simpler plan, most notably without a single Sort operator:
So, you could try that variation and see if it ends up any better on your system.
The major difference between our plans is that I built my sample data using AdventureWorks.Sales.SalesOrderHeader, which only has 31K distinct SalesOrderID values. In your case, you have close to 3 million unique job_unid values. So aside from me getting a much better representative data set to test with, I have to ask: does this query really need to return all 3 million rows? Whats most puzzling to me is you're returning 3 million dates, and they're sorted by job_unid, but job_unid isn't even in the output. Why? Is a user really going to consume 3 million rows for any purpose? What is the point of showing them a date without any clue about which job_unid it belongs to?
By Aaron Bertrand ♦ 1.7k answered Oct 04, 2015 at 01:30 PM
A simple Merge Concatenation and Stream Aggregate might execute faster on your system, but it is not the cheapest option according to the query optimizer's cost model. For example, simply using the table cardinalities from the submitted query plan, the 'simple' approach looks like this:
This has an estimated cost of 84.83 units on my test SQL Server 2014 instance. As the warning symbols on the scans above indicate, I do not have statistics on the tables, so your estimate will likely be different (also, we likely have different memory and CPU configurations). The estimated cost is mostly associated with the large number of rows passing through the Concatenation (21.7 million or so).
Note that the 'simple' plan above requires the index to have
The optimizer considers the option of pushing the aggregate below the Union All, performing a partial
Again the estimated row counts beyond the scans are likely incorrect due to the lack of statistics on my simulation, but the point is still evident: performing a partial aggregation early reduces the number of rows passing through the Concatenation significantly. On my system, this reduces the estimated cost from 84.83 to 37.15. On that basis, the optimizer chooses the plan shape you see.
Side note: The Sorts are required because merge requires inputs sorted on all joined (concatenated in this case) keys. Please see my article, "Avoiding Sorts with Merge Join Concatenation" for more details.
Strictly, the sorts are unnecessary, but the Concatenation implementation requires input sorts on e.g. (job_unid, partialaggresult) and the index can only provide order on (job_unid). Unfortunately, the optimizer is not able to reason in the way required to eliminate the sorts. You can get around this by writing the partial aggregates out manually (see Aaron's answer for an example of that).
Whether the optimizer's transformation pays off depends on a number of factors, not least of which how effective the partial aggregates are in reducing the row count, and whether the sorts are performed in memory or spill to disk. You did not provide a post-execution (actual) plan, so we can't know if the estimates were accurate at runtime or not.
There is no documented way to turn off this partial aggregate exploration option, but there are many ways to express the query differently, that may produce a different plan shape, again for example as shown in Aaron's answer (that he was writing concurrently, it seems).
I am using SQL Server 2014 SP1 CU 2 (build 12.00.4422) with the new CE model enabled. You are on build 12.00.2269, which is 2014 RTM (with an early security update). As a general observation, you should update to at least SP1, though this is unlikely to affect plan choice.
By SQLkiwi ♦ 6.6k answered Oct 04, 2015 at 02:12 PM