I've seen it claimed that SQL Server can use an index to find a MIN() or MAX() value without requiring any further processing, but I haven't seen any evidence of that yet. That isn't the case with this query. I also tried a simpler version where I specified the accountId, basisId, and awsId columns, but SQL Server still included a Top operator even though it expected, and received, only one row.
Removing the Stream Aggregate operator probably wouldn't help this query much, but this is actually part of a larger query, and SQL Server has a tendency to replace the Stream Aggregate with a blocking Hash Match depending on query parameters and how I write the rest of the query. I'm hoping that if I simplify this part of the query, then it will have an easier time with the rest of it.
By sam.bishop 54 asked Oct 04, 2016 at 06:35 PM
SQL Server can use a b-tree index to find the minimum or maximum value immediately by scanning the first row from a suitably ordered index in forward or reverse order. The scan may start at either end of the index, or at a particular seek point.
The Top (1) operator is part of the implementation referred to above. Scanning the first row from the index forward or backward, either from the end or starting at a particular seek location. The Top is required to stop the scan after the first row. Plan execution is driven from the left; data flows from the right as a consequence.
An aggregate after the Top (1) may also be required (for scalar (no group by clause) aggregates only) to respect the semantics of
This all only works once, currently. Repeating the scan/seek/top 1 for each of a number of values would be a 'skip scan', which is not yet implemented.
The plan is close to optimal for the query provided, given the available index, and the grouping requirement. Specifying an equality predicate on
Depending on the availability of a list of seek keys (account, basis, aws, date range), you may find expressing the query as an
SQL Server switches to a hash aggregate when the ordering requirement (group by keys) for a Stream Aggregate would require a Sort, either because a suitable index is not available, or the sort order cannot be preserved across intervening plan operators. (It may also prefer a hash for purely cost reasons, typically with few, large groups expected so the hash table approach is most efficient.)
You can of course require a Stream Aggregate using an
Unless there is a compelling reason to write a monolithic query, breaking the query up can be the best approach. The 10k rows in the example plan provided can be very quickly stored in a temporary table (which may also have indexes added if helpful, and will benefit from automatic statistics).
By SQLkiwi ♦ 6.6k answered Oct 04, 2016 at 07:58 PM