possible to get MAX() with only an index scan?
Removing the Stream Aggregate operator probably wouldn't help this query much, but this is actually part of a larger query, and SQL Server has a tendency to replace the Stream Aggregate with a blocking Hash Match depending on query parameters and how I write the rest of the query. I'm hoping that if I simplify this part of the query, then it will have an easier time with the rest of it.
SQL Server can use a b-tree index to find the minimum or maximum value immediately by scanning the first row from a suitably ordered index in forward or reverse order. The scan may start at either end of the index, or at a particular seek point.
I also tried a simpler version where I specified the accountId, basisId, and awsId columns, but SQL Server still included a Top operator even though it expected, and received, only one row.
The Top (1) operator is part of the implementation referred to above. Scanning the first row from the index forward or backward, either from the end or starting at a particular seek location. The Top is required to stop the scan after the first row. Plan execution is driven from the left; data flows from the right as a consequence.
An aggregate after the Top (1) may also be required (for scalar (no group by clause) aggregates only) to respect the semantics of MIN or MAX if the result set is empty (where the 'correct' result is NULL).
This all only works once, currently. Repeating the scan/seek/top 1 for each of a number of values would be a 'skip scan', which is not yet implemented.
Removing the Stream Aggregate operator probably wouldn't help this query much…
The plan is close to optimal for the query provided, given the available index, and the grouping requirement. Specifying an equality predicate on awsId allows a seek on all three leading keys of the index, followed by an ordered range scan limited by a Top (1), which is obviously even better.
Depending on the availability of a list of seek keys (account, basis, aws, date range), you may find expressing the query as an APPLY into a Top (1) (or min/max) more efficient. One seek + top (1) per group (a manual skip scan).
SELECT V.*, CA.MaxreportDate FROM ( -- Sample driving values VALUES (38124, 1, 7), (38256, 1, 5), (38310, 1, 6) ) AS V (accountId, basisId, awsId) CROSS APPLY ( SELECT TOP (1) MaxreportDate = GLDD2.reportDate FROM dbo.GaapLotDailyDescriptive2 AS GLDD2 WHERE GLDD2.accountId = V.accountId AND GLDD2.basisId = V.basisId AND GLDD2.awsId = V.awsId AND GLDD2.reportDate >= -36523 AND GLDD2.reportDate <= 6088 ORDER BY GLDD2.reportDate DESC ) AS CA;
…this is actually part of a larger query, and SQL Server has a tendency to replace the Stream Aggregate with a blocking Hash Match depending on query parameters and how I write the rest of the query. I'm hoping that if I simplify this part of the query, then it will have an easier time with the rest of it.
SQL Server switches to a hash aggregate when the ordering requirement (group by keys) for a Stream Aggregate would require a Sort, either because a suitable index is not available, or the sort order cannot be preserved across intervening plan operators. (It may also prefer a hash for purely cost reasons, typically with few, large groups expected so the hash table approach is most efficient.)
You can of course require a Stream Aggregate using an ORDER GROUP hint. Hints are not usually desirable, and are a blunt instrument: all aggregates in the query plan will be of the stream or sort distinct variety with that hint.
Unless there is a compelling reason to write a monolithic query, breaking the query up can be the best approach. The 10k rows in the example plan provided can be very quickly stored in a temporary table (which may also have indexes added if helpful, and will benefit from automatic statistics).