possible to get MAX() with only an index scan?

I've seen it claimed that SQL Server can use an index to find a MIN() or MAX() value without requiring any further processing, but I haven't seen any evidence of that yet. That isn't the case with this query. I also tried a simpler version where I specified the accountId, basisId, and awsId columns, but SQL Server still included a Top operator even though it expected, and received, only one row.

Removing the Stream Aggregate operator probably wouldn't help this query much, but this is actually part of a larger query, and SQL Server has a tendency to replace the Stream Aggregate with a blocking Hash Match depending on query parameters and how I write the rest of the query. I'm hoping that if I simplify this part of the query, then it will have an easier time with the rest of it.

Plan.pesession (7.2 kB)
avatar image By sam.bishop 54 asked Oct 04, 2016 at 06:35 PM
more ▼
(comments are locked)
avatar image Aaron Bertrand ♦ Oct 04, 2016 at 06:37 PM

That may be true if all you're getting is the MIN() or MAX(). If you start adding other columns, other aggregates, grouping, filters, joins, etc. then that quickly goes out the window.

avatar image sam.bishop Oct 04, 2016 at 07:01 PM

All I'm adding in this case is the columns necessary to do further joins. It seems like it shouldn't matter because those columns are the same columns used to do the lookup on the index. But, disappointing as it would be, I wouldn't be surprised if the optimizer only looks for the only-the-MAX-column case. Perhaps a correlated subquery would help, then? I will look into that.

It would still be great to know though if SQL Server ever does a MAX() without a second operator.

10|10000 characters needed characters left

1 answer: sort voted first

I've seen it claimed that SQL Server can use an index to find a MIN() or MAX() value without requiring any further processing, but I haven't seen any evidence of that yet.

SQL Server can use a b-tree index to find the minimum or maximum value immediately by scanning the first row from a suitably ordered index in forward or reverse order. The scan may start at either end of the index, or at a particular seek point.

I also tried a simpler version where I specified the accountId, basisId, and awsId columns, but SQL Server still included a Top operator even though it expected, and received, only one row.

The Top (1) operator is part of the implementation referred to above. Scanning the first row from the index forward or backward, either from the end or starting at a particular seek location. The Top is required to stop the scan after the first row. Plan execution is driven from the left; data flows from the right as a consequence.

An aggregate after the Top (1) may also be required (for scalar (no group by clause) aggregates only) to respect the semantics of MIN or MAX if the result set is empty (where the 'correct' result is NULL).

This all only works once, currently. Repeating the scan/seek/top 1 for each of a number of values would be a 'skip scan', which is not yet implemented.

Removing the Stream Aggregate operator probably wouldn't help this query much...

The plan is close to optimal for the query provided, given the available index, and the grouping requirement. Specifying an equality predicate on awsId allows a seek on all three leading keys of the index, followed by an ordered range scan limited by a Top (1), which is obviously even better.

Depending on the availability of a list of seek keys (account, basis, aws, date range), you may find expressing the query as an APPLY into a Top (1) (or min/max) more efficient. One seek + top (1) per group (a manual skip scan).

For example:

 SELECT 
     V.*, 
     CA.MaxreportDate
 FROM 
 (
     -- Sample driving values
     VALUES
         (38124, 1, 7),
         (38256, 1, 5),
         (38310, 1, 6)
 ) AS V (accountId, basisId, awsId)
 CROSS APPLY
 (
     SELECT TOP (1)
         MaxreportDate = GLDD2.reportDate
     FROM dbo.GaapLotDailyDescriptive2 AS GLDD2
     WHERE 
         GLDD2.accountId = V.accountId
         AND GLDD2.basisId = V.basisId
         AND GLDD2.awsId = V.awsId
         AND GLDD2.reportDate >= -36523
         AND GLDD2.reportDate <= 6088
     ORDER BY
         GLDD2.reportDate DESC
 ) AS CA;

Expected plan shape

...this is actually part of a larger query, and SQL Server has a tendency to replace the Stream Aggregate with a blocking Hash Match depending on query parameters and how I write the rest of the query. I'm hoping that if I simplify this part of the query, then it will have an easier time with the rest of it.

SQL Server switches to a hash aggregate when the ordering requirement (group by keys) for a Stream Aggregate would require a Sort, either because a suitable index is not available, or the sort order cannot be preserved across intervening plan operators. (It may also prefer a hash for purely cost reasons, typically with few, large groups expected so the hash table approach is most efficient.)

You can of course require a Stream Aggregate using an ORDER GROUP hint. Hints are not usually desirable, and are a blunt instrument: all aggregates in the query plan will be of the stream or sort distinct variety with that hint.

Unless there is a compelling reason to write a monolithic query, breaking the query up can be the best approach. The 10k rows in the example plan provided can be very quickly stored in a temporary table (which may also have indexes added if helpful, and will benefit from automatic statistics).

sp.png (17.8 kB)
avatar image By SQLkiwi ♦ 6.6k answered Oct 04, 2016 at 07:58 PM
more ▼
(comments are locked)
avatar image sam.bishop Oct 07, 2016 at 02:56 PM

Thank you so much! I've learned a tremendous amount, and the query I was optimizing is about 50 times faster now.

I'd known that the APPLY operators can be used with more than just a function, but I wasn't aware that it could be used to get a better execution plan. I've tried searching the web for other examples of performance optimizations possible with an APPLY, but they've all followed the same pattern: "... APPLY (SELECT TOP ... ORDER BY ...)". Are there any other situations where an APPLY is often used to optimize the execution plan?

avatar image SQLkiwi ♦ Oct 08, 2016 at 12:28 PM

There are many useful applications of APPLY (lateral join in other products) but most of the time it is not worth trying to optimize like this in advance. Write the query in natural syntax, and only consider specific rewrites if a performance problem occurs. It is perfectly common for APPLY to result in a worse result as well, so please don't take this too far.

avatar image sam.bishop Oct 09, 2016 at 04:29 AM

Yes. I wouldn't want to give anyone the wrong idea. I'm a (newly minted) performance engineer, using SQL Sentry and other tools to find trouble spots. (The query I modified was the slowest user-facing query on our system.) But you're right: for the most part, straightforward code--SQL or not--is preferable and performs just fine.

10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question

Topics:

x641
x455
x117
x89
x27

asked: Oct 04, 2016 at 06:35 PM

Seen: 133 times

Last Updated: Oct 09, 2016 at 03:57 PM