How to avoid a clustered index scan when ordering by clustered index

Barry Schulz 2016-08-12 22:41:58

– These queries are used to page data.
– IRN, the clustered index is a GUID and uses new sequetial id as the default value

First Query
– Utilizes the the index I created to cover the predicate and output from HimEggRoomReceiving

Second Query
– Same as the first query except including except with a larger date range
– Does a clustered index scan instead of using the index from the first query

Third Query
– Same as second except no ORDER BY

Vlady Oselsky 2016-09-01 14:22:43
Have you tried to query against base tables instead of against the View?
SQLkiwi 2016-09-06 15:23:55
This is an example of a fundamental optimization problem. Is it better to:

  1. Use an index to locate all qualifying rows, sort the (full) set, and take the first 'n' rows of the result; or
  2. Scan an index in the required sort order, applying the filtering conditions to each row, hoping to find the first 'n' qualifying rows quickly.

The point being that the scan need not be a full scan; it can stop as soon as the required rows are found. How much of the scan needs to complete depends on where, in the order specified, the required 'n' rows are located.

Say we are looking for 100 rows. In the best case, the first 100 rows of the scan all meet the specified predicates, and the query stops after scanning just 100 rows, with no sorting needed. In the worst case, there are fewer than 100 matches in total (or the 100th match is the last row in index order), and a complete index scan is performed.

The calculations involved in the trade off is not helped by an assumption made in the optimizer's costing model, that predicate matches will be evenly distributed. So, say 1% of the rows in the table are expected to match. If we want 100 rows, the optimizer will assume that 100 * 100 = 10,000 rows will need to be scanned to find those 100 rows.

There's not a tremendous amount you can do to influence the decision between a scan or a seek plus sort. The main tools are a FAST n query hint, or using TOP with a variable and an OPTIMIZE FOR hint.

To use TOP with a variable:

DECLARE @T bigint = 201; -- Number of rows actually needed
 
SELECT TOP (@T)
[your query]
OPTION (OPTIMIZE FOR (@T = 1)); -- Vary this number

Using the FAST hint only works if the plan you want appears for a number of rows smaller than specified in the TOP clause.

Beyond that, you could add an index hint to the view (or a copy of the view), or write the query out manually. None of those latter options are particularly appealing from a maintenance point of view (many places to update when the core view is changed).