Wrong index scan

For a presentation on parameter sniffing, I created this sample table:

 CREATE TABLE [dbo].[Orders](
     [id] [INT] IDENTITY(1,1) NOT NULL,
     [customer] [INT] NOT NULL , 
     [region] INT ,
     [val] [NCHAR](10) NULL,
  CONSTRAINT [PK] PRIMARY KEY CLUSTERED 
 (
     [id] ASC
 ));
 
 --insert sample data
 INSERT INTO dbo.Orders
         ( [customer],[region], val )
 SELECT TOP 1100000
     ABS (CHECKSUM (NEWID ())) % 10000 AS customer, --random spread over 10,000 customers
     CASE WHEN ABS (CHECKSUM (NEWID ())) % 100 < 10 THEN 1
         WHEN ABS (CHECKSUM (NEWID ())) % 100 BETWEEN 10 AND 15 THEN 2
         ELSE 3 END AS region, --region 3 is the most populated
         REPLICATE (N'X' , ABS (CHECKSUM (NEWID ())) % 10 + 1)
 FROM sys.all_columns
 CROSS JOIN sys.all_columns b 
 
 --create indexes 
 
 CREATE NONCLUSTERED INDEX ix_Customer ON dbo.Orders ([customer])
 CREATE NONCLUSTERED INDEX ix_region ON dbo.Orders ([region])
 
 --Created procedure
 CREATE PROCEDURE ORDER_multiple_param
 @cust INT = NULL,
 @id INT = null
 AS
 BEGIN
     SELECT * FROM Orders 
     WHERE ([customer] = @cust OR @cust IS NULL)
         AND (id = @id OR @id IS NULL)
 END

And then I run:

 EXEC ORDER_multiple_param @cust = 2, @id = 2;

The optimizer uses the index ix_region.

I can understand a clustered index scan or an ix_Customer scan, but what is the reason for scanning ix_region?

By Yorik, asked Dec 28, 2015 at 02:59 PM

2 answers

This is essentially a coin flip. When I tried your repro, it used ix_Customer on my system. If you change the stored procedure to force either index, e.g.:

     SELECT * FROM Orders WITH (INDEX(ix_Customer))

You will see that all of the properties of the plan are the same - the cost numbers involved with the scan are identical, and a key lookup is still required in both cases. So it is just as efficient to scan either index to satisfy the query. A seek isn't possible because (a) none of the indexes have both customer and id in their key, and (b) you don't necessarily want that plan for this query anyway, because you'll get different results depending on whether @cust or @id are supplied at all, and what values they contain. I realize that's what you're trying to present but you should consider using OPTION (RECOMPILE) and perhaps dynamic SQL - I call this the kitchen sink procedure and have a quick video about it here.
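As a minimal sketch of the OPTION (RECOMPILE) approach suggested above (same body as the original procedure, with only the statement-level hint added):

```sql
-- Sketch: statement-level RECOMPILE lets the optimizer see the actual
-- parameter values on every call, so NULL parameters are folded away
-- and each call gets a plan tailored to the supplied predicates.
ALTER PROCEDURE dbo.ORDER_multiple_param
    @cust INT = NULL,
    @id   INT = NULL
AS
BEGIN
    SELECT * FROM dbo.Orders
    WHERE ([customer] = @cust OR @cust IS NULL)
      AND (id = @id OR @id IS NULL)
    OPTION (RECOMPILE);
END
```

The trade-off is a compilation on every execution, which is discussed further down the thread.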

By Aaron Bertrand ♦ 1.7k, answered Dec 28, 2015 at 04:37 PM
Yorik, Dec 28, 2015 at 05:13 PM

Hi Aaron,

You are fast! Thank you :) Exactly what I'm trying to show the SW team.

This is really interesting. I ran it on SQL 2008 and will try 2012 tomorrow.

But actually id is the clustering key, so it is present in ix_Customer.

Aaron Bertrand ♦, Dec 28, 2015 at 05:15 PM

But "present" and "present in the key" are two different things. ix_Customer will be sorted by customer but the id values could theoretically be in any order. So if you have 10,000 rows with customer = 2, you might only have 4 rows with id = 2, and how will SQL Server find them? Those values are not in the key, so it has to scan all the pages. Sometimes it is worth manually specifying the clustering key in NCIs depending on the type of queries you're going to use...

Yorik, Dec 28, 2015 at 05:30 PM

Agreed, but that is exactly what the optimizer does by scanning ix_region: scan all the pages, filter by id = 2, and then look up. I could understand it if ix_region were smaller than the other indexes, but it is the same size.

Again, I know I am showing a badly-written query, but I cannot understand this execution result.

To me it should be either a clustered index scan in general, or an ix_Customer scan.

Yorik, Dec 28, 2015 at 06:20 PM

Yep... until someone calls this procedure with just the customer parameter and @id = NULL. In that case it will scan ix_region and do a lookup for each record.

A very bad coin flip for 10^7 records :) I must try it on 2012/2014.

Aaron Bertrand ♦, Dec 28, 2015 at 06:22 PM

And that is where OPTION (RECOMPILE) can be useful - it won't just re-use the same plan if @id is null.

Yorik, Dec 28, 2015 at 06:34 PM

Yes, exactly what I want to show: for reporting purposes a query like this can be used with RECOMPILE, but not for a user interface with a thousand calls per hour.

thank you!


I can understand cluster index scan or ix_Customer scan, but what is the reason of scanning ix_region?

The Clustered Index Scan is not chosen because the estimated number of rows returned is so small (just one). One Key Lookup is not enough to make a Clustered Index Scan cheaper than a Nonclustered Index Scan plus Key Lookup.

The more interesting choice is between the two nonclustered indexes. Intuitively, one might expect ix_customer to be cheaper because it contains both predicate columns, whereas with ix_region the execution plan has to look up the customer column in order to test it. That intuition turns out not to be the reason, however.

Though you can't see them in the final plan, during compilation the predicates in the query are contained in Filters and costed. A post-optimization rewrite pushes the Filter(s) into the index scan or lookup. We can expose these with undocumented trace flag 9130:

 DECLARE @id integer = 2;
 DECLARE @cust integer = 2;
 
 SELECT O.* 
 FROM dbo.Orders AS O WITH (INDEX(ix_Customer))
 WHERE 
     (O.customer = @cust OR @cust IS NULL)
     AND (O.id = @id OR @id IS NULL)
 OPTION (OPTIMIZE FOR (@cust = 2, @id = 2), QUERYTRACEON 9130);

 SELECT O.* 
 FROM dbo.Orders AS O WITH (INDEX(ix_region))
 WHERE 
     (O.customer = @cust OR @cust IS NULL)
     AND (O.id = @id OR @id IS NULL)
 OPTION (OPTIMIZE FOR (@cust = 2, @id = 2), QUERYTRACEON 9130);
 

(The OPTIMIZE FOR hint is needed because SQL Server cannot sniff local variables unless RECOMPILE is used, which would also embed the parameters, simplify the query and so defeat the purpose of this exercise.) Anyway, the plans are:

ix_customer

[image: ix_customer plan]

In this plan, all predicates are evaluated in the single Filter:

[image: Filter properties]

The estimated cost of the Filter is proportional to the number of rows it tests (1.1 million) and the number of predicates (4). The result of this computation is 1.408 cost units for CPU.

ix_region

[image: ix_region plan]

This plan has two separate Filters.

The one closest to the Index Scan tests the id predicate (it can do this since id is the clustering key, so it is also present in the nonclustered index):

[image: id predicate]

The estimated cost of this Filter is again related to the number of rows (1.1 million, as before) and number of predicates (only 2 now). The result of this computation is 0.748 cost units for CPU. Importantly, the cardinality estimate is still just one row.

The second Filter is positioned later, after the customer column has been looked up:

[image: customer predicate]

This Filter is almost free, since it has just a single row estimated to be tested against the two predicates. The estimated CPU cost is 0.0000007 units.

Splitting the predicates means the second Filter is only expected to test one row (instead of 1.1 million), while the plan still contains a single estimated lookup, so the ix_region plan is estimated to cost less overall.

The cost estimates are not realistic of course, but if you were asking why the optimizer makes this decision, this is the answer. In the final plans without TF 9130, the two Filters are pushed into the Scan and Lookup respectively, but as I mentioned, this is a post-optimization rewrite i.e. after all the cost-based decisions have already been made.

As Aaron mentioned, as a proper solution I would probably employ OPTION (RECOMPILE), or dynamic SQL. Even 10,000 calls per hour averages less than 3 per second. You might be surprised how little effect OPTION (RECOMPILE) would actually have at that level of activity.
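A hedged sketch of the dynamic SQL alternative (the procedure name here is hypothetical): build only the predicates that were actually supplied, so each distinct combination of parameters compiles and caches its own plan.

```sql
-- Hypothetical "kitchen sink" rewrite using parameterized dynamic SQL.
-- Each combination of supplied parameters produces a different query
-- text, hence a separate cached plan suited to those predicates.
CREATE PROCEDURE dbo.ORDER_multiple_param_dsql
    @cust INT = NULL,
    @id   INT = NULL
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT id, customer, region, val FROM dbo.Orders WHERE 1 = 1';

    IF @cust IS NOT NULL
        SET @sql += N' AND customer = @cust';
    IF @id IS NOT NULL
        SET @sql += N' AND id = @id';

    -- Parameters stay parameterized: no string concatenation of values.
    EXEC sys.sp_executesql @sql,
        N'@cust INT, @id INT',
        @cust = @cust, @id = @id;
END
```

Unlike OPTION (RECOMPILE), this avoids compiling on every call while still giving each parameter combination an appropriate plan, at the cost of a little extra plan cache usage.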

By SQLkiwi ♦ 6.6k, answered Jan 02, 2016 at 11:09 AM