Wrong index scan
CREATE TABLE [dbo].[Orders]
(
    [id] [INT] IDENTITY(1,1) NOT NULL,
    [customer] [INT] NOT NULL,
    [region] INT,
    [val] [NCHAR](10) NULL,
    CONSTRAINT [PK] PRIMARY KEY CLUSTERED ([id] ASC)
);

-- Insert sample data
INSERT INTO dbo.Orders ([customer], [region], val)
SELECT TOP 1100000
    ABS(CHECKSUM(NEWID())) % 10000 AS customer, -- ~10,000 random customer values
    CASE
        WHEN ABS(CHECKSUM(NEWID())) % 100 < 10 THEN 1
        WHEN ABS(CHECKSUM(NEWID())) % 100 BETWEEN 10 AND 15 THEN 2
        ELSE 3
    END AS region, -- region 3 is the most populated
    REPLICATE(N'X', ABS(CHECKSUM(NEWID())) % 10 + 1)
FROM sys.all_columns
CROSS JOIN sys.all_columns AS b;

-- Create indexes
CREATE NONCLUSTERED INDEX ix_Customer ON dbo.Orders ([customer]);
CREATE NONCLUSTERED INDEX ix_region ON dbo.Orders ([region]);

-- Create procedure
CREATE PROCEDURE ORDER_multiple_param
    @cust INT = NULL,
    @id INT = NULL
AS
BEGIN
    SELECT *
    FROM Orders
    WHERE ([customer] = @cust OR @cust IS NULL)
      AND (id = @id OR @id IS NULL);
END;
And then I run:
EXEC ORDER_multiple_param @cust = 2, @id =2
The optimizer uses index ix_region
I can understand a Clustered Index Scan or an ix_Customer scan, but what is the reason for scanning ix_region?
SELECT * FROM Orders WITH (INDEX(ix_Customer))
You will see that all of the properties of the plan are the same: the cost numbers involved with the scan are identical, and a key lookup is still required in both cases. So it is just as efficient to scan either index to satisfy the query.

A seek isn't possible because (a) none of the indexes have both customer and id in their key, and (b) you don't necessarily want that plan for this query anyway, because you'll get different results depending on whether @cust or @id are supplied at all, and what values they contain. I realize that's what you're trying to present, but you should consider using OPTION (RECOMPILE) and perhaps dynamic SQL. I call this the kitchen sink procedure and have a quick video about it here.
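As a minimal sketch of the OPTION (RECOMPILE) suggestion against the Orders table above (the procedure name is illustrative, not from the original post):

```sql
-- Hypothetical variant of the original procedure. OPTION (RECOMPILE) lets the
-- optimizer evaluate the @cust/@id NULL checks at execution time, so any
-- predicate whose parameter is NULL is optimized away on each call.
CREATE PROCEDURE ORDER_multiple_param_recompile -- name is illustrative
    @cust INT = NULL,
    @id   INT = NULL
AS
BEGIN
    SELECT *
    FROM dbo.Orders
    WHERE ([customer] = @cust OR @cust IS NULL)
      AND (id = @id OR @id IS NULL)
    OPTION (RECOMPILE);
END;
```

The trade-off is a compilation on every execution, which is usually fine for reporting workloads but worth measuring for high-frequency calls.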
I could understand it if ix_region were smaller than the other indexes, but it is the same size.
I know the query itself looks wrong, but this execution result is what I cannot understand.
To me, the natural choice would be a Clustered Index Scan in general, or an ix_Customer scan.
Instead, in this case it scans ix_region and does a lookup for each record.
A very bad choice for 10^7 records :)
I must try it on SQL Server 2012 and 2014.
That is exactly what I wanted to show: for reporting purposes, a query like this can be used with RECOMPILE.
But not for a user interface with a thousand calls per hour.
Thank you!
The Clustered Index Scan is not chosen because the estimated number of rows returned is so small (just one). One Key Lookup is not enough to make a Clustered Index Scan cheaper than a Nonclustered Index Scan plus Key Lookup.
The more interesting choice is between the two nonclustered indexes. Intuitively, one might expect ix_customer to be cheaper because it contains both predicate columns, whereas using ix_region the execution plan has to look up the customer column in order to test it. That intuition turns out to be exactly wrong, however.
Though you can't see them in the final plan, during compilation the predicates in the query are contained in Filters and costed. A post-optimization rewrite pushes the Filter(s) into the index scan or lookup. We can expose these with undocumented trace flag 9130:
DECLARE @id integer = 2;
DECLARE @cust integer = 2;

SELECT O.*
FROM dbo.Orders AS O WITH (INDEX(ix_Customer))
WHERE
    (O.customer = @cust OR @cust IS NULL)
    AND (O.id = @id OR @id IS NULL)
OPTION (OPTIMIZE FOR (@cust = 2, @id = 2), QUERYTRACEON 9130);

SELECT O.*
FROM dbo.Orders AS O WITH (INDEX(ix_region))
WHERE
    (O.customer = @cust OR @cust IS NULL)
    AND (O.id = @id OR @id IS NULL)
OPTION (OPTIMIZE FOR (@cust = 2, @id = 2), QUERYTRACEON 9130);
(The OPTIMIZE FOR hint is needed because SQL Server cannot sniff local variables unless RECOMPILE is used, which would also embed the parameters, simplify the query and so defeat the purpose of this exercise.) Anyway, the plans are:
ix_customer
In this plan, all predicates are evaluated in the single Filter:
The estimated cost of the Filter is proportional to the number of rows it tests (1.1 million) and the number of predicates (4). The result of this computation is 1.408 cost units for CPU.
ix_region
This plan has two separate Filters.
The one closest to the Index Scan tests the id predicate (it can do this since id is the clustering key, so it is also present in the nonclustered index):
The estimated cost of this Filter is again related to the number of rows (1.1 million, as before) and number of predicates (only 2 now). The result of this computation is 0.748 cost units for CPU. Importantly, the cardinality estimate is still just one row.
The second Filter is positioned later, after the customer column has been looked up:
This Filter is almost free, since it has just a single row estimated to be tested against the two predicates. The estimated CPU cost is 0.0000007 units.
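The three CPU figures above happen to fit a simple linear per-row model. The constants below are inferred by fitting the numbers in this answer; they are not documented, so treat this as a plausibility check rather than the optimizer's actual formula:

```
cost ≈ rows × (0.00000008 + predicates × 0.0000003)

ix_customer Filter: 1,100,000 × (0.00000008 + 4 × 0.0000003) = 1.408
ix_region Filter 1: 1,100,000 × (0.00000008 + 2 × 0.0000003) = 0.748
ix_region Filter 2:         1 × (0.00000008 + 2 × 0.0000003) ≈ 0.0000007
```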
Because the predicates are split, the second Filter is only expected to test one row (instead of 1.1 million), while a single lookup is still estimated. As a result, the ix_region plan is estimated to cost less overall (0.748 + 0.0000007 versus 1.408 CPU units).
The cost estimates are not realistic of course, but if you were asking why the optimizer makes this decision, this is the answer. In the final plans without TF 9130, the two Filters are pushed into the Scan and Lookup respectively, but as I mentioned, this is a post-optimization rewrite i.e. after all the cost-based decisions have already been made.
As Aaron mentioned, as a proper solution I would probably employ OPTION (RECOMPILE), or dynamic SQL. Even 10,000 calls per hour averages less than 3 per second. You might be surprised how little effect OPTION (RECOMPILE) would actually have at that level of activity.
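For completeness, a minimal dynamic SQL sketch of the same procedure, building only the predicates that are actually supplied (the procedure name is hypothetical):

```sql
-- Illustrative dynamic SQL variant: only the supplied predicates are appended,
-- so each parameter combination gets its own, appropriately optimized plan.
CREATE PROCEDURE ORDER_multiple_param_dynamic -- hypothetical name
    @cust INT = NULL,
    @id   INT = NULL
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'SELECT * FROM dbo.Orders WHERE 1 = 1';

    IF @cust IS NOT NULL
        SET @sql += N' AND [customer] = @cust';
    IF @id IS NOT NULL
        SET @sql += N' AND id = @id';

    -- Parameters are passed through sp_executesql, never concatenated into the
    -- string, which avoids injection and allows plan reuse per combination.
    EXEC sys.sp_executesql @sql,
        N'@cust INT, @id INT',
        @cust = @cust, @id = @id;
END;
```

Unlike OPTION (RECOMPILE), this approach caches one plan per predicate combination, so it avoids recompiling on every call.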
You are fast! Thank you :) This is exactly what I am trying to show to the software team.
This is really interesting. I ran it on SQL Server 2008 and will try 2012 tomorrow.
But actually, id is the clustering key, so it is present in ix_Customer as well.