In a EDW environment, how can this query be optimized for Clustered Index Seek ?

vbuga 2014-10-28 13:35:00

This query results in a large number of logical reads. How can it be optimized ?

nhmidi 2014-10-28 13:41:15
http://blog.sqlauthority.com/2009/11/09/sql-server-removing-key-lookup-seek-predicate-predicate-an-interesting-observation-related-to-datatypes/
vbuga 2014-10-28 14:58:02
Thanks, from this article I got some useful info, but nothing substantial that I can apply to what I am seeing. By adding the second column to compound index and converting its type to int, the author was able to force SQL Server engine to perform an index seek, instead of index scan. In my case, the column in the where clause is a smallint.

There is no conversion of types happening, since the predicate in the plan says:
[EDW_Integration].[dbo].[INT_Company_Item].[Record_Status] as [L].[Record_Status] = (1)

In the query that is underperforming, we are getting the two columns: Company_Item_ID, Company_Item_SK, based on conditions (Record_Status=1)

An sp_helpindex on the table shows:

IX_INT_Company_Item_BK clustered located on PRIMARY Company_Code, Project_Code, Item_Code, Record_Status

PK_INT_COMPANY_ITEM nonclustered, unique, primary key located on PRIMARY Company_Item_ID

Would a compound index on Company_Item_ID, Company_Item_SK, help ?

nhmidi 2014-10-28 15:14:15
Index Scan Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table.

Use FORCESEEK :http://technet.microsoft.com/en-us/library/bb510478(v=sql.105).aspx

Chris Wilson 2014-10-28 15:28:51
Vbuga,

It looks like you don't have an index that satisfies the the predicate and contains the columns needed for the select.

Neither index you listed includes the Record_Status field so the optimizer can't use them to get to the records it needs. Thus it does a clustered Index Scan (remember that the Clustered Index is the full table) and searches the entire table for all records with a Record_Status = 1.

If you notice on the SELECT operator in the plan there is a little yellow exclamation point. If you hover your mouse over the operator it shows a Warning at the bottom:
Missing Index Found. Right Click to view.

If you right click on the SELECT and choose Missing Index Details, the missing index is as follows:

USE [EDW_Integration]
GO
CREATE NONCLUSTERED INDEX [*name index here*]
ON [dbo].[INT_Company_Item] ([Record_Status])
INCLUDE ([Company_Item_ID],[Company_Item_SK])
GO

Change the name of the index to something meaningful and create it on the database and you should be good to go.

vbuga 2014-10-28 20:07:45
Thanks Chris, I think this answer makes more sense. Will give it a try.

Here's some more info, and I wonder if this will change the answer:

SELECT count(Company_Item_ID) tblRows,
count(distinct(Record_Status)) distRecordStatus
FROM dbo.INT_Company_Item ici

tblRows distRecordStatus

68254522 2

SELECT count(Company_Item_ID)
FROM dbo.INT_Company_Item ici
WHERE Record_Status=1

countRecordStatus

29757017

So is it really doing a table scan, since the number of total records (68,254,522) and number of records where Record_Status = 1 is (29,757,017) ?

  • Note: the initial plan submitted showed 28,715,000 which is slightly less than what I ran later today 29,757,017 because of changes in number of records in the database.
Chris Wilson 2014-10-28 20:22:41
The operator in the plan will tell you if it is doing a scan or a seek. In this case it was showing as a scan so even though it only brought back less than half of the rows it still touched very page in the table.

Craig Freedman has a blog post about the difference between a scan and a seek.
http://blogs.msdn.com/b/craigfr/archive/2006/06/26/647852.aspx

Lots of other really good blog posts out there as well.

My guess with the size of the table is that even with a good index you will still most likely get a scan as it is usually faster to scan through ~30 million rows than it is to seek.