SQL Server execution plan is using PK instead of index

Juliano 2015-10-07 20:10:50

The following query is taking 5 seconds to return 9692 rows:

    Unit.Name AS UnitName,
    UTE.Name AS UTEName,
    Process.Name AS Process 
        LEFT JOIN Unit ON Project.Unit_Id = Unit.Id 
              AND Project.Company_Id = Unit.Company_Id
        LEFT JOIN UTE ON Project.UTE_Id = UTE.Id
        LEFT JOIN Process ON Project.Process_Id = Process.Id
    Project.Company_Id = '????????????????'


These are the two indexes referencing Company_Id column, however this table have more than 15 indexes that don't reference it.

CREATE NONCLUSTERED INDEX [IX_Company_Id] ON [dbo].[Project]
    [Company_Id] ASC
CREATE NONCLUSTERED INDEX [IX_Project_02] ON [dbo].[Project]
    [ProjectType_Id] ASC,
    [ProjectStatus] ASC,
    [Unit_Id] ASC,
    [Id] ASC,
    [ProjectDirector_Id] ASC,
    [UTE_Id] ASC,
    [Company_Id] ASC,
    [LostType_Id] ASC,
    [Pillar_Id] ASC,
    [ProjectPhase_Id] ASC
INCLUDE (   [CalculateSequencialProject],

It is using the PK instead of my indexes. Why?

Extra question: Every JOIN is causing a Hash Match, is that ok? I mean considering that the final query will have more than 10 JOINs.

Execution Plan

Aaron Bertrand 2015-10-07 20:16:02
Can you show us the definitions of the indexes you expect the query to use?
Juliano 2015-10-07 20:32:45
Aaron, I have two possible indexes (added to the question). This query is just the beginning of a bigger query that I'm trying to refactor step by step.
Aaron Bertrand 2015-10-07 20:44:52
SQL Server might be able to use the second index if you add CreateDate to the include list (and Process_Id to the key), but between the join conditions and the fact that your primary filter criteria is the 7th column in the key (why is the first column ProjectType_Id?), it's still going to perform a scan of that index, and it would only be chosen if the table (represented by the clustered index) is a lot wider than this (we can't tell how many other columns are in the table but not mentioned in the query). The first index is not suitable because, while it could be used to locate the rows for the where clause, it does not contain Unit_Id, which is necessary for the first outer join. SQL Server determines that it's cheaper to scan the clustered index than to seek on the skinny index and perform lookups for every single row to obtain the other necessary data to satisfy the query.

So, you might try dropping both indexes, and creating this index instead (this is just a first stab):

CREATE NONCLUSTERED INDEX [IX_Project_02a] ON [dbo].[Project]
   [Company_Id], -- made this the leading key column
   [Unit_Id],  -- moved this up to help facilitate the first join
   [Process_Id]  -- added this to help facilitate the second join
   -- removed several columns here
      -- they are not necessary for this query, and if they are
      -- necessary for other queries, may be better as INCLUDE
   [ProjectStatus], -- for this query, does not need to be in the key

(Note that the second index may actually be useful for other queries, so check index_usage_stats, don't just drop it because I said so. The first index, on just Company_Id, is highly unlikely to be useful for any queries.)

A hash match is generally suboptimal; it means that the inputs are not sorted. Better indexes can usually eliminate those.

Juliano 2015-10-08 16:24:53
Thank you, Aaron. I'll try your suggestions.
SQLkiwi 2015-10-08 13:39:53
> The following query is taking 5 seconds to return 9692 rows

It shouldn't take that long. You must be using an underpowered (or cloud) server. If you had run the query from within Plan Explorer, it would have included I/O metrics and other useful performance information.

however this table have more than 15 indexes that don't reference it.

This is a red flag that your table may be too wide, and/or that your indexing strategy is not well-planned. Try to normalize your table design and give proper thought to your indexes. You should fix the spelling errors like DescriptionTittle and CalculateSequencialProject while you're at it.

It is using the PK instead of my indexes. Why?

Because there is no index with Company_Id as the leading key that can also provide the other columns needed in the query.

Every JOIN is causing a Hash Match, is that ok? I mean considering that the final query will have more than 10 JOINs.

With the tiny number of rows in your tables at the moment, yes. It is probably optimal. The cause of the slow performance is almost certainly elsewhere – probably a hardware issue.

Writing a query with 10 joins is often a sign of trying to do too much in one query. Instead of writing one monster query (which you will have endless trouble optimizing), break the query into manageable blocks, and use temporary tables (indexed if necessary) to hold small intermediate results.

You will get better quality execution plans, and more predictable performance this way. Your final queries will likely be comprehensible by mere humans as an additional benefit. Maintaining queries usually represents more of the lifetime cost of a database than the initial implementation, so writing simpler queries usually pays off in the long run.

Other Observations

You appear to be using uniqueidentifier as the type for your Id columns. This often indicates a lack of understanding of the tool you're using (Entity Framework?) and what data types make good primary keys.

Your tables contain very few rows. If you intend to scale this, you need to develop and test with a representative data volume and value distribution.

You should get into the habit of prefixing your table references with the schema prefix, perhaps it is dbo.Process, for example.

You should also get into the habit of using aliases for your object references e.g. dbo.Project AS P.

If you really want to test the performance of hash joins versus lookups, you may, for testing purposes, put a WITH (FORCESEEK) hint on the table references in your query.

Ultimately, I am suggesting you employ a decent DBA to assist you at this stage.

Juliano 2015-10-08 16:24:17
Thank you for your comments. I'm new in this project, it has been on production for one year and I have been working on it for just a month. It is using EF in the data layer and I have no idea why they decided to use uniqueidentifier instead of integers (however, even Microsoft use this type on SharePoint and Project Server if I'm wrong). I have to change things carefully because the project is running on production environment, so I can't just drop all indexes and recreate them (there are many other queries and EF generates some crazy ones). That's also the reason to have not fixed the spelling errors. This many JOINs are almost all made to small tables, like Project.ProjectType_Id = ProjectType.Id (the ProjectType table has 5 rows). They are need just to provide the textual description on a SSRS report. At last, but not least, this is a server running on-premise, but I'm connected to it remotely, via VPN.