What is the best way to remove key lookup (costing 68%) ?

Kin 2014-05-17 02:31:31

What is the best way to remove KeyLookup which is costing 68% of the query ?

I know that index with included columns will help, but wont it make the index (Clustered index) wider ?

Aaron Bertrand 2014-05-17 15:08:18
Adding included columns can help to eliminate a key lookup, but that doesn't always mean it's wise to do so. In this case, Paul has identified that you have a LOT of columns in the key lookup, so rushing to add INCLUDE columns might not be the best first step toward optimization. More details will help. As an aside, included columns can't be added to a clustered index – the key lookup is using the clustered index because the non-clustered index that is used to satisfy part of the query does not have those columns. The clustered index – by definition – has all of the columns.
AliRaz 2014-05-17 04:08:54
Your clustered index is the physical ordering of the data on disk, and it holds but a [included] column on a non clustered index would widen it, however you wouldn't have the overhead of traversing the reverse b-tree structure of the index ifle.
SQLkiwi 2014-05-17 05:14:49
> What is the best way to remove Key Lookup which is costing 68% of the query?

It is 68% of the estimated cost of the query, according to the optimizer's standard model (which takes little account of whatever hardware you might be running SQL Server on).

The real question is whether this is a good execution plan for your query and system. From the information provided, it is hard to tell. Not only is the plan anonymized, it is an estimate – there is no post-execution data to tell us how accurate the optimizer's estimates were.

Whether this is a good plan or not depends heavily on how selective the existing nonclustered indexes are, and how many rows are actually looked up in the base table. Without a post-execution ('actual') plan, and an indication of how representative the execution is of your workload, it is really tough to offer guidance.

Are you experiencing a performance problem with this query, or just concerned by the apparently-high percentage cost of the Key Lookups?

The thing is, each lookup is returning a large number of columns – 24 for the first one and 76 for the second. It is unlikely your want to include so many columns in the existing nonclustered indexes. It also raises questions about the database design. Again, this depends on information (table and index DDL) that isn't available to us from the anonymized plan alone.

In summary, if the optimizer's estimates are reasonably accurate, this is likely a good execution plan for your query. If the estimates are not accurate, you should do all you can to make them accurate (perhaps by updating or providing new statistics). If you can provide more context and information to work with, we will likely be able to offer more specific advice.

Kin 2014-05-17 16:28:10
I looked at the execution plan and the key lookup was the highest cost. I now remember from your PRECON (PASS 2013) that you highlighted what you said above. My concern is that the process is a caching process which takes more than 5-7 mins to run as it caches the entire data (6-10 million records) in the application cache. That might sound brutal, but that is how it is designed. I am trying to optimize the process and looking at the execution plan, the highest cost was the keylookup.
Aaron Bertrand 2014-05-17 16:35:40
How have you validated that the 5-7 minutes is spent running the query, and not transferring the results over the network, logic done in the app while loading the cache, etc.? Does the query take the same amount of time when the results are simply discarded (using Plan Explorer to generate an actual plan)? How about without the app cache taking part (using Management Studio)? Can you also test either or both of those things locally (taking network out of the equation)? If you're loading 6-10 million rows, is that the whole table? If not, maybe there is a more efficient way to filter out the rows you don't need (again, this is where schema, logic, and existing indexes would be useful in the conversation).