Included column savings vs performance impact

Jason Davis 2014-12-18 19:52:05

I have a query that has a key lookup that is 85% of the query cost. I added one column, that is datetime as an included column for the index that is being used, and appears to be best suited. With the included column added, the key lookup still exists, but the cost associated with it has dropped by 90+%. The overall cost of the query is reduced by 80+%.

What I want to know, is this cost savings enough to outweigh any potential performance impact caused by additional column?

Here are some stats on the table/index with and without the included column (let me know if anything else is needed)

Table Index Space 421mb without vs 427mb with
Index Leaf Rows 504,841 without vs 511,080 with

Aaron Bertrand 2014-12-18 20:03:29
Hard for us to know for sure without seeing the plan and knowing the other column data types, but have you considered including all of the missing columns referenced by the key lookup?
Jason Davis 2014-12-18 20:10:14
I had, there are 10 columns in total in the lookup. The one column I added to the index was being used in the WHERE for comparison as the other 9 are only in the SELECT.

The columns in the lookups
INT – 1
Numeric – 1
DateTime – 5
VarChar – 3

Would this add too much to the index?

Aaron Bertrand 2014-12-18 21:23:05
I can't answer that, because I have no idea how to assess what "too much" means. I do know that it is worth it for you to test that scenario, and make sure you test not only this specific query but also other queries that might use this index as well as the write portion of your workload. Getting rid of the key lookup may make this query better overall (or it may not help all that much), but there is no free lunch. I just have no idea how much you're willing to pay for lunch.
Jason Davis 2014-12-18 21:30:47
Ok, makes sense. Thanks for the input.
SQLkiwi 2014-12-19 00:50:43
Some general points:

  1. Cost percentages in execution plans are always estimates. Even in post-execution ("actual") plans. They are based on the optimizer's cost model, which is not meant to reflect your system's performance characteristics. The numbers and formulas exist primarily to give the optimizer a way to choose, generally, between different candidate plan shapes for a query. I cannot stress enough how literally you should NOT take these numbers ๐Ÿ™‚
  2. Plan Explorer provides a button to directly upload a plan or session file with your question. These can be anonymized if the names are sensitive somehow, or general policy dictates. Queries run from PE provide the best diagnostic information.
  3. Included columns make the index structure larger.
  4. A larger structure (at the leaf level) means fewer index rows per page than before.
  5. Fewer rows per page means more memory is used by the index when in cache.
  6. It also means more disk I/O to read the index from disk.
  7. Included columns have to be maintained by any data modification query that affects those columns. This can change the update/insert/merge plan shape significantly, and so affect performance in ways that may not be easy to anticipate.

Ultimately, there are multiple competing factors. The only way to be sure is to test a representative workload with and without the included columns, as Aaron suggested. That said, the impact of adding a few small columns to the include list is often quite small. Just not always ๐Ÿ™‚ Be very careful about adding large columns or too many columns though.

Jack 2014-12-24 16:15:12
Wouldn't it also make sense to evaluate the choice of the clustered index in this case? With 10 columns in the lookup including several date columns, it may be that the best candidate for the clustered index is not what is being used for the clustered index.
SQLkiwi 2014-12-25 05:30:54
Possibly. Hard to know from the information provided.