Is SQL Server secretly using unused indexes?

Greg S 2016-09-21 19:38:01

This scenario occurred on SQL 2014.

I have a large, complicated query which I've been performance tuning. In the plan, there exists an Index Scan leading into a Merge Join. Since I know that it will be doing a MAX aggregate after this join, I thought I would try adding a filtered index (matching the query predicate) on the aggregated column. After I apply the new index, I get a plan which now performs an Index Seek into a Nested Loop join. However, the seek is on the original index, not the one that I created. In fact, the new index is not used at all in the query plan. I confirm this by checking *sys.dm_db_index_usage_stats* and verifying that seeks, scans and lookups are all zero. I drop the new index and get the original query plan.

Obviously, I conclude that the statistics associated with the index are the crucial factor. I update statistics on the underlying table but the original plan remains unchanged. I can still add the index and get the new plan and then drop it to get the original plan. So I create new statistics that match the fields and filter of the new index but the original plan remains. So we can conclude that the statistics are not the crucial factor. Only when I create the index will the plan change.

So the SQL 2014 optimizer is creating different plans based on the existence of an index without ever using that index. What's going on? More importantly, how can I get the second plan without having to create an unused index (which will be prone to periodically being tagged for deletion by the DBAs because it is entirely unused).

Vlady Oselsky 2016-09-21 20:00:30
Can you post both execution plans?
Greg S 2016-09-21 20:15:43
In withoutindex I am looking at Object8.Index19. In withindex, the same index is Object8.Index18.
Greg S 2016-09-21 20:17:30
I am looking at the Statement that begins "WITH Object2 AS"
Vlady Oselsky 2016-09-21 20:28:07
Have you tried to add "WITH (INDEX(INDEX_NAME))" clause to try to force it to use the original index? Also, what do you see in index analysis when running without the new index? Lastly, have you tried to delete plan cache for this query?
Greg S 2016-09-21 20:31:21
My concern is not that I want it to use the index. I want to be able to drop the unused index and keep the same execution plan.
Vlady Oselsky 2016-09-21 20:35:46
How many indexes you have on the table? Did you ever see the flowing post I wonder if something crazy is happening in your case https://www.brentozar.com/archive/2016/03/indexes-helping-indexes/
Greg S 2016-09-22 01:31:34
There are about 8 indexes on that table.
SQLkiwi 2016-09-22 07:06:28
>So the SQL 2014 optimizer is creating different plans based on the existence of an index without ever using that index. What's going on?

The optimizer's starting point and search is guided by a number of inputs, existing indexes being one among many. Given a complex query where it is not possible to explore every alternative the optimizer is capable of generating, the final plan will depend sensitively on the initial join order chosen, initial cardinality estimates, and the precise code path taken through the optimizer.

That is quite generally true, for all but the simplest queries, even when optimization does not terminate with the Time Out reason, as it does in both plans provided. Simply, the optimizer found something good enough after making a reasonable number of exploration and implementation attempts based on the initial costing and complexity of the query.

More importantly, how can I get the second plan without having to create an unused index (which will be prone to periodically being tagged for deletion by the DBAs because it is entirely unused).

You might be able to get the second plan using a plan guide or USE PLAN hint, depending on the definition of the filtered index. In any case, and especially if the query is not suitable for a plan guide, often the best way to get the highest quality plan from a complex query is to break it up into smaller queries, using small temporary tables (not variables, generally) to get useful statistics on the intermediate result sets, and give the optimizer the best quality information, and a reasonably-sized search space for each query.