Is SQL Server secretly using unused indexes?
I have a large, complicated query which I've been performance tuning. In the plan, there exists an Index Scan leading into a Merge Join. Since I know that it will be doing a MAX aggregate after this join, I thought I would try adding a filtered index (matching the query predicate) on the aggregated column. After I apply the new index, I get a plan which now performs an Index Seek into a Nested Loop join. However, the seek is on the original index, not the one that I created. In fact, the new index is not used at all in the query plan. I confirm this by checking *sys.dm_db_index_usage_stats* and verifying that seeks, scans and lookups are all zero. I drop the new index and get the original query plan.
Obviously, I conclude that the statistics associated with the index are the crucial factor. I update statistics on the underlying table but the original plan remains unchanged. I can still add the index and get the new plan and then drop it to get the original plan. So I create new statistics that match the fields and filter of the new index but the original plan remains. So we can conclude that the statistics are not the crucial factor. Only when I create the index will the plan change.
So the SQL 2014 optimizer is creating different plans based on the existence of an index without ever using that index. What's going on? More importantly, how can I get the second plan without having to create an unused index (which will be prone to periodically being tagged for deletion by the DBAs because it is entirely unused).
The optimizer's starting point and search is guided by a number of inputs, existing indexes being one among many. Given a complex query where it is not possible to explore every alternative the optimizer is capable of generating, the final plan will depend sensitively on the initial join order chosen, initial cardinality estimates, and the precise code path taken through the optimizer.
That is quite generally true, for all but the simplest queries, even when optimization does not terminate with the Time Out reason, as it does in both plans provided. Simply, the optimizer found something good enough after making a reasonable number of exploration and implementation attempts based on the initial costing and complexity of the query.
More importantly, how can I get the second plan without having to create an unused index (which will be prone to periodically being tagged for deletion by the DBAs because it is entirely unused).
You might be able to get the second plan using a plan guide or USE PLAN hint, depending on the definition of the filtered index. In any case, and especially if the query is not suitable for a plan guide, often the best way to get the highest quality plan from a complex query is to break it up into smaller queries, using small temporary tables (not variables, generally) to get useful statistics on the intermediate result sets, and give the optimizer the best quality information, and a reasonably-sized search space for each query.