Why the good plan, bad plan scenario.

GEORGEJO 2014-11-19 14:56:50

Hi,

The attached plan explorer session shows a good plan, bad plan scenario captured from a live customer site.

The bad plan XML was captured using sp_WhoIsActive and so includes the statement however the good plan XML was captured using a sql trace event SHOWPLAN XML and is missing the statement text, although the statement text is identical.

I'm just posting for interest and any ideas from someone on understanding the following.

1) The good plan was produced after when a customer upload screen was taking 30 seconds to complete an index was added ( [ix_KCONTAINER_Parent_includes] )

2) When a nightly update stats jobs is run the BAD PLAN is produced. Customers come in the following morning and find the system unusable due to the bad plan running in 30+ seconds.

3) When an alter index [ix_KCONTAINER_Parent_includes] rebuild statement is executed the good plan is produced and it runs in 1-2 seconds.

4) But, the good plan doesn't use the altered index so it just seems that it's good fortune that the optimizer picks a good plan?

All ideas gratefully received!

Thanks

Aaron Bertrand 2014-11-19 15:13:16
Haven't looked at the specific plans, but a recompile was probably triggered by changes to the indexes/statistics. Have you considered using OPTION (RECOMPILE) on the query, to see if the cost of compilation is worth the benefit of getting the better plan more reliably?
Mart 2014-11-19 15:16:13
Hey there Gerorgejo

Ignoring the plans themselves for a moment…

When you update the stats the query engine will need to update all the plans in the cache that previously relied on those stats. This means that a fresh plan, based on the new stats, will be generated – this will also be based on the parameters passed at that time.

When the index is rebuilt, any plan that uses that table will need to be checked to see if it can now use the index (the stats on the index will also be fresh). This potentially means that a few plans will be recompiled based on this index operation. As the plan is compiled it will have parameters to store which, it would appear, are 'better' than those used after the stats update.

Take a look at the parameters passed to the procedure after the stats are updated overnight and do the same to see what's in there now – I bet there's a difference and that's what's causing you problem.

Mart 2014-11-19 15:35:44
Looking at two plans, the above seems to hold true. The bad plan is having issues on two tables; rm2Folder and KCONTAINER, both of which are included in the where clause:

AND (t1812178.RmFolderDetails = 6
AND t1812178.Title LIKE '%Nursing%')

In the bad plan the data is estimated at 33.7M rows compared to 17.8K in the good plan. I suspect it's this paramater that's causing the problem i.e. the values shown above are not 'usual'.

As the compile time is 656ms (good) or 542ms (bad) and you mention the query takes between 1 or 2 seonds (good) and 30 secons (bad) then it may be worth using OPTION (recompile) so a freash plan is generated each time, or review the way the searches are done to make them more optimal/sargable.