Need help interpreting Execution Plans to improve performance

Conrad 2013-04-15 13:33:00

I have a moderately complex nested T-SQL query which, for the most part, performs well. However, at the start of each of my 2-week pay periods (when there are fewer records to process!), the query bogs down and runs orders of magnitude slower. Then, at some point in the bi-weekly period, SQL Server (2008) decides the execution plan should change and everything returns to normal speed. Attached are the actual execution plans for the "good" and "bad" scenarios – I need help interpreting them to figure out what to do so that the "good" scenario is used all the time. Thanks for any assistance.

Good Execution Plan

Bad Execution Plan

Aaron Bertrand 2013-04-15 13:39:01
I haven't looked at the plans yet, but for volatile queries like this it is often worth adding OPTION (RECOMPILE) to the statement. This should prevent poor plan choices from getting "stuck", even though you pay a small compile cost on each execution.
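For example (a minimal sketch – the actual statement isn't shown here, so the table and column names below are made up):

    DECLARE @PeriodStart date = '20130401';

    -- OPTION (RECOMPILE) compiles a fresh plan for the current data
    -- and values on every execution, at a small CPU cost each time.
    SELECT EH.EmployeeID, SUM(EH.HoursWorked) AS TotalHours
    FROM dbo.EmployeeHours AS EH
    WHERE EH.WorkDate >= @PeriodStart
    GROUP BY EH.EmployeeID
    OPTION (RECOMPILE);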
Conrad 2013-04-15 15:01:34
@Aaron – I will give the OPTION(RECOMPILE) a try and see what happens.
Conrad 2013-05-01 14:45:20
@Aaron – unfortunately that did not work.
Dan Holmes 2013-04-15 14:01:46
The "Nested Loops Join" that is the 6th node from the left of the 'SELECT' node has very different estimates than the same node in the good plan. Most likely there is a statistics issue here (a quick way to check is sketched below), or parameter sniffing (which Aaron's suggestion would eliminate).

The top branch of the plan that gets the employee information differs between the good and bad plans. The bad plan expects one row from the employee data; the good plan expects 271. Those numbers change the access patterns for the rest of the data.

As the days move on, what data changes in those tables?
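If it does turn out to be stale statistics, a quick check and refresh might look like this (a sketch – the dbo schema is an assumption, and EmployeeHours is the only table actually named in this thread):

    -- When were the statistics on the volatile table last updated?
    SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(N'dbo.EmployeeHours');

    -- Rebuild them from a full scan of the table.
    UPDATE STATISTICS dbo.EmployeeHours WITH FULLSCAN;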

Conrad 2013-04-15 15:00:49
@Dan Holmes – all table contents are pretty static in quantity except for the contents of EmployeeHours, which grows linearly throughout the 2-week period. Then the number of rows of EmployeeHours returned by this query resets to 0.
Dan Holmes 2013-04-15 18:51:45
How is this executed? I don't see any parameters in the query, so it must be created dynamically and executed throughout the weeks. If that is the case, the same plan will be used each time. As the data changes, the estimates become more like the actuals and your performance improves.

I think Aaron's OPTION(RECOMPILE) will give you more consistent results.

Conrad 2013-04-15 19:37:02
It is indeed created dynamically. Unfortunately, the same plan is not used each time. The change in the number of rows queried from EmployeeHours is the variable that causes the bad execution times from about days 2-4 of the bi-weekly period. (Today is day 1 of the period; I'm hoping to test the proposed solution over the next couple of days.)
SQLkiwi 2013-04-19 06:31:07
One way to avoid the performance problem would be to parameterize the query (rather than generating text containing literal values), capture a plan shape that always works well (presumably the 'good plan' shape), and force future executions of the parameterized query to use that shape with a plan guide. Simply adding OPTION (RECOMPILE) to the statement may work as well, depending on what the underlying issue is – though this comes at the cost of fully compiling an execution plan on every execution of course.
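For illustration, a plan guide attached to a parameterized statement looks roughly like this (everything here is a placeholder – the statement text must match the parameterized query exactly, and the USE PLAN hint would carry the XML of the captured 'good' plan):

    -- Sketch only: names, statement text, and plan XML are placeholders.
    EXEC sys.sp_create_plan_guide
        @name = N'PG_EmployeeHours_GoodPlan',
        @stmt = N'SELECT ... FROM dbo.EmployeeHours WHERE WorkDate >= @PeriodStart',
        @type = N'SQL',
        @module_or_batch = NULL,
        @params = N'@PeriodStart date',
        @hints = N'OPTION (USE PLAN N''<good plan XML goes here>'')';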

The root cause of the problem is that the query is not very optimizer-friendly as written. The optimizer does a remarkable job in the 'good plan' case in spite of this. Nevertheless, that outcome is likely as much down to luck as anything.

By 'not optimizer-friendly' I mean the query uses some unusual constructions that typically fall outside the optimizer's cost model and therefore result in less reliable plan generation. The specific 'unusual constructions' are the complex AND/OR predicates at various levels (making cardinality estimation essentially guesswork), and the (SELECT COUNT(*) FROM …) = 0 syntax, which is an odd way to write a NOT EXISTS clause. Similarly, the subquery (SELECT columns FROM …) IS NULL is another odd way to express NOT EXISTS. This second form also requires a Stream Aggregate + Assert to check that the subquery produces at most one row on each correlated execution.
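To make that concrete, both odd forms express the same anti-join test, which NOT EXISTS states directly (the table and column names below are hypothetical):

    -- Instead of: WHERE (SELECT COUNT(*) FROM dbo.EmployeeHours AS EH
    --                    WHERE EH.EmployeeID = E.EmployeeID) = 0
    -- or:         WHERE (SELECT EH.EmployeeID FROM dbo.EmployeeHours AS EH
    --                    WHERE EH.EmployeeID = E.EmployeeID) IS NULL
    -- write the anti-join directly:
    SELECT E.EmployeeID
    FROM dbo.Employees AS E
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM dbo.EmployeeHours AS EH
        WHERE EH.EmployeeID = E.EmployeeID
    );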

All these things (and more) make this query less readable for humans as well as making life hard for the query optimizer. My preference would be to rewrite and simplify the query to make the logic clearer – helping the optimizer produce more reliable and stable plans while making future maintenance easier too.

Specifically, I would rewrite the odd code sections using the appropriate NOT EXISTS syntax. I would also break the query into up to three parts, using temporary tables to store the small intermediate result sets generated. This is a very lightweight technique, and gives the optimizer crucial cardinality and statistical distribution information.

The first logical part of the query joins the employee tables (list, hours, and schedules), applying the relevant predicates as it goes. The results (just the columns needed!) should be stored in a temporary table and indexed appropriately.
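A sketch of that first stage (all table and column names beyond EmployeeHours are guesses, since the original query isn't reproduced here):

    DECLARE @PeriodStart date = '20130401';

    -- Stage 1: join the employee tables, keeping only the columns needed.
    -- Materializing the small result set gives the optimizer accurate
    -- cardinality and distribution statistics for the later stages.
    SELECT
        E.EmployeeID,
        EH.WorkDate,
        EH.HoursWorked
    INTO #Stage1
    FROM dbo.Employees AS E
    JOIN dbo.EmployeeHours AS EH
        ON EH.EmployeeID = E.EmployeeID
    JOIN dbo.EmployeeSchedules AS ES
        ON ES.EmployeeID = E.EmployeeID
    WHERE EH.WorkDate >= @PeriodStart;

    CREATE CLUSTERED INDEX cx ON #Stage1 (EmployeeID, WorkDate);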

The second stage checks for matches (NOT EXISTS) in the Master tables. This query should be easy to write naturally given the results from stage 1. You should also be able to avoid the index spools currently present in the 'good plan'. Building an index on the fly for every execution is normally significantly less efficient than providing a permanent useful index on the base table.
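Continuing the sketch (the Master table name and its keys are assumptions):

    -- A permanent index on the base table supports the correlated probe,
    -- so the optimizer has no reason to build an index spool each run.
    CREATE INDEX IX_MasterHours_EmployeeID_WorkDate
        ON dbo.MasterHours (EmployeeID, WorkDate);

    -- Stage 2: keep only stage-1 rows with no match in the Master table.
    SELECT S1.EmployeeID, S1.WorkDate, S1.HoursWorked
    FROM #Stage1 AS S1
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM dbo.MasterHours AS MH
        WHERE MH.EmployeeID = S1.EmployeeID
        AND MH.WorkDate = S1.WorkDate
    );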

The third stage (if necessary – the result from previous stages is empty in both your examples) is another NOT EXISTS operation to replace the COUNT(*) = 0 code.

This may seem like more work, but it is the difference between applying a band-aid and truly fixing the problem in a robust and permanent way. It is almost always preferable to write queries that will optimize well rather than writing complex hard-to-analyze queries that may suddenly vary in performance at any time.

Conrad 2013-05-01 14:47:19
Changing my (SELECT COUNT(*) …) = 0 to NOT EXISTS (SELECT * …) seems to make a big difference in the general case. I haven't had a chance to try it when the query becomes extremely slow. I will test it again at that point.