Huge performance hit when adding 1 table join

Conrad 2013-09-24 15:20:07

In a recently updated query for my payroll software, I am seeing a huge performance hit – execution time goes from 3 seconds to ~100 seconds (MS SQL Server 2008 Std SP3 x64). I had to add the line

JOIN EmployeeList AS el ON eh.EmployeeId = el.EmployeeId

due to the fact I added

el.FixedScheduleWeeks = 1 AND

to the query. (As mentioned, if I remove the above, execution time drops back down to usual times.) Attached is the actual execution plan which shows that essentially all the time is being spent in key lookup in EmployeeSchedules, but I don't know how to interpret that. Thanks for any suggestions here.

SQLkiwi 2013-09-24 16:56:56
The percentages reported by SQL Server in execution plans are always estimates (used by the query optimizer to choose a plan shape). SQL Server does not currently report runtime operator costs, so Plan Explorer can't show that information either.

The Expensive Operation

The really expensive operation in your plan is the Clustered Index Scan of EmployeeList:

Expensive Scan

This full table scan is executed 460,347 times, although the estimates indicates that the optimizer expected it to run only once.

Cause

The optimizer expects 301 rows from the (one-time) execution of the scan, but the inner side of the join actually produces 138,564,447 (460,347 * 301) rows! The optimizer chooses a plan that would work well for one row, but it is disastrous for the number of rows actually encountered.

Action

The key to improving this query plan is to improve the accuracy of the information available to the optimizer. Updating statistics on the base tables should help as a first step. To see why, look at the Index Seek and Key Lookup on EmployeeHours at the far right of the query plan:

Out-of-date statistics

Written in SQL, this part of the plan is:

SELECT
    EH.EmployeId,
    EH.TimeIn
FROM dbo.EmployeeHours AS EH
WHERE
    EH.TimeIn >= '2013-09-15 00:00:00.000'
    AND EH.TimeIn < '2013-10-01 00:00:00.000'

If statistics have not been updated on that table since mid-September, the out-of-date statistics will tell the optimizer that no rows exist for the specified time range.

This is just one example – it may well be that other parts of the plan were built on out-of-date statistical information too.

Future Steps

There is plenty more to be done with this query after updating statistics, but that is the first step. Going forward, you should look at the possibility of eliminating Key Lookups where appropriate, and expressing the query more simply. You should also look closely at the AND and OR logic there. You may be getting correct results, but I would personally be a lot more explicit about precedence using parentheses.

Another popular idea is to break the query down into two or more steps using temporary tables (#tables) to store small intermediate results. This makes life easier for the optimizer, and can provide additional statistical information that is just not possible with a single large query. A happy side-effect is that the result of the refactoring will probably be more comprehensible to humans as well.

Conrad 2013-09-24 17:47:17
Running update statistics fixed the problem – thanks much for the suggestion.