Huge performance hit when adding 1 table join
JOIN EmployeeList AS el ON eh.EmployeeId = el.EmployeeId
The performance hit appears because I added
el.FixedScheduleWeeks = 1 AND
to the query. (As mentioned, if I remove the line above, execution time drops back to normal.) Attached is the actual execution plan, which shows that essentially all the time is spent in a Key Lookup on EmployeeSchedules, but I don't know how to interpret that. Thanks for any suggestions.
The Expensive Operation
The really expensive operation in your plan is the Clustered Index Scan of EmployeeList:
This full table scan is executed 460,347 times, although the estimate indicates that the optimizer expected it to run only once.
The optimizer expects 301 rows from the (one-time) execution of the scan, but the inner side of the join actually produces 138,564,447 (460,347 * 301) rows! The optimizer chooses a plan that would work well for one row, but it is disastrous for the number of rows actually encountered.
The key to improving this query plan is to improve the accuracy of the information available to the optimizer. Updating statistics on the base tables should help as a first step. To see why, look at the Index Seek and Key Lookup on EmployeeHours at the far right of the query plan:
Written in SQL, this part of the plan is:
SELECT
    EH.EmployeeId,
    EH.TimeIn
FROM dbo.EmployeeHours AS EH
WHERE
    EH.TimeIn >= '2013-09-15 00:00:00.000'
    AND EH.TimeIn < '2013-10-01 00:00:00.000';
If statistics have not been updated on that table since mid-September, the out-of-date statistics will tell the optimizer that no rows exist for the specified time range.
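A minimal sketch of that first step, using the table names from the question (WITH FULLSCAN is optional; it reads the whole table instead of sampling, giving the most accurate histogram at the cost of a longer scan):

```sql
-- Refresh statistics so the optimizer sees rows for the recent date range.
UPDATE STATISTICS dbo.EmployeeHours WITH FULLSCAN;
UPDATE STATISTICS dbo.EmployeeList  WITH FULLSCAN;
```

After running this, compare the estimated and actual row counts in a fresh actual execution plan to confirm the estimates have improved.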
This is just one example – it may well be that other parts of the plan were built on out-of-date statistical information too.
There is plenty more to be done with this query after updating statistics, but that is the first step. Going forward, you should look at the possibility of eliminating Key Lookups where appropriate, and expressing the query more simply. You should also look closely at the AND and OR logic there. You may be getting correct results, but I would personally be a lot more explicit about precedence using parentheses.
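As one example of eliminating a Key Lookup: if the lookup on EmployeeHours is only fetching columns that the seek's index does not cover, a covering index removes it. This is a sketch, not a recommendation to create exactly this index: the index name is invented, and the INCLUDE list must match the columns shown in the Key Lookup's Output List in your plan.

```sql
-- Hypothetical covering index: TimeIn supports the date-range seek,
-- and INCLUDE carries the columns the Key Lookup was fetching,
-- so the lookup (and its per-row cost) disappears from the plan.
CREATE NONCLUSTERED INDEX IX_EmployeeHours_TimeIn
ON dbo.EmployeeHours (TimeIn)
INCLUDE (EmployeeId /* plus the other columns from the lookup's Output List */);
```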
Another popular idea is to break the query down into two or more steps using temporary tables (#tables) to store small intermediate results. This makes life easier for the optimizer, and can provide additional statistical information that is just not possible with a single large query. A happy side-effect is that the result of the refactoring will probably be more comprehensible to humans as well.
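A sketch of that approach, assuming (hypothetically) that the date-range filter on EmployeeHours is the most selective part of the query; the temporary table name and the final SELECT list are placeholders:

```sql
-- Step 1: materialize the small, selective intermediate result.
SELECT EH.EmployeeId, EH.TimeIn
INTO #RecentHours
FROM dbo.EmployeeHours AS EH
WHERE EH.TimeIn >= '2013-09-15 00:00:00.000'
  AND EH.TimeIn <  '2013-10-01 00:00:00.000';

-- Step 2: join from the temporary table. SQL Server creates statistics
-- on #RecentHours automatically, so the optimizer knows exactly how
-- many rows feed this join instead of guessing from stale histograms.
SELECT RH.EmployeeId, RH.TimeIn
FROM #RecentHours AS RH
JOIN dbo.EmployeeList AS EL
    ON EL.EmployeeId = RH.EmployeeId
WHERE EL.FixedScheduleWeeks = 1;
```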