Filling in the gaps: there has to be a better way
(edit based on Paul's reply about OPTION (QUERYTRACEON 9130))
That really helped, and I changed some indexes based on what was exposed. This plan is for a 3-year query that is down from 8 minutes to 2.5. It is the same query as in the original post, except the WHERE is RunDate >= 20110101.
I think I am at the end of optimization, but if there is more to get, I'd love more ideas. New Plan
I'm not entirely convinced that the 668,661 rows are a problem in this query.
- From a logical I/O point of view, the calendar table incurs only 4,178 reads across its 822 iterations.
- The under-sized input estimate to the following hash aggregate seems like it might cause an under-allocation of memory, but this is a grouping aggregate, where the memory used depends on the number of distinct values encountered. Looking at the output, it produces 242 distinct values versus 1,350 expected, so it is probably not a problem. The lack of SLEEP_TASK waits backs this up.
This might all be different when the query is run with a different filter value on the view, but I have no way of knowing about that. If the query ran particularly slowly with a certain parameter value, I feel sure you would have submitted that instance for analysis instead, right? ;c)
The execution time of 20.6 seconds for this query, with all data in memory, makes me think there are other issues here. There just aren't enough obviously expensive operations to account for it.
The query plan suffers from the usual problem with single-statement, multi-CTE queries: there are multiple references to the stops CTE, so it is evaluated multiple times. Notice the repeated sections of the plan and compare the operators to the stops CTE definition. It seems wasteful to do this work multiple times. Again, I have no idea how significant this is without being able to test it myself.
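To make the pattern concrete, here is a minimal sketch (not your query; the table and column names are assumptions based on the plan) showing how each reference to a CTE is expanded and optimized separately:

```sql
-- Illustration only: both references to 'stops' are expanded inline,
-- so the underlying scan and filter run twice, not once
WITH stops AS
(
    SELECT R.RunID, R.StopTime
    FROM dbo.tblRunActual AS R          -- table name taken from your plan
    WHERE R.RunDate >= 20110101
)
SELECT S1.RunID
FROM stops AS S1
JOIN stops AS S2
    ON  S2.RunID = S1.RunID
    AND S2.StopTime > S1.StopTime;
```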
This is of particular concern because the repeated plan sections contain a sort with an under-estimated input cardinality. You are running SQL Server 2008 R2, so the query plan does not contain information about any spills this might be causing. As a side note, SP2 is a couple of years old now; you might think about applying SP2 CU12 (build 10.50.4305), though there might be an SP3 on the way, I don't know. I don't think the sorts are spilling right now, because the wait stats don't show any IO_COMPLETION waits, but it would be remiss of me not to mention it.
Another thing that I would check is those seeks with residual predicates. Run the query again on a test system with OPTION (QUERYTRACEON 9130) and check for any large inputs to the new Filters, followed by a significant reduction in cardinality afterward. Sometimes this can reveal a hidden inefficiency that could be affecting performance more than you realize.
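For example (a sketch; the view name here is a stand-in for your actual query), on a test system:

```sql
-- Trace flag 9130 turns residual predicates into explicit Filter operators,
-- so any large seek output that is then cut down becomes visible in the plan
SELECT V.*
FROM dbo.vw_ExportLayer_EmptyMinutesOnRuns AS V   -- hypothetical view name
WHERE V.RunDate >= 20110101
OPTION (QUERYTRACEON 9130, RECOMPILE);
```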
One intriguing option here is to replace the view with a multi-statement table-valued function. I don't normally recommend these, for all sorts of reasons, but they can be useful for breaking up a problem view in some cases. This is an option if the client user or application can tolerate a change in syntax to pass the run date as a parameter instead of in a WHERE clause:
```sql
SELECT ... FROM dbo.fn_ExportLayer_EmptyMinutesOnRuns (@rundate);
```
The basic approach is to split the query into multiple table variables inside the function, making good use of table variable indexes and reusing stored data (think of the stops CTE).
Cardinality estimates can usually be made good enough by adding OPTION (RECOMPILE) to statements inside the function. This also allows the parameter-embedding optimization for @rundate and so on, so base table statistics can still be useful.
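To show the shape I have in mind, here is a skeletal sketch (the parameter type, columns, and intermediate logic are all placeholders, not your actual definitions):

```sql
CREATE FUNCTION dbo.fn_ExportLayer_EmptyMinutesOnRuns
(
    @rundate integer    -- assumed type, matching the RunDate >= 20110101 predicate
)
RETURNS @Result TABLE
(
    RunID        integer PRIMARY KEY,   -- placeholder output columns
    EmptyMinutes integer NOT NULL
)
AS
BEGIN
    -- Materialize the 'stops' logic once, with a useful index,
    -- instead of re-evaluating it at every CTE reference
    DECLARE @Stops TABLE
    (
        RunID    integer  NOT NULL,
        StopTime datetime NOT NULL,
        PRIMARY KEY (RunID, StopTime)
    );

    INSERT @Stops (RunID, StopTime)
    SELECT R.RunID, R.StopTime
    FROM dbo.tblRunActual AS R
    WHERE R.RunDate >= @rundate
    OPTION (RECOMPILE);     -- embeds @rundate, so base table statistics still help

    -- Later steps reuse the stored rows rather than repeating the work
    INSERT @Result (RunID, EmptyMinutes)
    SELECT S.RunID, COUNT(*)
    FROM @Stops AS S
    GROUP BY S.RunID
    OPTION (RECOMPILE);     -- gives the optimizer the table variable's true cardinality

    RETURN;
END;
```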
Debugging and plan analysis are definitely more challenging, but a Profiler trace or Extended Events session can reveal post-execution plans for statements within the function body.
Admittedly, it is difficult to make good use of parallelism within a TVF.
If the query can be broken into parts with row counts limited to a sensible level, this approach can be worth exploring. It all depends.
Comments on the 3-year plan
- The large Filter and Eager Index Spool over the EnhancedBreaksActual table is a concern. The predicates there are complex, and I don't know enough about the data distribution and structure of that table to make a firm recommendation, but it seems likely that the integer conversion on BreakDate is unhelpful. You could consider CONVERT(date, BreakDate) instead of the char(8) style 112 conversion, then create a computed column for that expression on EnhancedBreaksActual and index it (with the right extra keys and includes); see the sketch after this list. There will be a couple of residual predicates, but assuming BreakDate is selective, this would likely be beneficial.
- You should trace this query execution with Profiler, watching for the Sort Warnings and Hash Warnings event classes (spills), just in case. I don't believe these are occurring, but it is best to check.
- The parallel many-to-many merge joins are inefficient, generating the excessive I/O you see on the Worktable in the Table I/O tab (1,758,550 scans involving 9,458,235 worktable rows). Hash or loop joins might be worth exploring, but physical join types are difficult to enforce in a view with CTEs without accidentally forcing the entire query's join order (i.e. using join hints); see the hint sketch after this list. This also might not work well for different rundate ranges. The many-to-many joins could also be avoided by making one of the inputs unique, but it is hard to know from here whether that is logically possible or not. Of course, that might be a lot clearer (and indexable) if the query were broken up 😉
- The query is still executing the tblRunActual logic four times, resulting in 3,577,801 accesses and 30,852,831 logical reads, even after the effects of the lazy index spool on one of the instances. This largely duplicated effort is likely the biggest cost remaining in the query.
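For the BreakDate idea in the first bullet, the change would look something like this (a sketch; the key and INCLUDE columns are placeholders to be chosen from the expensive plan section):

```sql
-- Assumes BreakDate is datetime; CONVERT(date, ...) is deterministic,
-- so the computed column can be indexed directly
ALTER TABLE dbo.EnhancedBreaksActual
ADD BreakDateOnly AS CONVERT(date, BreakDate);

-- Hypothetical key/INCLUDE list: take these from the Filter predicates
-- and the columns the spool actually outputs
CREATE NONCLUSTERED INDEX IX_EnhancedBreaksActual_BreakDateOnly
ON dbo.EnhancedBreaksActual (BreakDateOnly)
INCLUDE (BreakID);
```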
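On the join hints point: a query-level hint restricts the physical join types everywhere without also dictating join order, which makes it the safer experiment (the view and column names here are hypothetical):

```sql
-- Query-level hint: every join must be hash or nested loops (no merge),
-- but the optimizer still chooses the join order itself
SELECT V.RunID
FROM dbo.vw_ExportLayer_EmptyMinutesOnRuns AS V
WHERE V.RunDate >= 20110101
OPTION (HASH JOIN, LOOP JOIN);

-- By contrast, an inline hint such as INNER HASH JOIN also forces the
-- written join order for the whole query, which is usually too blunt here
```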