Filling in the gaps: there has to be a better way

Dan Holmes 2014-05-09 20:26:08

Roughly in the middle of this plan is a big pipe of 600k rows. I know what the SQL is, but I don't know how to rewrite it. The performance is alright: I can run this for 3 years (290k+ rows) and it will finish in under 8 minutes. If I can rewrite it to make that better, certainly I will. I do need to keep it as a view, though, so breaking it into smaller pieces isn't an option. Attached is the T-SQL for the view. The problem is the CROSS APPLY in the grouped_by_hour CTE. Any ideas for a rewrite are most welcome.

thanks

(edit based on Paul's reply about OPTION (QUERYTRACEON 9130))

That really helped, and I changed some indexes based on what was exposed. This plan is a 3-year query that is down to 2.5 minutes from 8. It is the same query as the original post, except the WHERE is RunDate >= 20110101.

I think I am at the end of optimization, but if there is more to get, I would love more ideas. New Plan

SQLkiwi 2014-05-11 00:18:50
Hi Dan,

I'm not entirely convinced that the 668,661 rows are a problem in this query.

  1. From a logical I/O point of view, the calendar table incurs only 4178 reads across its 822 iterations.
  2. The under-sized input estimate to the following hash aggregate seems like it might cause an under-allocation of memory, but this is a grouping aggregate, where the memory used depends on the number of distinct values encountered. Looking at the output, it produces 242 distinct values versus 1350 expected, so it is probably not a problem. The lack of SLEEP_TASK waits backs this up.

This might all be different when the query is run with a different filter value on the view, but I have no way of knowing about that. If the query ran particularly slowly with a certain parameter value, I feel sure you would have submitted that instance for analysis instead, right? ;c)

The execution time of 20.6 seconds for this query, with all data in memory, makes me think there are other issues here. There just aren't enough obviously-expensive operations to make me think that is reasonable.

The query plan suffers the usual problem with single-statement, multi-CTE queries: there are multiple references to the stops CTE, resulting in it being evaluated multiple times. Notice the repeated sections of the plan and compare the operators to the stops CTE. It seems wasteful to do this work multiple times. Again, I have no idea how significant this is without being able to test it myself.
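To illustrate the point, here is a simplified sketch (not the OP's view; tblRunActual is mentioned later in this thread, but the column names below are placeholders). Each reference to a common table expression is expanded separately by the optimizer, so the work that defines it appears in the plan, and runs, once per reference:

-- Sketch only: placeholder columns, assumed table
WITH stops AS
(
    SELECT RunID, StopTime
    FROM dbo.tblRunActual
    WHERE StopTime IS NOT NULL
)
SELECT s1.RunID, s1.StopTime, s2.StopTime
FROM stops AS s1
JOIN stops AS s2
    ON s2.RunID = s1.RunID;   -- 'stops' is evaluated twice, once per reference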

The repeated evaluation is of particular concern because the repeated plan sections contain a sort that has an under-estimate of its input cardinality. You are running SQL Server 2008 R2, so the query plan does not contain information about any spills this might be causing. As a side note, SP2 is a couple of years old now; you might think about applying SP2 CU12 (build 10.50.4305), though there might be an SP3 on the way, I don't know. I don't think the sorts are spilling right now, because the Wait Stats don't show any IO_COMPLETION waits, but it would be remiss of me not to mention it.

Another thing that I would check is those seeks with residual predicates. Run the query again on a test system with OPTION (QUERYTRACEON 9130) and check for any large inputs to the new Filters, followed by a significant reduction in cardinality afterward. Sometimes this can reveal a hidden inefficiency that could be affecting performance more than you realize.
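For example (a sketch only: the view name and filter value are assumptions based on this thread, and trace flag 9130 is undocumented, so use it on a test system only):

SELECT *
FROM dbo.ExportLayer_EmptyMinutesOnRuns   -- assumed view name
WHERE RunDate >= 20110101
OPTION (QUERYTRACEON 9130);               -- surfaces residual predicates as explicit Filter operators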

One intriguing option here is to replace the view with a multi-statement table-valued function. I don't normally recommend these, for all sorts of reasons, but they can be useful for breaking up a problem view in some cases. This is an option if the client user or application can tolerate a change in syntax to pass the rundate as a parameter instead of in a WHERE clause:

SELECT ... 
FROM dbo.fn_ExportLayer_EmptyMinutesOnRuns (@rundate);

The basic approach is to split the query into multiple table variables inside the function, making good use of table variable indexes and reusing stored data (think of the stops CTE).

Cardinality estimates can usually be made good enough by adding OPTION (RECOMPILE) to statements inside the function. This also allows the parameter-embedding optimization for @rundate and so on, so base table statistics can still be useful.
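As a rough sketch of the shape this could take (only the function name comes from this thread; the column names, data types, and staging logic below are placeholders, and the real body would reuse the existing view's logic):

CREATE FUNCTION dbo.fn_ExportLayer_EmptyMinutesOnRuns
(
    @rundate integer
)
RETURNS @result TABLE
(
    RunID      integer  NOT NULL,
    EmptyStart datetime NOT NULL,
    EmptyEnd   datetime NOT NULL,
    PRIMARY KEY (RunID, EmptyStart)          -- table variable index
)
AS
BEGIN
    -- Stage 1: materialize the 'stops' data once,
    -- instead of once per CTE reference
    DECLARE @stops TABLE
    (
        RunID    integer  NOT NULL,
        StopTime datetime NOT NULL,
        PRIMARY KEY (RunID, StopTime)
    );

    INSERT @stops (RunID, StopTime)
    SELECT RunID, StopTime
    FROM dbo.tblRunActual                    -- placeholder source table
    WHERE RunDate = @rundate
    OPTION (RECOMPILE);                      -- embeds @rundate for better estimates

    -- Later stages: build on @stops as many times as needed
    INSERT @result (RunID, EmptyStart, EmptyEnd)
    SELECT s1.RunID, s1.StopTime, s2.StopTime
    FROM @stops AS s1
    JOIN @stops AS s2
        ON  s2.RunID = s1.RunID
        AND s2.StopTime > s1.StopTime
    OPTION (RECOMPILE);

    RETURN;
END;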

Debugging and plan analysis are definitely more challenging, but a Profiler trace or extended events can reveal post-execution plans within the function body.

Admittedly, it is difficult to make good use of parallelism within a TVF.

If the query can be broken into parts with row counts limited to a sensible level, this approach can be worth exploring. It all depends.

Comments on the 3-year plan

  1. The large Filter and Eager Index Spool over the EnhancedBreaksActual table is a concern. The predicates there are complex, and I don't know enough about the data distribution and structure of that table to make a firm recommendation, but it seems likely that the integer conversion on BreakDate is unhelpful. You could consider CONVERT(date, BreakDate) instead of the char(8) style 112 conversion, then create a computed column for that expression on EnhancedBreaksActual and index it (with the right extra keys and includes); see the sketch after this list. There will be a couple of residual predicates, but assuming BreakDate is selective, this would likely be beneficial.
  2. You should trace this query execution using Profiler looking for Sort or Hash spill events, just in case. I don't believe these are occurring, but it is best to check.
  3. The parallel many-to-many merge joins are inefficient, generating the excessive I/O you see on the Worktable in the Table I/O tab (1,758,550 scans involving 9,458,235 worktable rows). Hash or loop joins might be worth exploring, but this is difficult to enforce in a view with CTEs without accidentally forcing the entire query's join order (i.e. by using join hints). This also might not work well for different rundate ranges. The many-to-many joins could also be avoided by making one of the inputs unique, but it is hard to know from here whether that is logically possible or not. Of course, that might be a lot clearer (and indexable) if the query were broken up 😉
  4. The query is still executing the tblRunActual logic four times, resulting in 3,577,801 accesses and 30,852,831 logical reads, even after the effects of the lazy index spool on one of the instances. This largely-duplicated effort is likely the biggest cost remaining in the query.
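On point 1 above, a minimal sketch of the computed column idea (assuming BreakDate is a datetime column; the index name and the extra key/INCLUDE columns are placeholders, since the right choices depend on how the view touches EnhancedBreaksActual):

ALTER TABLE dbo.EnhancedBreaksActual
    ADD BreakDateOnly AS CONVERT(date, BreakDate);

CREATE NONCLUSTERED INDEX IX_EnhancedBreaksActual_BreakDateOnly
    ON dbo.EnhancedBreaksActual (BreakDateOnly)
    INCLUDE (BreakDate);   -- add the extra keys and includes the plan actually needs
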
Dan Holmes 2014-05-12 15:42:42
Updated the original post with a new plan based on the Filters exposed by the trace flag option. The new plan is for 3+ years and runs in 2.5 minutes. This was the 8-minute query I mentioned in the OP. If there is more to get out of this I would love it, but I don't see anything else that is just obvious (I didn't think of the filter thing either, though).
SQLkiwi 2014-05-12 17:15:58
Updated my answer with comments on the new plan. There is always more that can be done, though of course there comes a point where performance is 'good enough'. 8 -> 2.5 minutes is a very useful improvement, but only you know whether it will suffice. I still like the TVF idea 🙂
Dan Holmes 2014-05-12 17:27:13
The consumer of this query can't use the UDF syntax. It is a 3rd-party app with a built-in object discovery mechanism that I can't control.

I did get Sort and Hash warnings emitted in Profiler during this execution.

All that said, I think I am content with the performance. The normal usage of this view won't be to pull 3 years of data. I used that example because that is what started this performance tuning. That query started at 12:30am and hadn't finished by 7:30am. It locked out all other users (no RCSI, so readers block writers). So, from 7 hours and counting down to 2.5 minutes, all in an extreme use case, I think I am satisfied for now.

Sujai Karnam 2014-05-12 23:13:24
Hi Paul, your response above has a ton of good information for us to learn from! Thanks for sharing!
I was analyzing this query and plan very closely against your response (learning :)) and had one question:
1. You mentioned that "Wait Stats don't show any IO_COMPLETION waits"; how would you find the Wait Stats info in the attached plan, if at all?
Sujai Karnam 2014-05-12 23:28:09
Oops! Ignore my premature question... I just realized that this is a Plan Explorer PRO feature.
SQLkiwi 2014-05-12 23:50:07
Yes indeed, automatic wait stats collection is a PRO feature.