Crazy large estimates from multiple joins on a CLR TVF

Apart from splitting this statement into pieces, are there other things that can be done to push the estimate closer to reality?

The code runs fast, but the estimate is contributing (I think) to the intermittent problem I have with RESOURCE_SEMAPHORE_QUERY_COMPILE waits.
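For context, the wait in question can be checked with the standard wait-stats DMV; a minimal sketch:

    -- Check accumulated waits for compile-memory pressure.
    SELECT
        WS.wait_type,
        WS.waiting_tasks_count,
        WS.wait_time_ms
    FROM sys.dm_os_wait_stats AS WS
    WHERE WS.wait_type = N'RESOURCE_SEMAPHORE_QUERY_COMPILE';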

Switching the 'split' implementation to a T-SQL-based version, the estimate is 1 row. (I think this behavior is known and has been for a while.)
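For reference, a minimal sketch of the kind of multi-statement T-SQL split I mean; the name dbo.SplitString is hypothetical (the real function isn't shown here). A multi-statement TVF like this receives a fixed cardinality guess (1 row under the pre-2014 cardinality estimator), which matches the estimate described above:

    -- Hypothetical multi-statement T-SQL split function (sketch).
    CREATE FUNCTION dbo.SplitString
    (
        @List      nvarchar(max),
        @Delimiter nchar(1)
    )
    RETURNS @Items TABLE (Item nvarchar(4000) NOT NULL)
    AS
    BEGIN
        DECLARE @Start int = 1, @End int;

        SET @End = CHARINDEX(@Delimiter, @List, @Start);
        WHILE @End > 0
        BEGIN
            -- Emit the substring before the current delimiter.
            INSERT @Items (Item)
            VALUES (SUBSTRING(@List, @Start, @End - @Start));

            SET @Start = @End + 1;
            SET @End = CHARINDEX(@Delimiter, @List, @Start);
        END;

        -- Emit the trailing element after the last delimiter.
        INSERT @Items (Item)
        VALUES (SUBSTRING(@List, @Start, LEN(@List) - @Start + 1));

        RETURN;
    END;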

A follow-up question to a larger audience about the ramifications of this large grant is here: https://social.msdn.microsoft.com/Forums/en-US/66fd3c68-e027-4593-9756-c5f8c1bfc0e5/what-are-the-possible-effects-of-a-query-memory-grant-that-is-too-large?forum=sqldatabaseengine

By Dan Holmes, asked Jan 21, 2016 at 06:28 PM

2 answers

Apart from splitting this statement into pieces

The session file did not contain actuals, but is there a reason you are resistant to this, at least as part of the solution? As written, the CTE is evaluated three times, splitting the same information each time. It seems sensible to me to materialize this intermediate result and index it, to make the self joins efficient and to reduce the repeated work.
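A minimal sketch of that materialization, using hypothetical names (#SplitValues, dbo.SplitString, @List) since the actual query is not in the session file:

    -- Materialize the split output once, indexed for the self joins.
    -- Assumes individual items are short enough to serve as an index key.
    DECLARE @List nvarchar(max) = N'alpha,bravo,charlie';

    CREATE TABLE #SplitValues
    (
        Item nvarchar(400) NOT NULL PRIMARY KEY CLUSTERED
    );

    INSERT #SplitValues (Item)
    SELECT DISTINCT S.Item
    FROM dbo.SplitString(@List, N',') AS S;

    -- The main query then joins #SplitValues to itself (or to base tables)
    -- instead of re-evaluating the splitting CTE three times.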

are there other things that can be done to push the estimate closer to reality?

Again, there is nothing in the session file to say what 'reality' is here. There is no way to directly change the fixed cardinality estimation guess for a TVF, but you can simulate this by using TOP (@var) with OPTION (OPTIMIZE FOR (@var = literal_value)) appropriately, for example.
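A minimal sketch of that pattern, again with hypothetical names; the optimizer costs the TOP using the OPTIMIZE FOR value, while the runtime value is large enough not to limit the results:

    DECLARE @List  nvarchar(max) = N'alpha,bravo,charlie';
    DECLARE @Limit bigint = 9223372036854775807; -- effectively unlimited at run time

    SELECT TOP (@Limit)
        S.Item
    FROM dbo.SplitString(@List, N',') AS S
    OPTION (OPTIMIZE FOR (@Limit = 100)); -- plan is costed for ~100 rows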

That said, there are two separate reasons for the sorts in this plan. The first is the choice of a many-to-many merge join, which can be eliminated with a hint, or by materializing the CTE and indexing it as previously mentioned.
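For example (hypothetical table names), restricting the physical join type removes the many-to-many merge and its supporting sorts:

    SELECT T1.KeyCol, T2.Payload
    FROM dbo.TableA AS T1
    JOIN dbo.TableB AS T2
        ON T2.KeyCol = T1.KeyCol
    OPTION (HASH JOIN); -- or a per-join hint: INNER HASH JOIN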

The second reason for sorting is the so-called performance spool. This optimizes for repeated calculations on the inner side of a nested loops join, but it is almost certainly counter-productive here. You can test for a plan without performance spools using trace flag 8690, e.g. OPTION (QUERYTRACEON 8690). The trick with TOP (@var) could also be used to control this.
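The shape of that test, on a hypothetical query (note that QUERYTRACEON requires sysadmin permission at compile time):

    SELECT P.ParentID, C.ChildValue
    FROM dbo.Parent AS P
    JOIN dbo.Child AS C
        ON C.ParentID = P.ParentID
    OPTION (QUERYTRACEON 8690); -- compile this statement without performance spools

On later builds (SQL Server 2016 SP1 onward), OPTION (USE HINT ('NO_PERFORMANCE_SPOOL')) is the supported equivalent.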

Huge unused memory grants are a serious problem, of course. You should be able to get a good plan, without sorting, using one or more of the techniques described above.

By SQLkiwi, answered Jan 23, 2016 at 07:24 AM
Dan Holmes commented Jan 23, 2016 at 06:11 PM:

"The session file did not contain actuals, but is there a reason you a resistant to this, " No. Really wanting alternatives because of the apparent nature of the estimates from a CLR implemented string split. I ultimately used a TSQL version of split() and got a better plan.

Dan Holmes commented Jan 26, 2016 at 04:38 PM:

I like the TOP (@var) solution. I can certainly make a better guess with OPTIMIZE FOR @var = 100 than the 1,000,000,000 rows the optimizer is predicting. The QUERYTRACEON 8690 hint didn't change the plan.


I blogged about this scenario. It has wreaked havoc on our production servers. In production I put the T-SQL split function into use, though I did see that I could control the memory requirement using the TOP (@var) OPTIMIZE FOR syntax. I don't know that I want to 'canonize' that pattern in our codebase, though.
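For anyone testing the same thing, the grant can be watched while the query runs; a minimal sketch using the standard DMV:

    -- Observe requested vs. granted vs. used execution memory per session.
    SELECT
        MG.session_id,
        MG.requested_memory_kb,
        MG.granted_memory_kb,
        MG.ideal_memory_kb,
        MG.used_memory_kb
    FROM sys.dm_exec_query_memory_grants AS MG
    ORDER BY MG.requested_memory_kb DESC;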

By Dan Holmes, answered Feb 09, 2016 at 06:50 PM


