Crazy large estimates from multiple joins on a CLR TVF

Dan Holmes 2016-01-21 18:28:14

Apart from splitting this statement into pieces, are there other things that can be done to push the estimate closer to reality?

The code runs fast, but the estimate is contributing (I think) to the intermittent problem I have with RESOURCE_SEMAPHORE_QUERY_COMPILE waits.

If I switch the 'split' implementation to a T-SQL-based version, the estimate is 1 row. (I think this behavior is known and has been for a while.)

A follow-up question to a larger audience about the ramifications of this large grant is here:
https://social.msdn.microsoft.com/Forums/en-US/66fd3c68-e027-4593-9756-c5f8c1bfc0e5/what-are-the-possible-effects-of-a-query-memory-grant-that-is-too-large?forum=sqldatabaseengine

SQLkiwi 2016-01-23 07:24:35
> Apart from splitting this statement into pieces

The session file did not contain actuals, but is there a reason you are resistant to this, at least as part of the solution? As written, the CTE is evaluated three times, splitting the same information each time. It seems sensible to me to materialize this intermediate result and index it to make the self joins efficient, and to reduce the repeated work.
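
A minimal sketch of that approach follows. The #SplitItems table, the dbo.SplitCLR function, its output columns, and the @list parameter are all hypothetical stand-ins for the objects in your actual query:

    -- Split the string once into an indexed temp table, then let the self
    -- joins read the temp table instead of re-evaluating the CLR TVF.
    DECLARE @list varchar(max) = 'a,b,c,d';      -- placeholder input

    CREATE TABLE #SplitItems
    (
        ItemOrder integer NOT NULL PRIMARY KEY CLUSTERED,
        ItemValue varchar(100) NOT NULL
    );

    INSERT #SplitItems (ItemOrder, ItemValue)
    SELECT S.ItemOrder, S.ItemValue
    FROM dbo.SplitCLR(@list) AS S;               -- assumed CLR split function

    -- Self join on the materialized result; cardinality now comes from
    -- temp table statistics rather than the fixed TVF guess.
    SELECT A.ItemValue, B.ItemValue
    FROM #SplitItems AS A
    JOIN #SplitItems AS B
        ON B.ItemOrder = A.ItemOrder + 1;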

> are there other things that can be done to push the estimate closer to reality?

Again, there is nothing in the session file to say what 'reality' is here. There is no way to directly change the fixed cardinality estimation guess for a TVF, but you can simulate this by using TOP (@var) with OPTION (OPTIMIZE FOR (@var = literal_value)) appropriately, for example.
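
A minimal sketch of that pattern, assuming the same hypothetical dbo.SplitCLR function and a guess of 100 rows (neither is taken from your query):

    -- At run time @rows is large enough never to limit the result, but the
    -- OPTIMIZE FOR hint makes the optimizer plan (and size any memory grant)
    -- for 100 rows instead of the fixed TVF guess.
    DECLARE @list varchar(max) = 'a,b,c,d';      -- placeholder input
    DECLARE @rows bigint = 9223372036854775807;  -- effectively unlimited

    SELECT TOP (@rows)
        S.ItemOrder,
        S.ItemValue
    FROM dbo.SplitCLR(@list) AS S                -- assumed CLR split function
    OPTION (OPTIMIZE FOR (@rows = 100));         -- compile for ~100 rows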

That said, there are two separate reasons for the sorts in this plan. The first is the choice of a many-to-many merge, which can be eliminated with a hint or by materializing the CTE and indexing as previously mentioned.
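
For instance, a query-level join hint is one way to take merge join (and its input sorts) out of consideration entirely; this is a sketch only, reusing the hypothetical split function from above:

    -- Restrict this query to hash and nested loops joins, so the optimizer
    -- cannot choose a many-to-many merge join that needs sorted inputs.
    DECLARE @list varchar(max) = 'a,b,c,d';      -- placeholder input

    SELECT A.ItemValue, B.ItemValue
    FROM dbo.SplitCLR(@list) AS A                -- assumed CLR split function
    JOIN dbo.SplitCLR(@list) AS B
        ON B.ItemOrder = A.ItemOrder + 1
    OPTION (HASH JOIN, LOOP JOIN);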

The second reason for sorting is the so-called performance spool. This optimizes for repeated calculations on the inner side of a nested loops join, but it is almost certainly counter-productive here. You can test for a plan without performance spools using trace flag 8690 e.g. OPTION (QUERYTRACEON 8690). The trick with TOP (@var) could also be used to control this.
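
For example (a sketch only; the query shape is assumed, and OPTION (QUERYTRACEON) needs appropriate permissions):

    -- Compile this statement without performance spools (trace flag 8690,
    -- scoped to this query only).
    DECLARE @list varchar(max) = 'a,b,c,d';      -- placeholder input

    SELECT A.ItemValue, B.ItemValue
    FROM dbo.SplitCLR(@list) AS A                -- assumed CLR split function
    JOIN dbo.SplitCLR(@list) AS B
        ON B.ItemOrder = A.ItemOrder + 1
    OPTION (QUERYTRACEON 8690);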

Huge unused memory grants are a serious problem, of course. You should be able to get a good plan, without sorting, using one or more of the techniques described above.

Dan Holmes 2016-01-23 18:11:34
"The session file did not contain actuals, but is there a reason you a resistant to this, "
No. Really wanting alternatives because of the apparent nature of the estimates from a CLR implemented string split. I ultimately used a TSQL version of split() and got a better plan.
Dan Holmes 2016-01-26 16:38:39
I like the TOP (@var) solution. I can certainly make a better guess with OPTIMIZE FOR @var = 100 than the 1,000,000,000 rows the optimizer is predicting. QUERYTRACEON 8690 didn't change the plan.

Dan Holmes 2016-02-09 18:50:52
I blogged about this scenario. It has wreaked havoc on our production servers. In production I put the T-SQL split function in use, though I did see that I could control the memory requirement using the TOP (@var) OPTIMIZE FOR syntax. I don't know that I want to 'canonize' that pattern in our codebase, though.