Crazy large estimates from multiple joins on a CLR TVF
The code runs fast, but the estimate is contributing (I think) to the on-and-off problem I have with RESOURCE_SEMAPHORE_QUERY_COMPILE waits.
Switching the 'split' implementation to a T-SQL-based version, the estimate is 1 row. (I think this behavior is known and has been for a while.)
A follow-up question to a larger audience about the ramifications of this large grant is here:
The session file did not contain actuals, but is there a reason you are resistant to this, at least as part of the solution? As written, the CTE is evaluated three times, splitting the same information each time. It seems sensible to me to materialize this intermediate result and index it, both to make the self joins efficient and to reduce the repeated work.
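A minimal sketch of that materialize-and-index approach (the table, column, and function names here are assumptions, since the original query text is not shown):

```sql
-- Hypothetical sketch: split once into a temp table,
-- then index it so the self joins can seek.
CREATE TABLE #Split
(
    ItemId integer NOT NULL,
    Value  varchar(50) NOT NULL
);

INSERT #Split (ItemId, Value)
SELECT S.ItemId, S.Value
FROM dbo.SplitCLR(@list) AS S;      -- assumed CLR splitter

CREATE CLUSTERED INDEX c
ON #Split (Value, ItemId);

-- The three self joins now read the indexed temp table
-- instead of re-evaluating the CTE (and the split) each time.
```

A temp table also gives the optimizer a real cardinality and statistics to work with, which the fixed TVF guess cannot.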
Are there other things that can be done to push the estimate closer to reality?
Again, there is nothing in the session file to say what 'reality' is here. There is no way to directly change the fixed cardinality estimation guess for a TVF, but you can simulate this by using TOP (@var) with OPTION (OPTIMIZE FOR (@var = literal_value)) appropriately, for example.
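To illustrate the TOP (@var) trick (identifiers here are illustrative, not from the original query): TOP with a variable normally produces a conservative estimate, but OPTIMIZE FOR replaces it with the literal you supply at compile time, while the runtime value still returns all rows.

```sql
-- Effectively "all rows" at execution time.
DECLARE @rows bigint = 9223372036854775807;

SELECT S.Value
FROM
(
    SELECT TOP (@rows) T.Value
    FROM dbo.SplitCLR(@list) AS T   -- assumed CLR splitter
) AS S
OPTION (OPTIMIZE FOR (@rows = 100)); -- optimizer estimates 100 rows
```

Choose the literal to match what you believe 'reality' is; since the session file does not say, 100 above is just a placeholder.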
That said, there are two separate reasons for the sorts in this plan. The first is the choice of a many-to-many merge, which can be eliminated with a hint or by materializing the CTE and indexing as previously mentioned.
The second reason for sorting is the so-called performance spool. This optimizes for repeated calculations on the inner side of a nested loops join, but it is almost certainly counter-productive here. You can test for a plan without performance spools using trace flag 8690 e.g. OPTION (QUERYTRACEON 8690). The trick with TOP (@var) could also be used to control this.
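For testing, the trace flag can be applied per-query (names below are illustrative; QUERYTRACEON requires appropriate permissions, and 8690 is undocumented, so treat this as an investigative tool rather than a production fix):

```sql
-- Request a plan without performance spools for this query only.
SELECT P.ParentId, C.Value
FROM dbo.Parent AS P                 -- assumed table
CROSS APPLY dbo.SplitCLR(P.List) AS C
OPTION (QUERYTRACEON 8690);
```

If the spool-free plan performs well, prefer achieving the same shape through the TOP (@var) estimate correction or materialization, rather than relying on the trace flag.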
Huge unused memory grants are a serious problem, of course. You should be able to get a good plan, without sorting, using one or more of the techniques described above.