Why is the merge join in Node 48 [lowest right Inner Join] estimating a loss of 20% of the rows?
Both tables JOINed in the merge [Object 2 and Object 5] are completely repopulated [TRUNCATE then INSERT] a few hours (within the same jobstream) before this SELECT runs.
The most likely causes are:
- Sampled or out-of-date statistics on the two large input tables.
- The effect on the KeyToSourceFile statistics histogram of the independent filters [ClaimType]='E' AND [PayStatus]<>'X' on the Claims Detail table. Filtered statistics could help there.
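Both causes can be addressed directly after the reload completes. As a sketch (the table name `dbo.ClaimsDetail` and the second table's name are hypothetical, since the plan only shows Object 2 and Object 5):

```sql
-- Refresh statistics with a full scan immediately after the
-- TRUNCATE-then-INSERT reload, instead of relying on sampled
-- auto-update statistics (table names are placeholders):
UPDATE STATISTICS dbo.ClaimsDetail WITH FULLSCAN;
UPDATE STATISTICS dbo.Object5Table WITH FULLSCAN;

-- Filtered statistics matching the query's independent predicates,
-- so the KeyToSourceFile histogram reflects only the qualifying rows:
CREATE STATISTICS stats_KeyToSourceFile_E_NotX
ON dbo.ClaimsDetail (KeyToSourceFile)
WHERE ClaimType = 'E' AND PayStatus <> 'X'
WITH FULLSCAN;
```

The filtered statistics object would need the same `UPDATE STATISTICS ... WITH FULLSCAN` treatment after each reload, since the filtered-row counts change along with the base data.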
Join estimation is complex, but in outline: SQL Server takes the histograms for KeyToSourceFile (scaled for the independent filters, or from a filtered statistics object if available) from both tables, aligns each step (key boundary) using linear interpolation where necessary, then compares step-by-step to estimate how many rows will join.
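You can see the histograms the optimizer is aligning for yourself. Assuming hypothetical table names and that statistics exist on the join column:

```sql
-- Show the histogram steps (RANGE_HI_KEY boundaries, EQ_ROWS,
-- RANGE_ROWS, DISTINCT_RANGE_ROWS) for the join column on each input.
-- Comparing the two outputs step-by-step is essentially what the
-- join estimator does.
DBCC SHOW_STATISTICS ('dbo.ClaimsDetail', KeyToSourceFile) WITH HISTOGRAM;
DBCC SHOW_STATISTICS ('dbo.Object5Table', KeyToSourceFile) WITH HISTOGRAM;
```

If the step boundaries in the two histograms line up poorly, the linear-interpolation step introduces error, which compounds with the scaling applied for the independent filters.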
This is a statistical process, subject to simplifying model assumptions and accumulating errors, so you should not expect a perfect estimate, particularly for large tables and/or where the data has changed since the statistics were collected.
FYI, the SQL Server 2014 cardinality estimator sometimes produces less 'surprising' estimates in this sort of situation.
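If you want to test that without changing the database compatibility level, the cardinality estimator version can be selected per-query with documented trace flags (sketch only; the query body is a placeholder):

```sql
-- Force the SQL Server 2014 (new) cardinality estimator for one query:
SELECT /* ... original query ... */
OPTION (QUERYTRACEON 2312);

-- Conversely, force the legacy (pre-2014) estimator:
SELECT /* ... original query ... */
OPTION (QUERYTRACEON 9481);
```

Note that `QUERYTRACEON` requires sysadmin-level permissions unless wrapped in a plan guide or (on later versions) replaced by the `USE HINT` equivalents.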
The post-execution plan will be interesting, though it is likely to show the cause of poor performance is not directly related to this particular join estimate.
**Update based on the post-execution plan**
This query is executing the scalar UDFs [dbo].[IsStrNull] and [dbo].[ufn_years_old] hundreds of millions of times.
The current implementation of scalar UDFs in SQL Server means each invocation of the function (once per row!) runs in a new execution context (and T-SQL stack frame). This is roughly as bad as running a separate query per row, per scalar function. These scalar functions should probably be inlined directly into the query, or converted to inline table-valued functions (TVFs).
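As a sketch of the conversion (the real UDF bodies are not visible in the provided plan, so the logic below is an assumed age calculation, and the function and column names are hypothetical):

```sql
-- An inline TVF replacement for a scalar "years old" function.
-- Because it is a single RETURN of a SELECT, the optimizer expands
-- it into the calling query instead of invoking it per row.
CREATE FUNCTION dbo.ufn_years_old_itvf (@dob date, @asof date)
RETURNS TABLE
AS
RETURN
    SELECT years_old =
        DATEDIFF(YEAR, @dob, @asof)
        -- subtract 1 if the birthday has not yet occurred this year
        - CASE WHEN DATEADD(YEAR, DATEDIFF(YEAR, @dob, @asof), @dob) > @asof
               THEN 1 ELSE 0 END;
GO

-- Usage: replace the per-row scalar call with an APPLY.
-- Before:  SELECT dbo.ufn_years_old(c.DateOfBirth) FROM dbo.ClaimsDetail AS c
-- After:
-- SELECT y.years_old
-- FROM dbo.ClaimsDetail AS c
-- CROSS APPLY dbo.ufn_years_old_itvf(c.DateOfBirth, GETDATE()) AS y;
```

Something trivial like `IsStrNull` is usually better expressed directly as an inline expression (e.g. `ISNULL(NULLIF(col, ''), ...)`) rather than as any kind of function at all.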
As another undesirable side-effect, using a scalar UDF also prevents parallelism, something one would expect to see in a query processing 50 million rows through 16 joins. The problems of scalar (and multi-statement) UDFs have been written about extensively over the years; here is one example from Hugo Kornelis:
There are also a couple of odd physical join choices (nested loops) where a hash or merge join might be expected to perform better. The source query text is unavailable (truncated in the provided file), so it is hard to know exactly why those choices are being made. Perhaps there is an OUTER APPLY construction that the optimizer is unable to unnest.