Where does Sql Server get the cardinality estimates on Table3.Index3 and Table2.Index5
I can post the output of dbcc show_statistics – but they are not even close to the estimates on those two tables.
In the simplest case, the two inputs to the join have unfiltered histograms provided by statistics on base tables. For joins above the leaf level of the plan, or where filters are applied to the base tables, the input histograms have been modified by the intervening operations.
Where the base tables have filters applied, simple containment is assumed for the join selectivity estimation. The base histograms are first scaled down by the selectivity of the base table predicate(s), then the histograms are aligned to estimate join selectivity.
SQL Server 2014 uses base containment instead of simple containment. The selectivities of the base table predicates and join are calculated separately and simply multiplied together. SQL Server 2014 uses a coarse alignment of histogram steps for join selectivity estimation based on minimum and maximum values rather than aligning each step using linear interpolation.
Estimating join selectivity is a difficult problem. Try deriving a good estimate yourself from the raw statistical data.
Note: I'm talking about join selectivity so much because the question asked about estimation for the inner side of correlated nested loops joins.
C = (EQ_ROWS1 * EQ_ROWS2) + MIN((RANGE_ROWS1 * AVG_RANGE_ROWS1), (RANGE_ROWS2 * AVG_RANGE_ROWS2))
If frequency is used, the whole calculation is:
C = MIN((C1 * F2), (C2 * F1))
Where C = input cardinality, F = "All density" in the density vector of DBCC SHOW_STATISTICS.
After histograms are lined up, what then? Hypothetically, say one table's histogram step has a range of 100 containing 30 rows and 10 distinct values. And the other corresponding table has a step with the same range and 20 rows with 2 distinct values. I can't figure out (by google or by trial and error) how much this histogram step contributes to the estimated row count.
I only wonder because I'm reading articles about the new CE in 2014 and it seems easy enough to follow along with most examples. But I get stuck whenever I read something like "histograms are merged for join selectivity"