Where does Sql Server get the cardinality estimates on Table3.Index3 and Table2.Index5

Joe O'Brien 2014-03-26 23:23:47

Trying to understand how Sql Server uses statistics to figure out the cardinality of the seeks on Table3.Index3 and Table2.Index5. The problem with this entire plan is the cardinality estimates are bad and get worse with easy successive join.

I can post the output of dbcc show_statistics – but they are not even close to the estimates on those two tables.

SQLkiwi 2014-03-27 00:24:10
For joins with a single predicate, SQL Server (7.0 to 2012) estimates the selectivity of the join by aligning the two input histograms step by step, using linear interpolation where necessary.

In the simplest case, the two inputs to the join have unfiltered histograms provided by statistics on base tables. For joins above the leaf level of the plan, or where filters are applied to the base tables, the input histograms have been modified by the intervening operations.

Where the base tables have filters applied, simple containment is assumed for the join selectivity estimation. The base histograms are first scaled down by the selectivity of the base table predicate(s), then the histograms are aligned to estimate join selectivity.

SQL Server 2014 uses base containment instead of simple containment. The selectivities of the base table predicates and join are calculated separately and simply multiplied together. SQL Server 2014 uses a coarse alignment of histogram steps for join selectivity estimation based on minimum and maximum values rather than aligning each step using linear interpolation.

Estimating join selectivity is a difficult problem. Try deriving a good estimate yourself from the raw statistical data.

Note: I'm talking about join selectivity so much because the question asked about estimation for the inner side of correlated nested loops joins.

mjswart 2014-05-14 17:49:06
Hi Paul,
After histograms are lined up, what then? Hypothetically, say one table's histogram step has a range of 100 containing 30 rows and 10 distinct values. And the other corresponding table has a step with the same range and 20 rows with 2 distinct values. I can't figure out (by google or by trial and error) how much this histogram step contributes to the estimated row count.

I only wonder because I'm reading articles about the new CE in 2014 and it seems easy enough to follow along with most examples. But I get stuck whenever I read something like "histograms are merged for join selectivity"

SQLkiwi 2014-05-18 19:29:18
The details are complicated, but to give you a flavour: if aligned histograms are used to estimate cardinality for a single-column equi-join, the per-step logic is:

C = (EQ_ROWS1 * EQ_ROWS2) + MIN((RANGE_ROWS1 * AVG_RANGE_ROWS1), (RANGE_ROWS2 * AVG_RANGE_ROWS2))

If frequency is used, the whole calculation is:

C = MIN((C1 * F2), (C2 * F1))

Where C = input cardinality, F = "All density" in the density vector of DBCC SHOW_STATISTICS.