How does the optimizer come up with this row estimate?
The actual plan mediacounts_long01_2546_sec_hash_spill.queryanalysis is from a previous run of a similar query and shows the hash spills.
This is a partitioned table with 3,300+ partitions, and the data within the partitions is not balanced: the data belongs to different US counties, and some partitions have only 15K rows while others have 25M+ rows. If there is any way to raise the row estimates in this query, I would like to attempt it to avoid the spills. If I do not use a driver table and force nested loops for this query, I get a query plan like Long_Single_16770_Sec_HashNoSpill_NoDriver.queryanalysis, which takes far more resources and time to execute.
Cardinality estimation for the inner side of a correlated nested loops join uses average values for any correlated parameters.
The cardinality of the table is 147,993,895. The all density figure for column CntyCd is 0.0006464124. Multiplying the two gives the 95665.1 estimate. There is an adjustment to account for the change in table cardinality since statistics were collected, but the effect is too small to see in this case.
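You can verify this arithmetic yourself. A minimal sketch, assuming hypothetical table and statistics names (dbo.MediaCounts and IX_MediaCounts_CntyCd; substitute your actual partitioned table and the statistics object covering CntyCd):

```sql
-- Read the density vector for the statistics on CntyCd.
-- The "All density" row for the CntyCd column is 0.0006464124 here.
DBCC SHOW_STATISTICS ('dbo.MediaCounts', IX_MediaCounts_CntyCd)
    WITH DENSITY_VECTOR;

-- Reproduce the optimizer's estimate:
-- table cardinality * all density for the correlated column.
SELECT 147993895 * 0.0006464124 AS EstimatedRows;  -- = 95665.088..., i.e. ~95665.1
```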
Basing the inner-side estimates on average values for correlated parameters can cause problems with memory-consuming operators when there is a big variance from the average at run time. You could look to optimize the inner side of the loop for the largest partition using the usual OPTIMIZE FOR with a variable trick, as sketched below.
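A minimal sketch of that trick, under stated assumptions: the table name dbo.MediaCounts, the variable @CntyCd, and the county code 6037 are all hypothetical placeholders. Substitute the county code that maps to your largest partition so the plan (and its memory grants) is sized for the worst case:

```sql
-- OPTIMIZE FOR with a variable: the query compiles as if @CntyCd held the
-- value for the largest partition, regardless of the value supplied at run time.
DECLARE @CntyCd int = 6037;  -- run-time value; may be any county code

SELECT mc.*
FROM dbo.MediaCounts AS mc
WHERE mc.CntyCd = @CntyCd
OPTION (OPTIMIZE FOR (@CntyCd = 6037));  -- hypothetical: code of the biggest county
```

The trade-off is that every execution gets the big-partition plan, so small counties pay for memory they do not need, but the hash spills on the large ones should go away.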