How does SS determine JOIN operator row estimates

Dan Holmes 2013-05-22 14:10:56

The lower right most HASH join produces an estimate of 2k rows but the sources have 2k and 24k estimated row counts. The probe is a PK value and this could just of easily been a NL with a seek on a smaller row set.

I think that low number is causing problems upstream but i don't know how to influence it.

!post-compose edit!

to follow up a bit, i see that statistics are involved somehow but i can't make the numbers in vector or density match.

Also i forced a LOOP join here:

INNER JOIN dbo.tblActualFRTrips at ON ar.DailyRunID = at.DailyRunID

INNER LOOP JOIN dbo.tblActualFRRouteStopTimes arst ON at.DailyTripID = arst.DailyTripID

and performance is much better. I hate to leave that in production code are their other options?


SQLkiwi 2013-05-23 01:56:37
###Join cardinality estimation

There are many steps involved in join cardinality estimation. The base table statistics are modified by predicates on those tables (seeks in your case). This operation results in a new, derived histogram (and other statistical information) for each join input. These derived statistics are only used during query plan compilation – they are not visible to end users.

The derived histogram steps from both sides generally will not match up value-wise, so linear interpolation is used to align the steps. Then, additional logic and assumptions are applied to determine a cardinality for the join result. Obviously there are many things that can go wrong here; join cardinality estimation is quite a hard problem.

Fixing the Cardinality Estimate

Where a poor cardinality estimate affects the rest of a large plan, the most reliable way to fix it is to materialize the result of the join (just the columns and rows needed later) in a temporary table. People seem to resist this use of temporary tables for some reason, but it is a sound technique.

Using INNER LOOP JOIN also forces the order of the joins to follow the textual order in the query, just as if you had also specified OPTION (FORCE ORDER), so be very careful with that. Forcing the order of the joins may produce a plan that works well today, but things change.

On the other hand, if your knowledge of the system and data means you know following the textual join order will always be a reasonable proposition, now and for always, the join hint is fine – though I would add the redundant OPTION (FORCE ORDER) hint to make that behaviour more obvious to future readers.

My preference would be the temporary table.

Dan Holmes 2013-05-23 11:23:28
thanks though i was hoping there was something that could be used to influence or 'help' the join cardinality.

I ended up with the FORCE ORDER. SQL was waiting until the end of the plan to join a couple small tables to a much larger already joined set. The FORCE ORDER hint and SQL that joined the tables based on the known size of their output, reduced the time and reads significantly.

I know this is potentially dangerous but in this case, i know the data pattern and for these tables know they will grow linearly and in that same relation to each other.

Kevin 2013-06-11 16:08:21
Why not try the temporary table approach, Dan? Is that out of the question with your application? Thanks, -Kev