Nested loop join – outperforms its buddies

wayne 2013-08-13 09:42:53

Ran into a performance problem the other day where a statement was aggregating a table with 115 million rows joined to another table (5k rows) across 5 columns. The optimiser decided to use a hash join, which resulted in around 5 minutes to complete.
Using a forced loop hint (the shape of the hinted query is sketched below), the query completed in 1 minute and 40 seconds. It looks like the optimiser won't consider the loop join as a result of the multiple-column join, but I may be wrong in that assumption. In any case, I have a question that I have yet to find an answer to.
The theory behind the nested loop join is that it is optimised for a small-to-large table join. I have, however, seen this algorithm used where both tables have similar sizes and row counts of up to 10 million. In addition, when this operator is selected, the associated cost is always low. Does anyone know the variables at play when the optimiser decides to use this algorithm over others? The theory doesn't match what I have seen.
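For reference, the hinted query was shaped roughly like this (table and column names changed, so treat it as a sketch):

    SELECT
        s.col1, s.col2, s.col3, s.col4, s.col5,
        SUM(b.amount) AS total_amount
    FROM dbo.BigTable AS b           -- ~115 million rows
    JOIN dbo.SmallTable AS s         -- ~5k rows
        ON  s.col1 = b.col1
        AND s.col2 = b.col2
        AND s.col3 = b.col3
        AND s.col4 = b.col4
        AND s.col5 = b.col5
    GROUP BY
        s.col1, s.col2, s.col3, s.col4, s.col5
    OPTION (LOOP JOIN);              -- force nested loops for every join
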
SQLkiwi 2013-08-13 14:08:23
The optimizer chooses a physical join operator based on cost estimates. These cost estimates are calculated using a model that has been shown to produce reasonably good plans quickly for a wide variety of queries.
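
You can inspect those estimates without running anything. For example, a sketch using made-up table names:

    -- Return the estimated plan as XML instead of executing the query.
    SET SHOWPLAN_XML ON;
    GO
    SELECT COUNT_BIG(*)
    FROM dbo.BigTable AS b
    JOIN dbo.SmallTable AS s
        ON s.col1 = b.col1;
    GO
    SET SHOWPLAN_XML OFF;
    GO
    -- In the plan XML, compare the EstimateCPU, EstimateIO, and
    -- EstimatedTotalSubtreeCost attributes on each physical operator.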

The costing model makes a number of simplifying assumptions and has several known limitations.

One of the assumptions is that every query starts executing with a cold data cache. If your query touches data and index structures that are largely already in memory, the model will over-estimate the cost of a nested loop solution. On the other hand, if the optimizer produced a plan with very poor cold-cache performance, people would rightly be upset.
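
You can see the effect of caching for yourself on a disposable test system by clearing the buffer pool between runs. A sketch, again with made-up names (never run DBCC DROPCLEANBUFFERS in production):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    CHECKPOINT;              -- write dirty pages so clean ones can be dropped
    DBCC DROPCLEANBUFFERS;   -- empty the buffer pool: test systems only!

    -- Cold-cache run: expect physical reads and a longer elapsed time.
    SELECT COUNT_BIG(*) FROM dbo.BigTable;

    -- Warm-cache run: the same pages are now served from memory.
    SELECT COUNT_BIG(*) FROM dbo.BigTable;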

Another (related) assumption is that random I/O is much more expensive than sequential I/O. Hash and merge join (without sort) tend to produce sequential data access patterns. Nested loops tend to produce random I/Os on the inner side of the join.
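
If you want to observe those access patterns, one approach (another sketch with made-up names) is to force each physical join type in turn and compare the I/O statistics:

    SET STATISTICS IO ON;

    -- Hash join: both inputs are typically read with sequential scans.
    SELECT s.col1, COUNT_BIG(*) AS cnt
    FROM dbo.BigTable AS b
    JOIN dbo.SmallTable AS s ON s.col1 = b.col1
    GROUP BY s.col1
    OPTION (HASH JOIN);

    -- Nested loops: an inner-side seek per outer row, which tends
    -- toward random I/O when the pages are not already cached.
    SELECT s.col1, COUNT_BIG(*) AS cnt
    FROM dbo.BigTable AS b
    JOIN dbo.SmallTable AS s ON s.col1 = b.col1
    GROUP BY s.col1
    OPTION (LOOP JOIN);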

Another thing that works against nested loops join is that the costs for parallel execution are quite biased against this physical join type. CPU costs for parallel hash join are reduced by the expected degree of parallelism. Operators on the inner side of a parallel nested loops join are not CPU-scaled in the same way.
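
One way to see that bias is to compare the join type chosen for the same query with parallelism disabled and enabled (sketch, made-up names):

    -- Serial plan: nested loops may well look cheapest here.
    SELECT s.col1, COUNT_BIG(*) AS cnt
    FROM dbo.BigTable AS b
    JOIN dbo.SmallTable AS s ON s.col1 = b.col1
    GROUP BY s.col1
    OPTION (MAXDOP 1);

    -- Parallel plan: hash join CPU cost is divided by the expected
    -- degree of parallelism, so the optimizer may switch to it.
    SELECT s.col1, COUNT_BIG(*) AS cnt
    FROM dbo.BigTable AS b
    JOIN dbo.SmallTable AS s ON s.col1 = b.col1
    GROUP BY s.col1
    OPTION (MAXDOP 8);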

I should also mention that the cost model does not directly account for the particular performance characteristics of the server hardware. The cost estimates for CPU and I/O use relatively fixed magic numbers which may not bear much resemblance to your particular system. The amount of memory available to the server is an input to the cost calculations (as is the number of processors expected to be available) but the general point is still true.
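
If you are curious, there is an undocumented, unsupported command that changes the hardware description the optimizer costs against. The syntax below is from memory and may vary by version, so treat it strictly as a sketch for a throwaway test instance:

    DBCC TRACEON (3604);                       -- route DBCC output to the client
    DBCC OPTIMIZER_WHATIF (CPUs, 32);          -- cost as if 32 schedulers were present
    DBCC OPTIMIZER_WHATIF (MemoryMBs, 16384);  -- ...and 16 GB of memory

    SELECT COUNT_BIG(*)
    FROM dbo.BigTable AS b
    JOIN dbo.SmallTable AS s ON s.col1 = b.col1
    OPTION (RECOMPILE);                        -- recompile under the pretend hardware

    DBCC OPTIMIZER_WHATIF (ResetAll);          -- put the real values back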

There is no particular correlation between physical join type and the number of columns in the join predicate(s).

I would expect that all or most of the factors described above were in play when you compared the performance of particular queries.