Can somebody tell me why SQL Server spills to TempDB in this example (plan 15) after the HASH JOIN to t_vertraege (the second join from the right side)?
Both the estimated data size and the estimated row count are slightly greater than the actuals. Granted, [kontokorrent_einzel] is a big table (6.5 million rows), so the data size is about 1 GB, but the server has 128 GB of RAM, so this should not be a problem.
BTW: the query runs for about 15 minutes and I have no idea how to make it faster. Originally it used plain INNER JOINs, but (although my statistics are up to date) SQL Server estimates it will return only ~200k rows that way. Without the forced HASH JOINs the query needs ~30 minutes (because it performs ~6 million NESTED LOOP iterations into each of the two joins to t_termine).
By Thomas Franz 16 asked Jun 09, 2016 at 01:47 PM
The answer lies on the Plan Tree tab, which shows the number of rows processed by each thread at the hash join in question:
Memory is reserved for the hash join on the basis of the expected number and size of the rows. This reserved memory is acquired at the beginning of query execution. It cannot be increased once execution has started, no matter how much free memory the instance might have available at that time. This is by design.
For a row-mode parallel hash join, the memory grant is also evenly split between the hash table instances. With four threads, this means 25% each. Where the rows are skewed, with more rows ending up on one thread than another, the thread that exceeds its allowance will spill. Memory cannot be shared between threads.
In your plan, rows are allocated to hash join threads using hash partitioning of the join key values. When the key values are skewed, one thread receives far more rows than its share of the grant can hold, so that thread spills while the others do not.
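The mechanism above can be sketched in Python. All numbers here are hypothetical (they are not taken from your actual plan): a fixed grant is split evenly across four threads, rows are routed to threads by hashing the join key, and a skewed key distribution pushes one thread past its allowance while the others stay within it.

```python
# Sketch: why skewed join keys make one thread of a parallel hash join spill.
# Hypothetical numbers; only the mechanism mirrors the explanation above.

NUM_THREADS = 4
TOTAL_GRANT_ROWS = 1_000_000                 # grant expressed as rows it can hold
PER_THREAD_ALLOWANCE = TOTAL_GRANT_ROWS // NUM_THREADS  # even split, no sharing

# Skewed build input: one key value (42) dominates.
build_keys = [42] * 600_000 + list(range(300_000))

# Rows are routed to threads by hashing the join key (repartition streams).
rows_per_thread = [0] * NUM_THREADS
for key in build_keys:
    rows_per_thread[hash(key) % NUM_THREADS] += 1

for tid, n in enumerate(rows_per_thread):
    status = "SPILLS" if n > PER_THREAD_ALLOWANCE else "fits"
    print(f"thread {tid}: {n:>7} rows (allowance {PER_THREAD_ALLOWANCE}) -> {status}")
```

Even though the total grant (1,000,000 rows) comfortably covers the whole input (900,000 rows), the thread that receives the dominant key value exceeds its 250,000-row allowance and spills, because the unused allowance of the other threads cannot be shared.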
When the next service pack for SQL Server 2014 ships, it will likely include the showplan improvements shipped in 2012 SP3, which show more detailed spill information, including which threads spilled during a parallel execution.
You should not use join hints like INNER HASH JOIN in the query text, because a local join hint also fixes the join order for the whole query, as if OPTION (FORCE ORDER) had been specified.
If you want to force only hash joins for this query, use the query-level hint OPTION (HASH JOIN) instead, which leaves the optimizer free to choose the join order and the build inputs.
This partly explains why your query is so slow: you are forcing SQL Server to access the tables in a particular order, and therefore to build hash tables on much larger inputs than necessary. That is wasteful (a 5.6 GB memory reservation) and inefficient.
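As a sketch of the difference (the table and column names below are illustrative placeholders, not your actual query):

```sql
-- Local join hint: HASH here forces the join type AND fixes the join
-- order for the whole query, as if OPTION (FORCE ORDER) were present.
SELECT a.id, b.a_id
FROM dbo.t_a AS a
INNER HASH JOIN dbo.t_b AS b ON b.a_id = a.id;

-- Query-level hint: every join becomes a hash join, but the optimizer
-- may still reorder the joins and pick the smaller build inputs.
SELECT a.id, b.a_id
FROM dbo.t_a AS a
INNER JOIN dbo.t_b AS b ON b.a_id = a.id
OPTION (HASH JOIN);
```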
By SQLkiwi ♦ 6.6k answered Jun 14, 2016 at 12:57 PM