Hash join spills to disk even though there is plenty of memory granted for the query

GokhanVarol 2017-05-19 11:48:54

We are running SQL 2014 Ent Sp2 with 1.5TB of memory and 64 cores. There is close to 17 GB memory granted to this single threaded query, max memory used is 2.2gb, even though the hash join has plenty of memory available it seems like it's not using it and spilling to disk. Any idea why?
Thank you in advance.

alt text

alt text
link text

SQLkiwi 2017-05-24 10:46:30
The showplan information is a little bit misleading if taken at face value.

The memory fractions help in planning the amount of memory grant:

Naively, the optimizer could calculate the memory grant needed for each memory-consuming operator in the plan and simply add them up to get the total query memory grant. This would be wasteful, and limit the ability of the server to process memory-consuming plans concurrently.

A more optimal strategy considers when each operator requires memory, and for how long. A classic example is the hash join. Memory is needed for the hash table during both the build and probe phases, but any memory used by prior memory-consuming iterators on the build side of the join can be reused when the hash join transitions from building the hash table to probing for matches. The build side tree is guaranteed to have finished executing when this switch occurs, so it is safe to reuse any prior build-side memory grant (not the hash table itself of course).

Equally, a fully blocking operator like a Sort can provide a grant reuse opportunity when the sort transitions from reading input rows to providing sorted output. The sort memory itself cannot be reused (of course) when the switch to output occurs, but any memory grant used by pre-sort operators can be.

So, the assessment of overall memory grant takes reuse opportunities into account. Calculating input memory fractions (proportion of the total memory grant required when reading rows) and output memory fractions (proportion of the total memory grant required when delivering rows) for each operator is part of this process.

Taking the hash join that spills as an example: When reading rows to build the hash table, this operator potentially has access to its input memory fraction of the total query memory grant. The execution plan shows this figure is 0.524805. It is not 1 because this hash join is running at the same time as the Right Anti Semi Join hash operator, which has an input memory fraction of 0.475195. The sum of 0.524805 and 0.475195 is 1.

The total memory grant for the query is 17,060,288KB, but this does not mean that the hash inner join can consume 0.524805 * 17,060,288KB ~= 8,952,960KB despite what the spill warning text says.

The hash table is in use during both input and output phases, so the query processor must also consider how the inner hash join shares memory grant with plan operators that may be active when the inner hash join is producing rows. This is the output memory fraction, which takes account of the memory needs of the hash anti semi join and the sort at node 11. Clearly this sort cannot be active when the hash inner join is building its hash table, only when it is probing.

The output memory fraction for the inner hash join is 0.0648054. Adding this to the sort's input fraction (0.876515) and the anti semi join's output fraction (0.0586793) again sums to 1.

The point is that an output fraction of 0.0648054 only gives the inner hash join 0.0648054 * 17,060,288 = 1,105,598KB of memory grant. The hash table must fit within this amount of memory, or it will spill. The used memory reported in the spill text is 1,677,120KB. This is the granted 1,105,598KB plus 79,696 spilled 8KB pages.

All this talk of memory grant reuse aside, the fundamental point is that memory grant calculations are often directly proportional to the estimated size of the data (though the number of unique values may also be important e.g. for hash aggregate). So, row count (cardinality) and average row size are important.

If the counts or sizes (or number of distinct values) are wrong at runtime, the memory-consuming operator is likely to spill. In your case, it may be that variable length data is involved. SQL Server assumes 50% of the maximum as the average size for variable length data. If your data e.g. varchar is more than 50% of the maximum on average, the calculation may be wrong enough to incur a spill.

All the normal actions appropriate to improving cardinality/density/average row size estimates may be applicable to your case, it is impossible to say from the anonymized information provided. Beyond that, using tricks or hints to inflate the memory grant may be needed.

My preference would be to refactor your design to encompass columnstore and batch mode processing. Aside from the usual benefits, it also allows for sorts and hashes to acquire additional memory dynamically (documented and supported trace flag(s) required).