Rewrite options not to spill the sort

GokhanVarol 2013-07-12 13:35:26

I was watching this query run under MAXDOP 16 and was expecting it to run a lot quicker. I noticed the task_internal_alloc value was over 51 GB by the time the query completed. I tried adding a UNION ALL with high row-count statistics between the merge join and the sort, but that did not increase the memory grant for the query.
Thank you


GokhanVarol 2013-07-12 15:01:15
I added an explicit ORDER BY to remove the hash, and that forces a 100+ GB memory grant. I am running this as a test right now.
GokhanVarol 2013-07-12 17:12:09
I am hoping to get this to run within 30 minutes, but have not come close to it yet.


Attachment: _20130712_01.pesession

GokhanVarol 2013-07-12 21:54:31
I am shrinking the number of columns selected from these tables; that might help with the duration.
GokhanVarol 2013-07-13 04:41:39
By adding close to the maximum allowed number of filler columns into the data flow I was able to get a memory grant of around 4 GB, but that's not enough.



GokhanVarol 2013-07-13 04:53:32
Please ignore the previous zip file; files were appended to an existing zip file by mistake. I attached the correct one here.

SQLkiwi 2013-07-13 09:26:27
There aren't too many options here to increase the memory grant for the sort. The idea of avoiding a spill is up against a number of factors. Not only is there a big skew between threads given the unbalanced data distribution from the driver table, but the memory grant is also split evenly across 16 concurrent sorts (at DOP 16). Taken together, these factors mean that the memory grant would have to be quite enormous to ensure that no sort on any thread spilled at execution time.
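As a rough sketch of that even split (the numbers here are assumed for illustration, not measured from this query), each of the 16 per-thread sorts gets only a fraction of the total grant, while the most skewed thread needs far more than its even share:

```sql
-- Illustration only: assumed figures, not taken from the actual plan.
DECLARE @grant_gb decimal(10, 2) = 100;  -- total memory grant for the query
DECLARE @dop      int            = 16;   -- degree of parallelism

-- Even split: each thread's sort gets ~6.25 GB of the 100 GB grant.
SELECT @grant_gb / @dop AS per_thread_sort_gb;
```

With skew, the grant needed to keep the single largest thread in memory is DOP times that thread's requirement, which is why the total becomes enormous.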

The execution statistics provided by SQL Server do not indicate how many times a sort on a particular thread spilled (though you could get an idea of this using a trace). You have to ask yourself how well this query would perform even if per-thread sort spills could be avoided, and what the cost would be in terms of memory grant.
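While the query runs, you can at least compare what was requested, granted, and actually used via the memory grants DMV (this shows the grant as a whole, not per-thread spills; the Sort Warnings trace event is what surfaces the spills themselves):

```sql
-- One row per query that currently holds (or is waiting for) a memory grant.
SELECT
    session_id,
    dop,                   -- degree of parallelism for the request
    requested_memory_kb,   -- what the optimizer asked for
    granted_memory_kb,     -- what was actually granted
    max_used_memory_kb     -- high-water mark of memory actually used
FROM sys.dm_exec_query_memory_grants;
```

A large gap between `granted_memory_kb` and `max_used_memory_kb` alongside spills is a sign the even split across threads, rather than the total grant, is the problem.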

Ultimately, this query looks to sort 573 million wide rows. This is never going to be a fast operation no matter how much memory and how many threads are thrown at it. There are questions over the overall design, and in particular with a query that wants to perform several collocated merge joins on a common key (PclID, PclSeqNbr) and then also join to a quite different key (BatchDt, BatchSeq).

Refine the design so that keys are common (and ideally with a better data distribution per driver table entry), provide a covering index to avoid the sort, or split the query into separate stages for separate common join keys. It might even be possible to rewrite the query as a driver-within-a-driver, but the query is probably already too complex to maintain effectively.
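A covering index along the lines below could supply rows already ordered on the second key and so remove the sort entirely. The table name and the INCLUDE list are placeholders, since the full schema is not shown in the thread; only the key columns (BatchDt, BatchSeq) and (PclID, PclSeqNbr) are taken from the question:

```sql
-- Hypothetical sketch: replace dbo.BatchTable and the INCLUDE list with
-- the real table and exactly the columns the query references.
CREATE NONCLUSTERED INDEX IX_Batch_BatchDt_BatchSeq
ON dbo.BatchTable (BatchDt, BatchSeq)
INCLUDE (PclID, PclSeqNbr)
WITH (DATA_COMPRESSION = PAGE);  -- optional; can help with wide rows
```

The key columns must match the required sort order exactly for the optimizer to skip the sort; a non-covering index would force lookups that can easily cost more than the sort did.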

Huge queries with lots of hints and code tricks are often a sign that something is more fundamentally wrong with the database design.

GokhanVarol 2013-07-13 13:37:50
Increasing DOP to 16 (instead of 8) did not help in this case; as you mentioned, it divides the memory grant further per thread.
I will split the bottom section into a separate statement, index it, and so on, and test whether that improves the duration.
Thank you
