How to fix Tempdb spill warning and reduce the long running query timing?

Anamikha1 2018-06-08 13:35:15

Kindly help me to optimize this plan and reduce the execution time from 13 seconds to <1 sec. Also, how do I optimize the plan and fix the tempdb spill warning? I have done the below steps:
1. Rebuild index for each table
2. Update statistics for each table
However, the execution time is still not reduced.
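
For reference, the commands I ran for each table were roughly like this (showing PERSONNEL_DTL as an example; the sampling options may have differed):

    -- rebuild all indexes on the table
    ALTER INDEX ALL ON dbo.PERSONNEL_DTL REBUILD;
    -- refresh the statistics on the table
    UPDATE STATISTICS dbo.PERSONNEL_DTL WITH FULLSCAN;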
Hugo Kornelis 2018-06-09 08:52:47
Not sure what's wrong, but the attachment won't open for me. I have asked SQL Sentry to look into this.
Anamikha1 2018-06-09 13:20:21
Dear Hugo Kornelis,
I tried to upload using the tool; that is why the duplicate post. However, it is throwing an error and I am not able to edit my post or attach the file either. Kindly give me your mail id and I will send the attachment.
Thank you!!!
Hugo Kornelis 2018-06-10 08:48:45
Hi Anamikha,

SentryOne is aware of the issue with the forum. They could not give me an ETA for when it would be dealt with.

A somewhat limited way to share your execution plan is to use the "Paste The Plan" web service on Brent Ozar's website: https://www.brentozar.com/pastetheplan/
That is just the plan, not the additional information that Plan Explorer captures, but it might give the people here enough information to help you out.

Anamikha 2018-06-11 12:59:41
Please find the below link for the same.

https://www.brentozar.com/pastetheplan/?id=B1fZfg2lQ

Hugo Kornelis 2018-06-11 18:48:07
Thanks! Here are a few observations. I could not relate anything back to the query – the query shown on PasteThePlan is taken from the execution plan itself, which only shows the start of the query, not the whole thing. Most of what I can see is the SELECT list; most of the pain is probably in the FROM clause, which is past the point where the query got truncated. 🙁

Based on the plan, you seem to have huge mis-estimation issues. All the way to the right, table ORGANIZATION_DATA is read with a filter on ORGANIZATION_CD = '04398', returning one row. This is then joined to ORGANIZATION_TYPE and REGION_AREA (each returning one match, presumably because you join on their primary key).

But then the fun starts: the next join is to PERSONNEL_DTL, with a join on column ORGANIZATION_DATA_ID and an additional requirement that SOURCE_SYSTEM_CD = 'DEALERDAILY'. This join is estimated to return 267 rows; in reality it returns 652,986. (The result of this is joined to PERSONNEL_TYPE on the primary key, but that's irrelevant for performance.)

A bit further down the plan I see an almost identical copy of the same pattern. However, in this case the final join to PERSONNEL_DTL has a different additional requirement (with some convoluted logic on PERSONNEL_DTL_EXP_DT and TERMINATION_FLG – that might be possible to simplify, but I have not tried to fully parse it, so I may be mistaken). Here the estimated and actual row counts are different again, but it's another huge underestimation: 161 estimated versus 652,985 actual.

Those 652,985 rows are then, once again, first joined to PERSONNEL_TYPE on the primary key. After that, this second branch does something the first branch doesn't: it joins to PERSONNEL_MASTER, also on the primary key. Based on the estimated 161 rows, the optimizer has chosen a Nested Loops join, which is far from optimal for the actual number of rows.

These two huge input streams are then combined in a so-called "Right Semi Join", usually used to implement a WHERE EXISTS or similar construction. It uses Hash Match for this, which is probably an okay choice – but because of the big mis-estimates the memory allocated for the Hash Match is insufficient and it spills to tempdb.
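
If you want to see how big the memory grant is compared to what the hash join actually uses, you can look at the sys.dm_exec_query_memory_grants DMV from a second session while the query is running; a minimal check looks like this:

    -- Compare the requested/granted memory to the actual usage of running queries.
    -- A max_used value pushed right up against the grant, together with the spill
    -- warning in the plan, confirms that the grant was sized for the bad estimate.
    SELECT session_id,
           requested_memory_kb,
           granted_memory_kb,
           used_memory_kb,
           max_used_memory_kb
    FROM   sys.dm_exec_query_memory_grants;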

The output of the Hash Match (Right Semi Join) is the same 652,985 rows it reads from its right (bottom) input, so apparently the WHERE EXISTS test is true for all rows. The estimator expected the opposite: it assumed that the 160 input rows would be reduced to just 1 after the WHERE EXISTS.

After that join, the data goes into a Sort (Distinct Sort). It expects to have to sort 1 row; in reality there are over 650K rows, so obviously this, too, spills to tempdb.

The Distinct Sort returns 1001 rows. Not because all the other rows are duplicates removed by the distinct process, but because it is only asked to produce 1001 rows by a Top operator, which probably corresponds to a TOP (1001) in your query.

The remaining rows are then joined to some more tables and sorted again (yet another spill to tempdb, because the estimate is still 1 – but this spill is obviously far less critical), and that's it. This part would probably also benefit from better estimates, but only to a very limited extent compared to the rest.

(Continues in another comment because the UI of this website goes bonkers when answers go beyond a certain length).

Hugo Kornelis 2018-06-11 19:08:36
So, based on what I see in the plan, there are a few problems with this query, and I have a few suggestions on how to solve them.

The most obvious is the cardinality estimation. Since statistics were already rebuilt, I have a hunch that this table has lots of ORGANIZATION_DATA_ID values with just a few rows each, and then just one or two with way more, perhaps millions (but at least 650,000). The optimizer does not know in advance that ORGANIZATION_CD '04398' leads to this outlier value of ORGANIZATION_DATA_ID; it bases its estimates on the averages.
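
If you want to verify that hunch, you can inspect the statistics histogram for that column on PERSONNEL_DTL. Something along these lines (the statistics name in the second statement is only a guess; use the first query to find the real one):

    -- Find the statistics objects on the table and the columns they cover
    SELECT s.name, c.name AS column_name
    FROM   sys.stats AS s
    JOIN   sys.stats_columns AS sc ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
    JOIN   sys.columns AS c ON c.object_id = sc.object_id AND c.column_id = sc.column_id
    WHERE  s.object_id = OBJECT_ID('dbo.PERSONNEL_DTL');

    -- Inspect the histogram; one step with a huge EQ_ROWS value would confirm the skew
    DBCC SHOW_STATISTICS ('dbo.PERSONNEL_DTL', IX_PERSONNEL_DTL_ORGANIZATION_DATA_ID) WITH HISTOGRAM;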

A possible solution here is to use a separate query to do the join on ORGANIZATION_DATA, ORGANIZATION_TYPE, and REGION_AREA in advance and store the relevant results in variables. Then use those in the main query, as sketched below (this should also reduce the code duplication). If you then also add an OPTION (RECOMPILE) hint to the query, the optimizer will have a much better chance of recognizing when it is run for the outlier values of ORGANIZATION_DATA_ID. (But if this is a query that runs often, the increased compilations can have a negative impact on overall server performance, so you should monitor for that.)
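
A rough sketch of that idea, with the table and column names taken from the plan and everything else just illustrative, would be:

    -- Capture the outlier key up front (assuming an integer surrogate key)
    DECLARE @OrgDataId int;

    SELECT @OrgDataId = od.ORGANIZATION_DATA_ID
    FROM   dbo.ORGANIZATION_DATA AS od
    WHERE  od.ORGANIZATION_CD = '04398';

    -- Main query: filter on the variable and recompile, so the optimizer
    -- builds the plan for the actual value instead of an average guess
    SELECT DISTINCT TOP (1001)
           pm.PERSONNEL_ID                       -- illustrative; use your real SELECT list
    FROM   dbo.PERSONNEL_DTL    AS pd
    JOIN   dbo.PERSONNEL_MASTER AS pm
           ON pm.PERSONNEL_ID = pd.PERSONNEL_ID  -- join column is a guess
    WHERE  pd.ORGANIZATION_DATA_ID = @OrgDataId
      AND  pd.SOURCE_SYSTEM_CD     = 'DEALERDAILY'
    ORDER BY pm.PERSONNEL_ID
    OPTION (RECOMPILE);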

Another potential issue is this EXISTS between (probably) a subquery and an almost identical subquery. Perhaps there are better ways to do that, but I cannot advise on that without knowing at least the full query – and often an understanding of the data is required as well.

I am also bothered by the Distinct operation in the query plan, which I assume corresponds to a DISTINCT somewhere in your query. Far too often I see queries that join tables, introducing duplicates, and then have to use DISTINCT to remove those duplicates again. In this case I also notice that after the Hash Match that combines the two huge input streams, only columns that originate from the PERSONNEL_MASTER table are propagated. So why are you even joining to PERSONNEL_DTL? I assume that this is where the duplicates are introduced, which you then later have to get rid of again by use of the DISTINCT. If you can avoid the joins (e.g. by using EXISTS instead, as sketched below), you can avoid the DISTINCT as well. And you might even get additional benefits, such as perhaps the use of a different index on PERSONNEL_MASTER that removes the need to sort the data.
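
Very roughly, the pattern I have in mind is the following; the column names are guesses since I cannot see the full query, and @OrgDataId is the variable from the earlier sketch:

    -- Probe PERSONNEL_DTL for existence instead of joining to it,
    -- so no duplicates are produced and no DISTINCT is needed afterwards
    SELECT pm.PERSONNEL_ID
    FROM   dbo.PERSONNEL_MASTER AS pm
    WHERE  EXISTS (SELECT *
                   FROM   dbo.PERSONNEL_DTL AS pd
                   WHERE  pd.PERSONNEL_ID         = pm.PERSONNEL_ID  -- join column is a guess
                     AND  pd.ORGANIZATION_DATA_ID = @OrgDataId
                     AND  pd.SOURCE_SYSTEM_CD     = 'DEALERDAILY');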

If I were in your place, I would spend some time looking very critically at the query until I understand the actual logic it is supposed to implement. And then I would probably rebuild it from scratch. The fragment of the query that I do see suggests that this query is built by nesting layer upon layer of subqueries, which is hardly ever a good idea (and very often a sign of a query generated by a tool).

BOTTOM LINE:

Step 1: See whether isolating the translation from ORGANIZATION_CD to ORGANIZATION_DATA_ID into its own query, combined with a recompile hint on the remainder of this query, helps to get you better estimates.

Step 2: Review the query. Understand the logic. Then rewrite it from scratch and remove everything that is not needed. (For instance, why are we even joining to REGION_AREA and ORGANIZATION_TYPE without returning any columns from them? Do you have proper foreign keys set up? If not, then why not?)
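
On that foreign key question: if trusted foreign keys exist from ORGANIZATION_DATA to ORGANIZATION_TYPE and REGION_AREA, and the query returns no columns from those tables, the optimizer can in some cases eliminate those joins entirely. Purely as an illustration (the column names are guesses), such a constraint would look like:

    -- Hypothetical constraint; adjust table and column names to your actual schema
    ALTER TABLE dbo.ORGANIZATION_DATA WITH CHECK
        ADD CONSTRAINT FK_ORGANIZATION_DATA_ORGANIZATION_TYPE
        FOREIGN KEY (ORGANIZATION_TYPE_ID)
        REFERENCES dbo.ORGANIZATION_TYPE (ORGANIZATION_TYPE_ID);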

Good luck!