Improve the query performance

Arvind 2017-12-18 03:19:13

The query has been running for more than 15 minutes and I'm looking for ways to improve the query performance. Could you please provide me some directions?

SDyckes2 2017-12-18 03:56:33
The plan you have provided appears to be an estimated execution plan. An estimated plan does not accurately show what actually happened at runtime of the query. If you collect an actual execution plan with Plan Explorer and save the plan in the .pesession format, we will be able to provide better assistance. Plan Explorer is a free tool available for download here.

Based on your estimated plan, I do not see any obvious reasons why it would take more than 15 minutes to return the estimated 1309 records. I suspect the actual plan will look a bit different, possibly highlighting an out-of-date statistics issue, but that is a complete guess.

Arvind 2017-12-18 06:41:20
Thanks for the info. Please find the actual execution plan of the query; it took more than 3 hours to execute. Could you please recommend an action plan? We have already run UPDATE STATISTICS on all tables involved in the query.

link text

Hugo Kornelis 2017-12-18 12:09:00
I see some issues in your execution plan (details below), but I am surprised to hear that the query took over three hours. Did you monitor the server while it was running for things such as locking and blocking, physical I/O, and memory usage?

You have not indicated what version of SQL Server you are running. Since the execution plan doesn't indicate the cardinality estimator version, I assume it's 2012 or older. What is the exact version?

Now then, on to some issues I see in the plan.

It starts by finding the MAX(AsOfDate), then uses that to find all rows in AtomicPolicy with this as StartDate, plus the specified SourceSystemID and EndDate. This is done using an index that has these three columns as its leading columns, which is very efficient. However, the number of rows returned is almost 7,400, far more than the estimated 800. This bad estimate may affect the plan choice in a negative way.
This bad estimate may be due to the fact that the optimizer doesn't know what the max(AsOfDate) is, and that value affects the number of qualifying rows. (I would normally suspect bad statistics, but you already ruled that out).
Can you try splitting the logic as follows: first select the MAX(AsOfDate) into a variable, then use that variable in the query, while adding an OPTION (RECOMPILE) hint. (That hint is required to let SQL Server "sniff" the value of the parameter).
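To illustrate, here is a minimal sketch of that split. The table and column names come from your plan; the variable's data type, the filter literals, and the column list are placeholders you will need to adjust to your actual query:

```sql
-- Step 1: capture MAX(AsOfDate) in a variable (assuming a date column;
-- match the declaration to the actual data type of AsOfDate).
DECLARE @MaxAsOfDate date;

SELECT @MaxAsOfDate = MAX(AsOfDate)
FROM dbo.AtomicPolicy;

-- Step 2: use the variable in the main query. OPTION (RECOMPILE) makes
-- the optimizer sniff the variable's value, so the estimate for the
-- StartDate predicate should be much closer to the actual 7,400 rows.
SELECT ap.APKey                          -- plus whatever columns you need
FROM   dbo.AtomicPolicy AS ap
WHERE  ap.StartDate      = @MaxAsOfDate
  AND  ap.SourceSystemID = 1             -- placeholder for your filter value
  AND  ap.EndDate        = '9999-12-31'  -- placeholder for your filter value
OPTION (RECOMPILE);
```

The rest of the query (the NOT EXISTS and the join to Policy) stays as it is; only the MAX(AsOfDate) subquery is replaced by the variable.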

For each of the 7,400 rows, it then tries to find matching rows in the Transact table (the NOT EXISTS clause in your query). It does so using an index seek, which appears efficient but may not really be. The index used is on the TransactionClaimPolicySubmissionFlag column; the TransactionPolicyClaimLink column (which is probably more selective) is not used. I do not know whether or not you have an index on this column, but I do know that even if you do, it won't be used. That is because the APKey column it is compared to has data type int; TransactionPolicyClaimLink has another data type and must hence be converted.
The execution plan tells me that the Transact table has over 14 million rows. The Index Seek searches for the first row with SubmissionFlag P and the correct ClaimLink, but it has to go through all the P rows to do so. It stops after finding the first, but for most rows (6059 out of 7398) there is no such row at all, so it has to look at all the "P" rows. If, for instance, 10 million of the 14 million in that table are of type "P", this means going over 10 million rows 6059 times, plus going over on average half of them another 1339 times. That's a lot of work!
The ideal solution is to make sure that columns you compare in queries always have the same data type. However, that is not always possible. In this case, another option you can try is to add an explicit CAST of the APKey column to the data type of TransactionPolicyClaimLink. (But note that this just restricts the optimizer in another way; it is not ideal.)
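For illustration, assuming (and this is only an assumption) that TransactionPolicyClaimLink is a varchar column, the NOT EXISTS could be sketched like this:

```sql
-- Sketch only: substitute the real data type of TransactionPolicyClaimLink.
-- Casting APKey (int) to that type keeps the comparison on the varchar side,
-- so an index on Transact(TransactionPolicyClaimLink), if you have one,
-- can be used for a seek.
SELECT ap.APKey
FROM   dbo.AtomicPolicy AS ap
WHERE  NOT EXISTS
       (SELECT *
        FROM   dbo.Transact AS t
        WHERE  t.TransactionClaimPolicySubmissionFlag = 'P'
          AND  t.TransactionPolicyClaimLink = CAST(ap.APKey AS varchar(20)));
```

(Do verify that the string comparison gives the same results as the numeric one; values with leading zeroes, for instance, would compare differently.)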

The remaining 6059 rows then go into a Hash Match. Since there are more rows than estimated, this may have caused a spill. If you are on SQL Server 2012 or later, a spill would be indicated in the actual plan; it is not, so either you are on an older version or there was no spill. (Which is good!)
If you did get a spill, then getting a better estimate (and as I said, I hope you will get a better estimate if you break the MAX(AsOfDate) out of the query) should fix that.

The final step is to use this Hash Match to join to the Policy table, on the PolicyKey column. This is pretty efficient. An index on PolicyKey in the Policy table might help a bit, but I don't think you will really notice the difference.
However, I notice that you do not even use any of the columns from the Policy table. And despite the name, the PolicyKey column is apparently not a key in this table, because there are on average 2.9 rows for each PolicyKey in the Policy table. Because of that, the result set grows from 6059 rows to 17656 rows, by introducing pure duplicates in the end result. Is that really the intended behaviour?
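If you want to check that duplication for yourself, a quick count along these lines (using the table and column names from the plan) would show which PolicyKey values occur more than once:

```sql
-- List PolicyKey values that appear more than once in Policy.
SELECT   PolicyKey,
         COUNT(*) AS RowsPerKey
FROM     dbo.Policy
GROUP BY PolicyKey
HAVING   COUNT(*) > 1
ORDER BY RowsPerKey DESC;
```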

Short summary of the recommendations:

  1. Get MAX(AsOfDate) in a variable, use that variable in the query and add OPTION(RECOMPILE).
  2. Change the database so that Transact.TransactionPolicyClaimLink and AtomicPolicy.APKey have the same data type. If that is not possible, add an explicit CAST to convert APKey to the data type of TransactionPolicyClaimLink.
  3. Verify that the join to Policy is actually required and correct. If not needed, remove this table from the query.
  4. Try adding a nonclustered index on Policy(PolicyKey), as sketched below. It will probably have only a minimal effect, but who knows?
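A sketch of that index, with an illustrative name:

```sql
-- Nonclustered index to support the join on PolicyKey (name is illustrative).
CREATE NONCLUSTERED INDEX IX_Policy_PolicyKey
    ON dbo.Policy (PolicyKey);
```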