How to improve the performance of this query?

Khy2shy 2014-03-27 15:44:21

The query takes 2 hours to run and returns 251K distinct records, but the estimated row count is 12 billion. How can I fix this?

Aaron Bertrand 2014-03-27 17:28:42
Not much to see, but 96% of the cost is the final sort. Have you considered not returning the output sorted, or not sorting all 20+ columns? What is the consumer doing with all 251,000 rows anyway? Is the order that important?
Khy2shy 2014-03-27 19:06:34
The sort is from the DISTINCT. Is there a way to get around this?
Aaron Bertrand 2014-03-27 19:15:50
Sorry, not following. Do you really need all 20+ columns to be part of the ORDER BY?
Khy2shy 2014-03-27 19:37:51
There is no ORDER BY clause in the query.
Khy2shy 2014-03-27 19:40:52
Here are the IO stats:
Table 'Worktable'. Scan count 12943, logical reads 1588368, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE1'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE2'. Scan count 3, logical reads 4535055, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE3'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE4'. Scan count 1, logical reads 2010789007, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE5'. Scan count 1, logical reads 1262184, physical reads 0, read-ahead reads 514076, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE6'. Scan count 1, logical reads 305192, physical reads 0, read-ahead reads 146, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE7'. Scan count 1, logical reads 11253, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE8'. Scan count 3, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TABLE9'. Scan count 1, logical reads 126, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Aaron Bertrand 2014-03-29 23:19:52
Is the DISTINCT really necessary? It is typically used to hide some other reason duplicates are being returned (when they shouldn't be). There's really not much I can suggest in terms of query changes without seeing the actual query… if you don't want to share your query here, you can e-mail it to me at abertrand@sqlsentry.net
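A common pattern behind this (the actual query was never shared, so table and column names below are purely illustrative) is a one-to-many join fanning rows out, with DISTINCT then sorting 20+ columns to collapse them again. If the joined table is only used as a filter, rewriting it as an EXISTS test avoids the fan-out entirely, so the DISTINCT (and its giant sort) can be dropped:

```sql
-- Hypothetical illustration: a one-to-many join duplicates rows,
-- and DISTINCT pays for a sort over 20+ columns to remove them.
SELECT DISTINCT o.OrderID, o.CustomerID, o.OrderDate  -- ... 20+ columns
FROM dbo.Orders AS o
JOIN dbo.OrderAudit AS a            -- many audit rows per order
  ON a.OrderID = o.OrderID;

-- If OrderAudit is only a filter, EXISTS returns each order at most
-- once, so no DISTINCT (and no sort) is needed.
SELECT o.OrderID, o.CustomerID, o.OrderDate
FROM dbo.Orders AS o
WHERE EXISTS (SELECT 1
              FROM dbo.OrderAudit AS a
              WHERE a.OrderID = o.OrderID);
```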
Dan 2014-03-27 15:57:23
There's little information to go on here, but out-of-date statistics would cause bad estimates within a plan.
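As a quick check (using one of the table names from the IO stats above; substitute the real objects), you can see when statistics were last updated and refresh them if they are stale:

```sql
-- When were the statistics on this table last updated?
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.TABLE4');

-- Refresh with a full scan if they look stale.
UPDATE STATISTICS dbo.TABLE4 WITH FULLSCAN;
```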
SQLkiwi 2014-04-01 12:27:45
> Table 'TABLE4'. Scan count 1, logical reads 2010789007

Two billion reads seems excessive for a scan that returns only 12 million rows without a predicate. You should look into the reasons for this.

That whole plan section is concerning:

Plan fragment

You should check that you have proper indexes to support these joins and predicates. If there really is no alternative to large hash joins, you should at least look to enable parallelism.
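For example (names assumed, since the query itself was not shared), an index keyed on the join column and covering the selected columns can replace a full scan, and a query-level hint can confirm whether a serial plan is being forced by server settings:

```sql
-- Illustrative only: cover the join key plus the columns the
-- query actually selects, so the scan of TABLE4 becomes a seek
-- or a much narrower scan.
CREATE INDEX IX_TABLE4_JoinKey
ON dbo.TABLE4 (JoinKeyColumn)
INCLUDE (Col1, Col2);

-- If large hash joins are unavoidable, check why the plan is
-- serial: MAXDOP 1 at the server level, or a high cost threshold
-- for parallelism. A query-level hint overrides the server MAXDOP:
-- SELECT ... OPTION (MAXDOP 8);
```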

> The query takes 2 hours to run and returns 251K distinct records, but the estimated row count is 12 billion. How can I fix this?

Even with up-to-date statistical information, cardinality estimation has limits. Queries with complex expressions and many joins often benefit from being broken down into smaller pieces. This is helpful to the optimizer and humans alike. Tuning smaller sections of code is always easier.

The query plan has several areas where cardinality estimation errors take root. Consider breaking the query down at these points using temporary tables; these also present additional tuning options, including index creation.
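The decomposition might look like this (a sketch only, with assumed object names): materialize an intermediate result at a point where the estimates go wrong, so the optimizer gets accurate cardinality for the rest of the query.

```sql
-- Stage the filtered rows where the plan's estimates diverge.
SELECT t4.JoinKey, t4.Col1, t4.Col2
INTO #Stage
FROM dbo.TABLE4 AS t4
WHERE t4.SomeFilter = 1;   -- hypothetical predicate

-- Temp tables can be indexed, which adds tuning options the
-- monolithic query did not have.
CREATE CLUSTERED INDEX cx_Stage ON #Stage (JoinKey);

-- The remainder of the query joins to the small, indexed
-- intermediate set, with fresh statistics, instead of the
-- full table.
SELECT s.Col1, t5.ColA
FROM #Stage AS s
JOIN dbo.TABLE5 AS t5
  ON t5.JoinKey = s.JoinKey;
```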