Table with 789.6 million records taking 5 mins to execute the provided select stmt

georgeharris 2015-06-25 20:19:32

Thanks all for your help. Here is my query(ofcourse table names have been changed) which I'm trying to tune and currently taking 4-5 mins to finish . Also, you can see the tables details and other info below.

Database1.Schema1.Object5:

Total Records : 789.6 million

# of records between 01/01/2014 and 01/31/2014 : 28.2 million

My table has around 789 million records and it is partitioned on "Column19" by month and year .

Clustered index on Column19

Database1.Schema1.Object6:

Total Records : 24791

Database1.Schema1.Object7:

Total Records : 311

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 2 ms.

Table 'Database1.Schema1.Object6'. Scan count 9, logical reads 1082, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Database1.Schema1.Object7'. Scan count 4, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Database1.Schema1.Object5'. Scan count 9, logical reads 280072, physical reads 283, read-ahead reads 130274, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(17064 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 697975 ms, elapsed time = 254160 ms.

SQLkiwi 2015-06-25 23:02:34
It is hard to say based on an anonymized plan, but I would look to rewrite the query to push the GROUP BY aggregation closer to the Index Seek. As currently written, a very large number of rows have to be passed through the joins and repartitioning exchange to get to the aggregation part.

Also, given the relatively small number of unique values compared with the estimate:

Stream Agg Estimates

I would also try an OPTION (HASH GROUP) hint because hash aggregate usually performs better than Sort + Stream Aggregate with a smaller number of distinct values. The memory grant will be far too large of course, but that's a secondary problem.

Rewriting the query to aggregate earlier would definitely be my first instinct though.

georgeharris 2015-06-26 03:47:54
Thanks for your reply. So, you mean like some thing below correct ?

with cte as 
( 
Select column 1 ,
       sum(column2),
       column 3,
       sum(column4),
       column 5,
       sum(column6),
       sum(column7),
       column 8,
       column 9,
       column 10
from object 1 
where column 5 between 01/01/2014 and 01/31/2014
group by column 1,
         column 3,
         column 5,
         column 8
)
 
select a.column9,
       a.column10,
       sum(a.column2+b.column11),
       sum(a.column4+b.column12),
       sum(a.column6+b.column13),
       sum(a.column7+b.column14) 
from  cte a, 
      object b,
      object c 
where a.column1=b.column1
and   a.column3=b.column2
and   a.column8=b.columnn3
and   a.column9=c.column1
group by column9, 
         column10
SQLkiwi 2015-06-26 09:44:42
Yes that sort of idea.
georgeharris 2015-06-29 14:43:30
I have modified my query as you advised and noticed that the execution time dropped down to 40sec from 4mins in the Dev server. However, when I execute this updated query in prod I didn't see any difference it is still taking 4 mins. I have attached the SQL Sentry files from dev(tmp5031_Anonymized) and prod(Prod_optimzed_Query_Paln_Anonymizedlink text) as well ? Are you seeing any major difference in exec pans/operations or something else ? PLease see the attached files from SQL[Sentry.

link text

SQLkiwi 2015-07-09 11:40:09
Yes the plan is significantly different on prod. This may be due to different statistical information, a different configuration (e.g. more memory) or other factors. The production query plan is remarkable for its variance in estimates versus actual rows. Ensure statistics are up to date. Otherwise, you may need hints or a plan guide to get the optimal plan shape in production.