This query takes over 2 hrs to run, why?
2013-04-16 13:49:54
I am trying to rewrite this query, but no matter what I do it takes over 2 hrs to complete. I have even tried to make this plan parallel but still no luck with the duration it is taking complete. What is causing the problem in this query? Any options to optimize it? Thanks in advance.
Below is the query
SELECT s1.AssetID , s1.Date , s1.CalanderStartTime , MIN(t1.CalanderEndTime) AS CalanderEndTime , DATEDIFF(Minute, s1.CalanderStartTime, MIN(t1.CalanderEndTime)) [Minutes] FROM dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial s1 INNER JOIN dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial t1 ON s1.CalanderStartTime <= t1.CalanderEndTime AND s1.AssetId = t1.AssetId AND s1.Date = t1.Date AND NOT EXISTS ( SELECT * FROM dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial t2 WHERE t1.CalanderEndTime >= t2.CalanderStartTime AND t1.CalanderEndTime < t2.CalanderEndTime AND t1.AssetID = t2.AssetID AND t1.Date = t2.Date ) WHERE NOT EXISTS ( SELECT * FROM dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial s2 WHERE s1.CalanderStartTime > s2.CalanderStartTime AND s1.CalanderStartTime <= s2.CalanderEndTime AND s1.AssetID = s2.AssetID AND s1.Date = s2.Date ) GROUP BY s1.CalanderStartTime , s1.AssetID , s1.Date ORDER BY s1.CalanderStartTime
Aaron Bertrand 2013-04-16 13:59:22
How many rows in the table? Can you help map the indexes to the anonymized indexes in the plan, and indicate which columns they're on? Can you show some sample data and desired results (including some rows you want filtered out)? It's tough to suggest changes to the query without fully understanding the source data and what the end result is supposed to be.
2013-04-16 14:10:37
The table has 5,683,700 rows with just one index on the table used in the query
CREATE NONCLUSTERED INDEX IX_aravish_tmp_CU_IncidentCalendarWithOutDownOrPartial ON dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial(AssetID, Date, CalanderEndTime, CalanderStartTime, IncidentID)
CREATE NONCLUSTERED INDEX IX_aravish_tmp_CU_IncidentCalendarWithOutDownOrPartial ON dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial(AssetID, Date, CalanderEndTime, CalanderStartTime, IncidentID)
I don't much about the data. I have just been asked to optimize this query.
I have just attached an image with sample data
2013-04-16 14:18:14
After looking at this for a few minutes, I'd say it doesn't look like you really need to join tmp_CU_IncidentCalendarWithOutDownOrPartial to itself and then do two separate "WHERE NOT EXISTS" with the same table again. I think you should consider if there's a way you can do all the self comparison in one shot. If you're trying to find only the most recent transactions, you could do MAX(EndTime) GROUP BY AssetID, for instance. How many records are in this table?
2013-04-16 14:31:44
Also, I have a related question for others who may be posting answers. When using a "WHERE NOT EXISTS," I typically use SELECT 1 instead of SELECT *. This seems more optimized to me because you're returning an integer instead of a row of data, and all you want to know is whether something exists. SQL server may treat these the same though. Does anyone know which is better?
2013-04-16 14:34:06
They are identical – SQL Server optimizes that away. It knows that an EXISTS / NOT EXISTS just returns true or false as opposed to returning any data, so it completely ignores the column list. I use 1, like you, but for a different reason: self-documentation. SELECT 1 makes it absolutely clear that the subquery does not return real data, in the event that someone missed the leading EXISTS / NOT EXISTS.
2013-04-16 14:50:54
That's what I figured; thanks for confirming!
2013-04-16 15:25:15
I think you are trying to find the first Start time and the last end time for an assetID for a day. If so look at query below. I didn't have time to create test data but it should be close.
SELECT s1.AssetID , s1.Date , s1.CalanderStartTime , DATEDIFF(Minute, s1.CalanderStartTime, t1.CalanderEndTime) [Minutes] FROM (SELECT AssetID , [Date] , CalanderStartTime , ROW_NUMBER() OVER (PARTITION BY AssetID, [Date] ORDER BY AssetID, [Date], CalanderStartTime ASC ) AS StartOrder FROM dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial ) AS s1 LEFT OUTER JOIN (SELECT CalanderStartTime , [DATE] , ROW_NUMBER() OVER (PARTITION BY AssetID, [Date] ORDER BY AssetID, [Date], CalanderEndTime DESC) AS EndOrder FROM dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial ) t1 ON s1.AssetID = t1.AssetID AND s1.[DATE] = t1.[DATE] AND s1.StartOrder = t1.EndOrder WHERE s1.StartOrder = 1 AND t1.EndOrder = 1 ORDER BY s1.CalanderStartTime