This query takes over 2 hrs to run, why?

arvindravish 2013-04-16 13:49:54

I am trying to rewrite this query, but no matter what I do it takes over 2 hrs to complete. I have even tried to make this plan parallel but still no luck with the duration it is taking complete. What is causing the problem in this query? Any options to optimize it? Thanks in advance.

Below is the query

SELECT  s1.AssetID ,
    s1.Date ,
    s1.CalanderStartTime ,
    MIN(t1.CalanderEndTime) AS CalanderEndTime ,
    DATEDIFF(Minute, s1.CalanderStartTime, MIN(t1.CalanderEndTime)) [Minutes]
FROM   
    dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial s1
    INNER JOIN dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial t1 
   ON s1.CalanderStartTime <= t1.CalanderEndTime
             AND s1.AssetId = t1.AssetId
             AND s1.Date = t1.Date
             AND NOT EXISTS ( SELECT
             *
             FROM
             dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial t2
              WHERE
              t1.CalanderEndTime >= t2.CalanderStartTime
             AND t1.CalanderEndTime < t2.CalanderEndTime
             AND t1.AssetID = t2.AssetID
             AND t1.Date = t2.Date )
WHERE   NOT EXISTS ( SELECT *
                 FROM   dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial s2
                 WHERE  s1.CalanderStartTime > s2.CalanderStartTime
                        AND s1.CalanderStartTime <= s2.CalanderEndTime
                        AND s1.AssetID = s2.AssetID
                        AND s1.Date = s2.Date )
GROUP BY s1.CalanderStartTime ,
    s1.AssetID ,
    s1.Date
ORDER BY s1.CalanderStartTime
Aaron Bertrand 2013-04-16 13:59:22
How many rows in the table? Can you help map the indexes to the anonymized indexes in the plan, and indicate which columns they're on? Can you show some sample data and desired results (including some rows you want filtered out)? It's tough to suggest changes to the query without fully understanding the source data and what the end result is supposed to be.
arvindravish 2013-04-16 14:10:37
The table has 5,683,700 rows with just one index on the table used in the query
CREATE NONCLUSTERED INDEX IX_aravish_tmp_CU_IncidentCalendarWithOutDownOrPartial ON dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial(AssetID, Date, CalanderEndTime, CalanderStartTime, IncidentID)

I don't much about the data. I have just been asked to optimize this query.
I have just attached an image with sample data
alt text

BrianE 2013-04-16 14:18:14
After looking at this for a few minutes, I'd say it doesn't look like you really need to join tmp_CU_IncidentCalendarWithOutDownOrPartial to itself and then do two separate "WHERE NOT EXISTS" with the same table again. I think you should consider if there's a way you can do all the self comparison in one shot. If you're trying to find only the most recent transactions, you could do MAX(EndTime) GROUP BY AssetID, for instance. How many records are in this table?
BrianE 2013-04-16 14:31:44
Also, I have a related question for others who may be posting answers. When using a "WHERE NOT EXISTS," I typically use SELECT 1 instead of SELECT *. This seems more optimized to me because you're returning an integer instead of a row of data, and all you want to know is whether something exists. SQL server may treat these the same though. Does anyone know which is better?
Aaron Bertrand 2013-04-16 14:34:06
They are identical – SQL Server optimizes that away. It knows that an EXISTS / NOT EXISTS just returns true or false as opposed to returning any data, so it completely ignores the column list. I use 1, like you, but for a different reason: self-documentation. SELECT 1 makes it absolutely clear that the subquery does not return real data, in the event that someone missed the leading EXISTS / NOT EXISTS.
BrianE 2013-04-16 14:50:54
That's what I figured; thanks for confirming!
kfraz 2013-04-16 15:25:15
I think you are trying to find the first Start time and the last end time for an assetID for a day. If so look at query below. I didn't have time to create test data but it should be close.

SELECT  s1.AssetID ,
        s1.Date ,
        s1.CalanderStartTime ,
        DATEDIFF(Minute, s1.CalanderStartTime, t1.CalanderEndTime) [Minutes]
FROM    (SELECT AssetID ,
                [Date] ,
                CalanderStartTime ,
                ROW_NUMBER() OVER (PARTITION BY AssetID, [Date] 
                           ORDER BY AssetID, [Date], CalanderStartTime ASC
                          ) AS StartOrder
         FROM   dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial
        ) AS s1
        LEFT OUTER JOIN (SELECT CalanderStartTime ,
            [DATE] ,
            ROW_NUMBER() OVER (PARTITION BY AssetID,
            [Date] ORDER BY AssetID, [Date], CalanderEndTime DESC) AS EndOrder
                         FROM   dbo.tmp_CU_IncidentCalendarWithOutDownOrPartial
                        ) t1 ON s1.AssetID = t1.AssetID
                                AND s1.[DATE] = t1.[DATE]
                                AND s1.StartOrder = t1.EndOrder
WHERE   s1.StartOrder = 1
        AND t1.EndOrder = 1
ORDER BY s1.CalanderStartTime