how to improve performance of a Sql query for Median Calcuation?

suneel 2013-08-12 14:45:35

Hi Team,

Thanks for looking into the below SQL :

Let me give the exact scenario.

CREATE TABLE [LeadMedian](
[LeadStatusId] [int] NULL,
[LeadStatus] nvarchar NULL,
[leadid] nvarchar NOT NULL,
[AccountID] nchar NULL,
[leaddate] [datetime] NULL,
[LeadGeoID] [int] NULL,
[LeadProgramID] [int] NULL,
[CurrentLeadOwnerType] varchar NULL,
[CurrentLeadSalesTeam] nvarchar NULL,
[DaysInStatuS] [int] NOT NULL)

INSERT [LeadMedian] VALUES (1, N'Ready for Qualification', N'00Q6000000bz9ncEAA', N'0016000000I7aA4AAJ', CAST(0x00009FD600000000 AS DateTime), 325, 11141, N'Queue', N'00G60000000vZAxEAM', 559)
INSERT [LeadMedian] VALUES (1, N'Ready for Qualification', N'00Q6000000c0XfvEAE', N'0016000000Fyz3TAAR', CAST(0x00009FDB00000000 AS DateTime), 325, 11383, N'Queue', N'00G60000000vZAxEAM', 554)
INSERT [LeadMedian] VALUES (1, N'Ready for Qualification', N'00Q6000000cbbgLEAQ', N'0016000000FzikkAAB', CAST(0x00009FE900000000 AS DateTime), 325, 12971, N'Queue', N'00G60000000vZAxEAM', 540)
INSERT [LeadMedian] VALUES (11, N'Inactive', N'00Q6000000WQMVaEAP', N'0016000000FzDsMAAV', CAST(0x00009EDF00000000 AS DateTime), 325, 10413, N'Queue', N'00G60000000vZAxEAM', 582)
INSERT [LeadMedian] VALUES (11, N'Inactive', N'00Q6000000WQMVaEAP', N'0016000000FzDsMAAV', CAST(0x00009EDF00000000 AS DateTime), 325, 10413, N'Queue', N'00G60000000vZAxEAM', 224)
INSERT [LeadMedian] VALUES (11, N'Inactive', N'00Q6000000ZBKYvEAP', N'0016000000FwbyfAAB', CAST(0x00009F4E00000000 AS DateTime), 325, 11895, N'Queue', N'00G60000000vZAxEAM', 471)
INSERT [LeadMedian] VALUES (11, N'Inactive', N'00Q6000000ZBKYvEAP', N'0016000000FwbyfAAB', CAST(0x00009F4E00000000 AS DateTime), 325, 11895, N'Queue', N'00G60000000vZAxEAM', 224)
INSERT [LeadMedian] VALUES (11, N'Inactive', N'00Q6000000ZBKYzEAP', N'0016000000NcLG5AAN', CAST(0x00009F4E00000000 AS DateTime), 325, 11895, N'Queue', N'00G60000000vZAxEAM', 471)
INSERT [LeadMedian] VALUES (11, N'Inactive', N'00Q6000000ZBKYzEAP', N'0016000000NcLG5AAN', CAST(0x00009F4E00000000 AS DateTime), 325, 11895, N'Queue', N'00G60000000vZAxEAM', 224)

select * from [leadmedian]

I would calculate the Median based on the above scenario and those values should show in the report like below.

alt text

So i would require when i pass any lead id's example '00Q6000000c0XfvEAE','00Q6000000bz9ncEAA','00Q6000000cbbgLEAQ' then the Median value should be 554 and 00Q6000000WQMVaEAP,00Q6000000ZBKYvEAP,00Q6000000ZBKYzEAP for this median value =695.

or if i pass currentleadteam='00G60000000vZAxEAM' and status='Ready for Qualification' then also median should be 554

so if i pass any lead id it should be display the respective Median value.

Thanks in advance and really appreciate your valuable inputs.

I tried below query and working on median calculation but taking long time to run ETL job

WITH CTE AS (
SELECT L.CurrentLeadOwnerType,L.CurrentLeadSalesTeam, l.LEADID,l.DaysInStatus,L.LeadStatus,l.AccountID,l.LeadProgramID,
l.LeadGeoID,l.LeadDate, l.LeadStatusId AS StatusID,
ROW_NUMBER() OVER (PARTITION BY l.leadStatusID,L.CurrentLeadOwnerType,L.CurrentLeadSalesTeam ORDER BY L.Daysinstatus) AS RNK
FROM dbo.LeadMedian L
–WHERE l.LeadID in ('00Q6000000bz9ncEAA','00Q6000000cbbgLEAQ',
— '00Q6000000c0XfvEAE','00Q6000000ZBKYzEAP','00Q6000000WQMVaEAP','00Q6000000ZBKYvEAP')
— AND lv.DaysInStatus<>0
WHERE l.DaysInStatus<>0

), CTEMedian AS (
SELECT Statusid,CurrentLeadOwnerType,CurrentLeadSalesTeam ,CASE
WHEN MAXRNK % 2 = 0 THEN (SELECT SUM(DAYSINSTATUS) FROM CTE WHERE RNK IN (MAXRNK/2,MAXRNK/2 + 1) AND
Statusid =X.Statusid and
CurrentLeadOwnerType=x.CurrentLeadOwnerType and
CurrentLeadSalesTeam=x.CurrentLeadSalesTeam
)
ELSE (SELECT (DAYSINSTATUS) FROM CTE WHERE RNK =(MAXRNK + 1)/2 AND Statusid =X.Statusid and CurrentLeadOwnerType=x.CurrentLeadOwnerType and CurrentLeadSalesTeam=x.CurrentLeadSalesTeam )
END AS [CURRENT MEDIAN]
FROM (
SELECT CurrentLeadOwnerType,CurrentLeadSalesTeam,Statusid,MAX(RNK) AS MAXRNK FROM CTE
GROUP BY Statusid ,CurrentLeadOwnerType,CurrentLeadSalesTeam
)X

), CTEGroup AS (
SELECT b.[CURRENT MEDIAN] as Median ,A.CurrentLeadOwnerType,a.CurrentLeadSalesTeam ,A.LeadStatus
,A.AccountID,A.LeadGeoID,A.LeadDate,A.LeadProgramID,A.StatusID ,A.LeadID

  FROM CTE a INNER JOIN CTEMedian b ON a.Statusid=b.Statusid AND A.CurrentLeadOwnerType = B.CurrentLeadOwnerType
                                     AND A.CurrentLeadSalesTeam = B.CurrentLeadSalesTeam

), CTEfinal AS (
SELECT * FROM CTEGroup

  )

select CurrentLeadSalesTeam,currentleadownertype,
leadstatus,Median
from ctefinal –WHERE CurrentLeadSalesTeam='00G60000000vZAxEAM'

Currently i have in 18567344 leadid's in my database , for calculating these many leadid's is taking long time…is there any performance tuning on the above SQL.

Thanks in advance.

Regards,

Suneel

GokhanVarol 2013-08-13 05:17:28
Give a shot, create a supporting index if possible and try Itzik Ben Gan's offset / fetch suggestion (attached).

link text

Something like this

IF NOT EXISTS ( SELECT  1
                FROM    sys.indexes
                WHERE   object_id = OBJECT_ID('[dbo].[LeadMedian]')
                        AND [name] = 'CurrentLeadOwnerType_CurrentLeadSalesTeam_LeadStatusId' )
    CREATE NONCLUSTERED INDEX CurrentLeadOwnerType_CurrentLeadSalesTeam_LeadStatusId ON [dbo].[LeadMedian] (CurrentLeadOwnerType, CurrentLeadSalesTeam, LeadStatusId) INCLUDE (DaysInStatuS) WITH (DATA_COMPRESSION = ROW)
 
SELECT  a.CurrentLeadOwnerType
      , a.CurrentLeadSalesTeam
      , a.LeadStatusId
      , CAST(MEDIAN_DaysInStatuS AS INT) AS MEDIAN_DaysInStatuS
 
FROM    ( SELECT    a.CurrentLeadOwnerType
                  , a.CurrentLeadSalesTeam
                  , a.LeadStatusId
                  , ( SUM(CASE WHEN DaysInStatuS IS NOT NULL THEN 1
                               ELSE 0
                          END) - 1 ) / 2 AS DaysInStatuS_offset
                  , 2 - SUM(CASE WHEN DaysInStatuS IS NOT NULL THEN 1
                                 ELSE 0
                            END) % 2 AS DaysInStatuS_fetch
          FROM      [dbo].[LeadMedian] a
          GROUP BY  a.CurrentLeadOwnerType
                  , a.CurrentLeadSalesTeam
                  , a.LeadStatusId
        ) a
        CROSS APPLY ( SELECT    AVG(DaysInStatuS) AS MEDIAN_DaysInStatuS
                      FROM      ( SELECT    p.DaysInStatuS
                                  FROM      [dbo].[LeadMedian] p
                                  WHERE     DaysInStatuS_offset IS NOT NULL
                                            AND DaysInStatuS_fetch IS NOT NULL
                                            AND p.DaysInStatuS IS NOT NULL
                                            AND p.CurrentLeadOwnerType = a.CurrentLeadOwnerType
                                            AND p.CurrentLeadSalesTeam = a.CurrentLeadSalesTeam
                                            AND p.LeadStatusId = a.LeadStatusId
                                  ORDER BY  p.DaysInStatuS
                                          OFFSET ISNULL(a.DaysInStatuS_offset, 0) ROWS FETCH NEXT ISNULL(a.DaysInStatuS_fetch, 1) ROWS ONLY
                                ) b
                    ) b
suneel 2013-08-13 07:24:18
could you please provide the answer on my scenario..dont ming
suneel 2013-08-20 12:14:12
hi gokhanvarol,

thanks your inputs.

i executed the attached query and working in SQL server 2012 but not in 2008 R2.

since our production servers are using 2008 R2 .when i apply the above same query and there is an error

Msg 102, Level 15, State 1, Line 36
Incorrect syntax near 'OFFSET'.
Msg 102, Level 15, State 1, Line 36
Incorrect syntax near 'NEXT'

could you please provide me the solution in 2008 R2