Update on view over partitioned tables updating all clustered indexes

We have a table, which is partitioned through a date field into separate years.

There is a view over all of these tables (Call)

Schema is as follows:

 CREATE TABLE [dbo].[Call_2015](
     [calID] [uniqueidentifier] NOT NULL,
     [calPackageID] [int] NULL,
     [calClientID] [int] NULL,
     [calStartDate] [datetime] NOT NULL,
     [calEndDate] [datetime] NOT NULL,
     [calTimeIn] [char](5) NULL,
     [calTimeOut] [char](5) NULL,
     [calMinutes] [smallint] NULL,
     [calPreferredTimeIn] [char](5) NULL,
     [calPreferredTimeOut] [char](5) NULL,
     [calActualTimeIn] [char](5) NULL,
     [calActualTimeOut] [char](5) NULL,
     [calActualMinutes] [smallint] NULL,
     [calConfirmed] [smallint] NULL,
     [calCarerID] [int] NULL,
     [calRepCarerID] [int] NULL,
     [calOriginalCarerID] [int] NULL,
     [calContractID] [int] NULL,
     [calNeedID] [int] NULL,
     [calMedicationID] [int] NULL,
     [calFrequency] [smallint] NULL,
     [calFromDate] [datetime] NULL,
     [calWeekNo] [smallint] NULL,
     [calAlert] [smallint] NULL,
     [calNoLeave] [smallint] NULL,
     [calTimeCritical] [smallint] NULL,
     [calStatus] [smallint] NULL,
     [calClientAwayReasonID] [int] NULL,
     [calCarerAwayReasonID] [int] NULL,
     [calOutsideShift] [smallint] NULL,
     [calHistoryID] [int] NULL,
     [calInvoiceID] [int] NULL,
     [calWagesheetID] [int] NULL,
     [calReasonID] [int] NULL,
     [calCallConfirmID] [varchar](50) NULL,
     [calCreated] [datetime] NULL,
     [calUpdated] [datetime] NULL,
     [calVariation] [int] NULL,
     [calVariationUserID] [int] NULL,
     [calException] [smallint] NULL,
     [calRetained] [smallint] NULL,
     [calDoubleUpID] [uniqueidentifier] NULL,
     [calDoubleUpOrder] [smallint] NULL,
     [calNeedCount] [smallint] NULL,
     [calNoStay] [smallint] NULL,
     [calCoverCarerID] [int] NULL,
     [calPayAdjustment] [real] NULL,
     [calChargeAdjustment] [real] NULL,
     [calTeamID] [int] NULL,
     [calExpenses] [money] NULL,
     [calMileage] [real] NULL,
     [calOverrideStatus] [smallint] NULL,
     [calLocked] [smallint] NULL,
     [calDriver] [smallint] NULL,
     [calPostcode] [char](10) NULL,
     [calDayCentreID] [int] NULL,
     [calMustHaveCarer] [smallint] NULL,
     [calRoleID] [int] NULL,
     [calUnavailableCarerID] [int] NULL,
     [calClientInformed] [smallint] NULL,
     [calFamilyInformed] [smallint] NULL,
     [calMonthlyDay] [smallint] NULL,
     [calOriginalTimeIn] [char](5) NULL,
     [calLeadCarer] [smallint] NULL,
     [calCallTypeID] [int] NULL,
     [calActualStartDate] [datetime] NULL,
     [calActualEndDate] [datetime] NULL,
     [Table_Year] [int] NOT NULL,
  CONSTRAINT [PK_Call_2015] PRIMARY KEY CLUSTERED 
 (
     [Table_Year] ASC,
     [calID] ASC,
     [calStartDate] ASC,
     [calEndDate] ASC
 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
 ) ON [PRIMARY]
 
 GO
 
 SET ANSI_PADDING OFF
 GO
 
 ALTER TABLE [dbo].[Call_2015]  WITH CHECK ADD  CONSTRAINT [CK_Call_Year_2015] CHECK  (([Table_Year]=(2015)))
 GO
 
 ALTER TABLE [dbo].[Call_2015] CHECK CONSTRAINT [CK_Call_Year_2015]
 GO
 
 ALTER TABLE [dbo].[Call_2015]  WITH CHECK ADD  CONSTRAINT [CK_calStartDate_2015] CHECK  (([calStartDate]>=CONVERT([datetime],'01 Jan 2015 00:00:00',(0)) AND [calStartDate]<=CONVERT([datetime],'31 DEC 2015 23:59:59',(0))))
 GO
 
 ALTER TABLE [dbo].[Call_2015] CHECK CONSTRAINT [CK_calStartDate_2015]
 GO
 
 ALTER TABLE [dbo].[Call_2015] ADD  CONSTRAINT [DF_Call_2015_Table_Year]  DEFAULT ((2015)) FOR [Table_Year]
 GO

The update to the table is as follows:

 UPDATE Call SET
         calStartDate = CASE 
             WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay, MONTH(calStartDate), YEAR(calStartDate))
             WHEN calFrequency IN (15,16) THEN dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate)
             ELSE DateAdd(d, @StartDay-1, (calStartDate - datepart(dw,calStartDate)+1)) 
         END,
         calEndDate = CASE 
             WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay + @EndDay - @StartDay, MONTH(calStartDate), YEAR(calStartDate))
             WHEN calFrequency IN (15,16) THEN DATEADD(D, @EndDay - @StartDay, dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate))
             ELSE DateAdd(d, @StartDay-1+@DayCount, (calStartDate - datepart(dw,calStartDate)+1)) 
         END,
         calTimeIn = @TimeIn,
         calTimeOut = @TimeOut,
         calMinutes = @Minutes,
         calMonthlyDay = @MonthlyDay,
         calClientInformed = Null, 
         calFamilyInformed = Null
     WHERE calPackageID = @PackageID
     AND calClientID = @ClientID
     AND calWeekNo = @WeekNo
     AND (DatePart(dw, calStartDate) = @OriginalDay OR calFrequency IN (14,15,16))
     AND calStartDate BETWEEN @StartDate AND @EndDate
     AND (calInvoiceID = 0 OR calInvoiceID Is Null OR @InvoicesFinalised = 1)
     AND (calWagesheetID = 0 OR calWagesheetID Is Null OR @WagesFinalised = 1)
     AND (calLocked = 0 OR calLocked Is Null)
     AND (Table_Year = YEAR(@StartDate) 
             OR Table_Year =YEAR(@EndDate))

The SP updates a batch of rows dependant of input into @StartDate and @EndDate (updates all rows with a calStartDate between the two)

The problem then comes with the execution plan. There are huge IO costs to the operation, and I've nailed it down to how SQL is dealing with the update.

Currently we have 20 of these tables partitioned per year. Each update is causing an update of every single table's indexes, regardless of whether the table is actually touched by the update operation or not.

Execution Plan

Below this section it goes on to update, in the exact same manner, every table in the view.

I cannot see why this is, as I have specified the Table_Year (which the table is partitioned on) within the query text. Shouldn't SQL only update the necessary table?

avatar image By GeorgeP 16 asked Aug 04, 2014 at 09:45 AM
more ▼
(comments are locked)
10|10000 characters needed characters left

2 answers: sort voted first

These aren't actually partitioned tables, and even if they were, partition elimination wouldn't really work for updating indexes unless all indexes were also partition-aligned.

Since you are using Express Edition and can't actually use partitioning, I have a different approach to recommend: dynamic UPDATEs that only affect the table(s) represented in @StartDate / @EndDate. You'll have to populate the list of parameters twice; once with their data types - this should be easy as I presume these are declared somewhere already.

 DECLARE 
   @StartDate DATE = '20140605',
   @EndDate   DATE = '20150201';
 
 
 DECLARE 
   @SQL NVARCHAR(MAX) = N' ',
   @baseSQL NVARCHAR(MAX) = N'UPDATE dbo.[Call_$y$] SET
       calStartDate = CASE 
           WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay, MONTH(calStartDate), YEAR(calStartDate))
           WHEN calFrequency IN (15,16) THEN dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate)
           ELSE DateAdd(d, @StartDay-1, (calStartDate - datepart(dw,calStartDate)+1)) 
       END,
       calEndDate = CASE 
           WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay + @EndDay - @StartDay, MONTH(calStartDate), YEAR(calStartDate))
           WHEN calFrequency IN (15,16) THEN DATEADD(D, @EndDay - @StartDay, dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate))
           ELSE DateAdd(d, @StartDay-1+@DayCount, (calStartDate - datepart(dw,calStartDate)+1)) 
       END,
       calTimeIn = @TimeIn,
       calTimeOut = @TimeOut,
       calMinutes = @Minutes,
       calMonthlyDay = @MonthlyDay,
       calClientInformed = Null, 
       calFamilyInformed = Null
   WHERE calPackageID = @PackageID
   AND calClientID = @ClientID
   AND calWeekNo = @WeekNo
   AND (DatePart(dw, calStartDate) = @OriginalDay OR calFrequency IN (14,15,16))
   AND calStartDate BETWEEN @StartDate AND @EndDate
   AND (calInvoiceID = 0 OR calInvoiceID Is Null OR @InvoicesFinalised = 1)
   AND (calWagesheetID = 0 OR calWagesheetID Is Null OR @WagesFinalised = 1)
   AND (calLocked = 0 OR calLocked Is Null)
   AND (Table_Year = YEAR(@StartDate) 
           OR Table_Year =YEAR(@EndDate));',
   @params NVARCHAR(MAX) = N'@MonthlyDay INT,...,@StartDate DATETIME,@EndDate DATETIME';
   -- you will need to update the params list 
   -- to reflect all of the params coming in
 
 ;WITH y AS 
 (
   SELECT y FROM 
   (
     -- all valid years to choose from:
     VALUES(2014),(2015),(2016)
   ) AS y(y)
 )
 SELECT @SQL += REPLACE(@baseSQL, N'$y$', CONVERT(CHAR(4),y))
   FROM y 
   WHERE y >= YEAR(@StartDate) 
     AND y <= YEAR(@EndDate);
 
 EXEC sp_executesql @SQL,
   @params,
   @MonthlyDay,...,@StartDate,@EndDate;
 -- you will also need to update the list of params here
avatar image By Aaron Bertrand ♦ 1.7k answered Aug 04, 2014 at 02:08 PM
more ▼
(comments are locked)
10|10000 characters needed characters left

Below this section it goes on to update, in the exact same manner, every table in the view. I cannot see why this is, as I have specified the Table_Year (which the table is partitioned on) within the query text. Shouldn't SQL only update the necessary table?

The view meets all the partitioning requirements for both Table_Year and calStartDate. The latter column is modified by the UPDATE statement so the query optimizer has to produce a plan that is capable of moving rows between partitions.

In fact, rows could not move between partitions in this case because there is a 1:1 relationship between Table_Year and year-wise values of calStartDate, but the steps involved in that reasoning are far too opaque for the optimizer to make use of.

The new value for calStartDate is based on a complex expression that references variables. The query plan will be cached and could be reused when the variables have different values, which is just another factor that means the plan must be very general.

All these consideration lead to a plan that does not static feature partition elimination. It does, however, feature dynamic partition elimination:

On the reading side, the string of Filter operators immediately below the Concatenation are all startup filters. They evaluate their predicate before the subtree is executed. If the predicate evaluates to false, the subtree under the Filter is not executed.

The overall effect is that only tables under the view that could hold qualifying rows (depending on the runtime variable values) are accessed. Notice the execution plan only shows rows being read from one of the base tables, and the Actual Executions property for all other base tables is zero; these operators were not executed at all, at runtime. In the fragment below, the startup Filters mean only the green operators execute; the red ones never start at all:

Read-side plan fragment

On the writing side, The normal (not startup) Filter just to the right of each Clustered Index Update operator ensures that only changes for the current table are passed on. In the example plan, only one Clustered Index Update (and its associated nonclustered index maintenance operators) receives any rows:

Write-side plan fragment

avatar image By SQLkiwi ♦ 6.6k answered Aug 05, 2014 at 06:33 PM
more ▼
(comments are locked)
avatar image Aaron Bertrand ♦ Aug 05, 2014 at 06:40 PM

Great insight, the plan has to include all of the tables the update might touch when executed, but this does not mean they all will be touched. I regret not looking closer at the actuals. :\

avatar image SQLkiwi ♦ Aug 05, 2014 at 06:51 PM

In fairness, I don't blame you - the plan is huge :)

10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question

Topics:

x641
x455
x39
x9
x4

asked: Aug 04, 2014 at 09:45 AM

Seen: 179 times

Last Updated: Aug 05, 2014 at 06:51 PM