Update on view over partitioned tables updating all clustered indexes

GeorgeP 2014-08-04 09:45:50

We have a table, which is partitioned through a date field into separate years.

There is a view over all of these tables (Call)

Schema is as follows:

CREATE TABLE [dbo].[Call_2015](
    [calID] [uniqueidentifier] NOT NULL,
    [calPackageID] [int] NULL,
    [calClientID] [int] NULL,
    [calStartDate] [datetime] NOT NULL,
    [calEndDate] [datetime] NOT NULL,
    [calTimeIn] [char](5) NULL,
    [calTimeOut] [char](5) NULL,
    [calMinutes] [smallint] NULL,
    [calPreferredTimeIn] [char](5) NULL,
    [calPreferredTimeOut] [char](5) NULL,
    [calActualTimeIn] [char](5) NULL,
    [calActualTimeOut] [char](5) NULL,
    [calActualMinutes] [smallint] NULL,
    [calConfirmed] [smallint] NULL,
    [calCarerID] [int] NULL,
    [calRepCarerID] [int] NULL,
    [calOriginalCarerID] [int] NULL,
    [calContractID] [int] NULL,
    [calNeedID] [int] NULL,
    [calMedicationID] [int] NULL,
    [calFrequency] [smallint] NULL,
    [calFromDate] [datetime] NULL,
    [calWeekNo] [smallint] NULL,
    [calAlert] [smallint] NULL,
    [calNoLeave] [smallint] NULL,
    [calTimeCritical] [smallint] NULL,
    [calStatus] [smallint] NULL,
    [calClientAwayReasonID] [int] NULL,
    [calCarerAwayReasonID] [int] NULL,
    [calOutsideShift] [smallint] NULL,
    [calHistoryID] [int] NULL,
    [calInvoiceID] [int] NULL,
    [calWagesheetID] [int] NULL,
    [calReasonID] [int] NULL,
    [calCallConfirmID] [varchar](50) NULL,
    [calCreated] [datetime] NULL,
    [calUpdated] [datetime] NULL,
    [calVariation] [int] NULL,
    [calVariationUserID] [int] NULL,
    [calException] [smallint] NULL,
    [calRetained] [smallint] NULL,
    [calDoubleUpID] [uniqueidentifier] NULL,
    [calDoubleUpOrder] [smallint] NULL,
    [calNeedCount] [smallint] NULL,
    [calNoStay] [smallint] NULL,
    [calCoverCarerID] [int] NULL,
    [calPayAdjustment] [real] NULL,
    [calChargeAdjustment] [real] NULL,
    [calTeamID] [int] NULL,
    [calExpenses] [money] NULL,
    [calMileage] [real] NULL,
    [calOverrideStatus] [smallint] NULL,
    [calLocked] [smallint] NULL,
    [calDriver] [smallint] NULL,
    [calPostcode] [char](10) NULL,
    [calDayCentreID] [int] NULL,
    [calMustHaveCarer] [smallint] NULL,
    [calRoleID] [int] NULL,
    [calUnavailableCarerID] [int] NULL,
    [calClientInformed] [smallint] NULL,
    [calFamilyInformed] [smallint] NULL,
    [calMonthlyDay] [smallint] NULL,
    [calOriginalTimeIn] [char](5) NULL,
    [calLeadCarer] [smallint] NULL,
    [calCallTypeID] [int] NULL,
    [calActualStartDate] [datetime] NULL,
    [calActualEndDate] [datetime] NULL,
    [Table_Year] [int] NOT NULL,
 CONSTRAINT [PK_Call_2015] PRIMARY KEY CLUSTERED 
(
    [Table_Year] ASC,
    [calID] ASC,
    [calStartDate] ASC,
    [calEndDate] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
 
GO
 
SET ANSI_PADDING OFF
GO
 
ALTER TABLE [dbo].[Call_2015]  WITH CHECK ADD  CONSTRAINT [CK_Call_Year_2015] CHECK  (([Table_Year]=(2015)))
GO
 
ALTER TABLE [dbo].[Call_2015] CHECK CONSTRAINT [CK_Call_Year_2015]
GO
 
ALTER TABLE [dbo].[Call_2015]  WITH CHECK ADD  CONSTRAINT [CK_calStartDate_2015] CHECK  (([calStartDate]>=CONVERT([datetime],'01 Jan 2015 00:00:00',(0)) AND [calStartDate]<=CONVERT([datetime],'31 DEC 2015 23:59:59',(0))))
GO
 
ALTER TABLE [dbo].[Call_2015] CHECK CONSTRAINT [CK_calStartDate_2015]
GO
 
ALTER TABLE [dbo].[Call_2015] ADD  CONSTRAINT [DF_Call_2015_Table_Year]  DEFAULT ((2015)) FOR [Table_Year]
GO

The update to the table is as follows:

UPDATE Call SET
        calStartDate = CASE 
            WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay, MONTH(calStartDate), YEAR(calStartDate))
            WHEN calFrequency IN (15,16) THEN dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate)
            ELSE DateAdd(d, @StartDay-1, (calStartDate - datepart(dw,calStartDate)+1)) 
        END,
        calEndDate = CASE 
            WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay + @EndDay - @StartDay, MONTH(calStartDate), YEAR(calStartDate))
            WHEN calFrequency IN (15,16) THEN DATEADD(D, @EndDay - @StartDay, dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate))
            ELSE DateAdd(d, @StartDay-1+@DayCount, (calStartDate - datepart(dw,calStartDate)+1)) 
        END,
        calTimeIn = @TimeIn,
        calTimeOut = @TimeOut,
        calMinutes = @Minutes,
        calMonthlyDay = @MonthlyDay,
        calClientInformed = Null, 
        calFamilyInformed = Null
    WHERE calPackageID = @PackageID
    AND calClientID = @ClientID
    AND calWeekNo = @WeekNo
    AND (DatePart(dw, calStartDate) = @OriginalDay OR calFrequency IN (14,15,16))
    AND calStartDate BETWEEN @StartDate AND @EndDate
    AND (calInvoiceID = 0 OR calInvoiceID Is Null OR @InvoicesFinalised = 1)
    AND (calWagesheetID = 0 OR calWagesheetID Is Null OR @WagesFinalised = 1)
    AND (calLocked = 0 OR calLocked Is Null)
    AND (Table_Year = YEAR(@StartDate) 
            OR Table_Year =YEAR(@EndDate))

The SP updates a batch of rows dependant of input into @StartDate and @EndDate (updates all rows with a calStartDate between the two)

The problem then comes with the execution plan. There are huge IO costs to the operation, and I've nailed it down to how SQL is dealing with the update.

Currently we have 20 of these tables partitioned per year. Each update is causing an update of every single table's indexes, regardless of whether the table is actually touched by the update operation or not.

Execution Plan

Below this section it goes on to update, in the exact same manner, every table in the view.

I cannot see why this is, as I have specified the Table_Year (which the table is partitioned on) within the query text. Shouldn't SQL only update the necessary table?

Aaron Bertrand 2014-08-04 14:08:12
These aren't actually partitioned tables, and even if they were, partition elimination wouldn't really work for updating indexes unless all indexes were also partition-aligned.

Since you are using Express Edition and can't actually use partitioning, I have a different approach to recommend: dynamic UPDATEs that only affect the table(s) represented in @StartDate / @EndDate. You'll have to populate the list of parameters twice; once with their data types – this should be easy as I presume these are declared somewhere already.

DECLARE 
  @StartDate DATE = '20140605',
  @EndDate   DATE = '20150201';
 
 
DECLARE 
  @SQL NVARCHAR(MAX) = N' ',
  @baseSQL NVARCHAR(MAX) = N'UPDATE dbo.[Call_$y$] SET
      calStartDate = CASE 
          WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay, MONTH(calStartDate), YEAR(calStartDate))
          WHEN calFrequency IN (15,16) THEN dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate)
          ELSE DateAdd(d, @StartDay-1, (calStartDate - datepart(dw,calStartDate)+1)) 
      END,
      calEndDate = CASE 
          WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay + @EndDay - @StartDay, MONTH(calStartDate), YEAR(calStartDate))
          WHEN calFrequency IN (15,16) THEN DATEADD(D, @EndDay - @StartDay, dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate))
          ELSE DateAdd(d, @StartDay-1+@DayCount, (calStartDate - datepart(dw,calStartDate)+1)) 
      END,
      calTimeIn = @TimeIn,
      calTimeOut = @TimeOut,
      calMinutes = @Minutes,
      calMonthlyDay = @MonthlyDay,
      calClientInformed = Null, 
      calFamilyInformed = Null
  WHERE calPackageID = @PackageID
  AND calClientID = @ClientID
  AND calWeekNo = @WeekNo
  AND (DatePart(dw, calStartDate) = @OriginalDay OR calFrequency IN (14,15,16))
  AND calStartDate BETWEEN @StartDate AND @EndDate
  AND (calInvoiceID = 0 OR calInvoiceID Is Null OR @InvoicesFinalised = 1)
  AND (calWagesheetID = 0 OR calWagesheetID Is Null OR @WagesFinalised = 1)
  AND (calLocked = 0 OR calLocked Is Null)
  AND (Table_Year = YEAR(@StartDate) 
          OR Table_Year =YEAR(@EndDate));',
  @params NVARCHAR(MAX) = N'@MonthlyDay INT,...,@StartDate DATETIME,@EndDate DATETIME';
  -- you will need to update the params list 
  -- to reflect all of the params coming in
 
;WITH y AS 
(
  SELECT y FROM 
  (
    -- all valid years to choose from:
    VALUES(2014),(2015),(2016)
  ) AS y(y)
)
SELECT @SQL += REPLACE(@baseSQL, N'$y$', CONVERT(CHAR(4),y))
  FROM y 
  WHERE y >= YEAR(@StartDate) 
    AND y <= YEAR(@EndDate);
 
EXEC sp_executesql @SQL,
  @params,
  @MonthlyDay,...,@StartDate,@EndDate;
-- you will also need to update the list of params here
SQLkiwi 2014-08-05 18:33:41
>*Below this section it goes on to update, in the exact same manner, every table in the view. I cannot see why this is, as I have specified the Table_Year (which the table is partitioned on) within the query text. Shouldn't SQL only update the necessary table?*

The view meets all the partitioning requirements for both Table_Year and calStartDate. The latter column is modified by the UPDATE statement so the query optimizer has to produce a plan that is capable of moving rows between partitions.

In fact, rows could not move between partitions in this case because there is a 1:1 relationship between Table_Year and year-wise values of calStartDate, but the steps involved in that reasoning are far too opaque for the optimizer to make use of.

The new value for calStartDate is based on a complex expression that references variables. The query plan will be cached and could be reused when the variables have different values, which is just another factor that means the plan must be very general.

All these consideration lead to a plan that does not static feature partition elimination. It does, however, feature dynamic partition elimination:

On the reading side, the string of Filter operators immediately below the Concatenation are all startup filters. They evaluate their predicate before the subtree is executed. If the predicate evaluates to false, the subtree under the Filter is not executed.

The overall effect is that only tables under the view that could hold qualifying rows (depending on the runtime variable values) are accessed. Notice the execution plan only shows rows being read from one of the base tables, and the Actual Executions property for all other base tables is zero; these operators were not executed at all, at runtime. In the fragment below, the startup Filters mean only the green operators execute; the red ones never start at all:

Read-side plan fragment

On the writing side, The normal (not startup) Filter just to the right of each Clustered Index Update operator ensures that only changes for the current table are passed on. In the example plan, only one Clustered Index Update (and its associated nonclustered index maintenance operators) receives any rows:

Write-side plan fragment

Aaron Bertrand 2014-08-05 18:40:19
Great insight, the plan has to include all of the tables the update might touch when executed, but this does not mean they all will be touched. I regret not looking closer at the actuals. :\n
SQLkiwi 2014-08-05 18:51:05
In fairness, I don't blame you – the plan is huge 🙂