Incorrect row estimation given with a filtered index

Dave Ballantyne 2013-02-23 09:37:59

If i generate a calendar table, and I want to count all the weekend days in 2013 i can:

Select count(*) from calendar
where y =2013 and isWeekday =1 option(recompile)

This estimates at ~104 rows, baring rounding the actual row count.

This is not fast enough for my purposes so i add a filtered index.

Create index Weekends on calendar(y) where isWeekDay = 1

And row repeat the count

Select count(*) from calendar
where y =2013 and isWeekday =1 option(recompile)

The index is used in the seek but the estimate is now 365, 'worse' for want of a better word than without the index.

What gives ?

-------------------------------------------------------
FULL REPRO SCRIPT
 
 
use tempdb
go
 
Drop table Calendar
go
CREATE TABLE dbo.Calendar  
(  
     dt DATE NOT NULL 
         PRIMARY KEY CLUSTERED,  
 
     isWeekday BIT not null,
     Y integer not null
) 
go
 
with cte
as
( 
   Select N from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as x(N)
),
ctey
as
(
   Select dateadd(dd,ROW_NUMBER() over (order by (select null)),cast('19000101' as date) ) as dt
     from cte a cross join cte b cross join cte c cross join cte d cross join cte e
)
insert into Calendar(dt,isWeekday,y)
Select dt,
       case DATEPART(dw,dt)
        when 1 then 1
        when 7 then 1
        else 0 end,
        DATEPART(yy,dt)
 
 from ctey
 
 
 
go
select count(*) from Calendar
where y = 2013
  and isWeekday =1
option(recompile)
go
Create index Weekends on calendar(y) where isWeekDay = 1
go
select count(*) from Calendar
where y = 2013
  and isWeekday =1
option(recompile)
JasonRHall 2013-02-23 13:44:02
This is pretty interesting. I get the same estimate differences, but running the code I'm getting actual durations of 5ms without the filtered index and 0ms with the index, and it is pretty clear that going from scan to seek makes the difference.

Were I in your shoes, I would probably be happy with the results, but I'm also curious as to how the estimates are so different. I'll keep looking.

Dave Ballantyne 2013-02-23 14:28:05
Hi Jason,

Obviously happy with the drop in overall duration, just confused as to how the estimation could get worse.

For me a difference in estimated and actual is a pointer to a suboptimal plan and always keen to understand the underlying , and potentially valid, reasons for a discrepancy in them.

SQLkiwi 2013-02-23 13:51:39
This is a great question, Dave, thanks for sharing it. The filtered index execution plan does indeed show an estimate of 365 rows, and 104 actual:

Filtered index estimate

The answer to these sorts of questions is often found in the details of the execution plan, or in the underlying statistics, but that is not the case here.

Explanation

At a very early stage, the optimizer's logical view of your query is (simplified):

GbAgg <- Filter (isWeekday = 1) <- Filter (y = 2013) <- Scan

Where GbAgg is a group-by aggregate (the count(*)), the two Filters represent your query's predicates, and the Scan is a full scan of the Calendar table.

Cardinality estimates

At this point, the cardinality estimates derived from statistics are:

GbAgg (1) <- Filter (104) <- Filter (365) <- Scan (100,000)

The filter (year = 2013) is estimated to produce 365 rows (one row per day of the year). The second filter gets its 104 estimate from the statistics associated with your filtered index.

Index Matching

Part of the optimizer's work is to look at matching filters and scans to indexes. After this, the tree is:

GbAgg (1) <- Filter (104) <- Seek (365)

The way the index-matching code works, only index keys are matched to a Filter + Scan combination, so at this point only the y = 2013 filter is matched to the filtered index key (y).

Later on, the optimizer sees that the second Filter (isWeekday = 1) is redundant, since it matches the WHERE clause of the filtered index. It replaces the Filter with a no-operation (NOP):

GbAgg (1) <- NOP (104) <- Seek (365)

Optimizer output

We can see the final optimizer output with cardinality information using undocumented trace flags 8607, 8612 and the (well-known) 3604:

*** Output Tree: ***
 
  PhyOp_StreamGbAgg( ) [ Card=1 ]
     PhyOp_NOP [ Card=104 ]
        PhyOp_Range TBL: Calendar ASC  [ Card=365 ]
           ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [Calendar].Y
              ScaOp_Const Value=2013

Optimization has finished at this point, but this structure is not yet in a form the execution engine can handle.

Executable plan

For one thing, the execution engine does not have a physical operator that does nothing. The NOP group is removed from the tree during final conversion to an executable plan, leaving just the Stream Aggregate and Range Seek (on y = 2013). This form is very close to what we see in XML and graphical plans.

An unfortunate side-effect of removing the PhyOp_NOP group is that the 104 cardinality estimate is lost; we see only the 365 estimate on the Range Seek. Arguably, the NOP estimate could be copied to the operator below it, but that is not what happens right now.

Looks odd – but is it a problem?

The final plan looks odd to us, but remember the NOP group with the correct 104 cardinality estimate does exist within the optimizer's framework – it isn't removed until the optimizer has finished and the output is being translated to execution engine form.

If the query were more complex, the 104 estimate would still be available to operators further up the tree during optimization – they would not, generally speaking, be based on the 365 figure. For example:

select count(*) over ()
from Calendar 
where y = 2013
  and isWeekday =1;

The execution plan for that query is:

More complex plan

Notice the Seek still estimates 365 rows, but you have to imagine the removed NOP operator between the Seek and the next operator, the Segment. The Segment (and everything else above it) is based on the correct 104 row estimate from the 'invisible' NOP.

Workaround

For many reasons, it is often useful to include columns from a filtered index's WHERE clause in the key of that index:

CREATE INDEX Weekends ON dbo.Calendar(y, isWeekday) WHERE isWeekDay = 1;

With this change in place, there is no NOP group, and no weird-looking estimate:

104 estimate plan

Dave Ballantyne 2013-02-23 14:33:10
Hi Paul,

Great answer, and its quite leap to the concept of imagining the Noop sitting in the plan but it works.

Its interesting ( and unfortunate ) that this difference can only be seen with undocumented flags, I wonder if SQLSentry has any plans to exploit them inside plan explorer.

JasonRHall 2013-02-23 14:54:20
Yes this is a really great answer. It's definitely given me some more things to consider when I see strange estimates in other plans.

There are a lot of different reasons that something may or may not be included in Plan Explorer, but I can certainly make sure that this is considered, if it hasn't already been.

SQLkiwi 2013-02-26 05:04:37
It's hard to see how this sort of detail could be built into a commercial product responsibly. Not only are the required options undocumented, they do not work on all versions. In most cases, as here, the displayed estimates just look a bit odd – they don't affect plan choice, so overall I think it's probably best to leave this sort of thing for the rare circumstances that require it, and for the rare people that can appreciate it 🙂