Incorrect row estimation given with a filtered index

If i generate a calendar table, and I want to count all the weekend days in 2013 i can:

 Select count(*) from calendar
 where y =2013 and isWeekday =1 option(recompile)

This estimates at ~104 rows, baring rounding the actual row count.

This is not fast enough for my purposes so i add a filtered index.

 Create index Weekends on calendar(y) where isWeekDay = 1

And row repeat the count

 Select count(*) from calendar
 where y =2013 and isWeekday =1 option(recompile)

The index is used in the seek but the estimate is now 365, 'worse' for want of a better word than without the index.

What gives ?

 -------------------------------------------------------
 FULL REPRO SCRIPT
 
 
 use tempdb
 go
 
 Drop table Calendar
 go
 CREATE TABLE dbo.Calendar  
 (  
      dt DATE NOT NULL 
          PRIMARY KEY CLUSTERED,  
       
      isWeekday BIT not null,
      Y integer not null
 ) 
 go
 
 with cte
 as
 ( 
    Select N from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as x(N)
 ),
 ctey
 as
 (
    Select dateadd(dd,ROW_NUMBER() over (order by (select null)),cast('19000101' as date) ) as dt
      from cte a cross join cte b cross join cte c cross join cte d cross join cte e
 )
 insert into Calendar(dt,isWeekday,y)
 Select dt,
        case DATEPART(dw,dt)
         when 1 then 1
         when 7 then 1
         else 0 end,
         DATEPART(yy,dt)
         
  from ctey
 
 
 
 go
 select count(*) from Calendar
 where y = 2013
   and isWeekday =1
 option(recompile)
 go
 Create index Weekends on calendar(y) where isWeekDay = 1
 go
 select count(*) from Calendar
 where y = 2013
   and isWeekday =1
 option(recompile)
avatar image By Dave Ballantyne 263 asked Feb 23, 2013 at 09:37 AM
more ▼
(comments are locked)
avatar image JasonRHall ♦♦ Feb 23, 2013 at 01:44 PM

This is pretty interesting. I get the same estimate differences, but running the code I'm getting actual durations of 5ms without the filtered index and 0ms with the index, and it is pretty clear that going from scan to seek makes the difference.

Were I in your shoes, I would probably be happy with the results, but I'm also curious as to how the estimates are so different. I'll keep looking.

avatar image Dave Ballantyne Feb 23, 2013 at 02:28 PM

Hi Jason,

Obviously happy with the drop in overall duration, just confused as to how the estimation could get worse.

For me a difference in estimated and actual is a pointer to a suboptimal plan and always keen to understand the underlying , and potentially valid, reasons for a discrepancy in them.

10|10000 characters needed characters left

1 answer: sort voted first

This is a great question, Dave, thanks for sharing it. The filtered index execution plan does indeed show an estimate of 365 rows, and 104 actual:

Filtered index estimate

The answer to these sorts of questions is often found in the details of the execution plan, or in the underlying statistics, but that is not the case here.

Explanation

At a very early stage, the optimizer's logical view of your query is (simplified):

 GbAgg <- Filter (isWeekday = 1) <- Filter (y = 2013) <- Scan

Where GbAgg is a group-by aggregate (the count(*)), the two Filters represent your query's predicates, and the Scan is a full scan of the Calendar table.

Cardinality estimates

At this point, the cardinality estimates derived from statistics are:

 GbAgg (1) <- Filter (104) <- Filter (365) <- Scan (100,000)

The filter (year = 2013) is estimated to produce 365 rows (one row per day of the year). The second filter gets its 104 estimate from the statistics associated with your filtered index.

Index Matching

Part of the optimizer's work is to look at matching filters and scans to indexes. After this, the tree is:

 GbAgg (1) <- Filter (104) <- Seek (365)

The way the index-matching code works, only index keys are matched to a Filter + Scan combination, so at this point only the y = 2013 filter is matched to the filtered index key (`y`).

Later on, the optimizer sees that the second Filter (isWeekday = 1) is redundant, since it matches the WHERE clause of the filtered index. It replaces the Filter with a no-operation (NOP):

 GbAgg (1) <- NOP (104) <- Seek (365)

Optimizer output

We can see the final optimizer output with cardinality information using undocumented trace flags 8607, 8612 and the (well-known) 3604:

 *** Output Tree: ***
 
   PhyOp_StreamGbAgg( ) [ Card=1 ]
      PhyOp_NOP [ Card=104 ]
         PhyOp_Range TBL: Calendar ASC  [ Card=365 ]
            ScaOp_Comp x_cmpEq
               ScaOp_Identifier QCOL: [Calendar].Y
               ScaOp_Const Value=2013

Optimization has finished at this point, but this structure is not yet in a form the execution engine can handle.

Executable plan

For one thing, the execution engine does not have a physical operator that does nothing. The NOP group is removed from the tree during final conversion to an executable plan, leaving just the Stream Aggregate and Range Seek (on y = 2013). This form is very close to what we see in XML and graphical plans.

An unfortunate side-effect of removing the PhyOp_NOP group is that the 104 cardinality estimate is lost; we see only the 365 estimate on the Range Seek. Arguably, the NOP estimate could be copied to the operator below it, but that is not what happens right now.

Looks odd - but is it a problem?

The final plan looks odd to us, but remember the NOP group with the correct 104 cardinality estimate does exist within the optimizer's framework - it isn't removed until the optimizer has finished and the output is being translated to execution engine form.

If the query were more complex, the 104 estimate would still be available to operators further up the tree during optimization - they would not, generally speaking, be based on the 365 figure. For example:

 select count(*) over ()
 from Calendar 
 where y = 2013
   and isWeekday =1;

The execution plan for that query is:

More complex plan

Notice the Seek still estimates 365 rows, but you have to imagine the removed NOP operator between the Seek and the next operator, the Segment. The Segment (and everything else above it) is based on the correct 104 row estimate from the 'invisible' NOP.

Workaround

For many reasons, it is often useful to include columns from a filtered index's WHERE clause in the key of that index:

 CREATE INDEX Weekends ON dbo.Calendar(y, isWeekday) WHERE isWeekDay = 1;

With this change in place, there is no NOP group, and no weird-looking estimate:

104 estimate plan

sp.png (21.4 kB)
sp.png (38.2 kB)
sp.png (14.4 kB)
avatar image By SQLkiwi ♦ 6.6k answered Feb 23, 2013 at 01:51 PM
more ▼
(comments are locked)
avatar image Dave Ballantyne Feb 23, 2013 at 02:33 PM

Hi Paul,

Great answer, and its quite leap to the concept of imagining the Noop sitting in the plan but it works.

Its interesting ( and unfortunate ) that this difference can only be seen with undocumented flags, I wonder if SQLSentry has any plans to exploit them inside plan explorer.

avatar image JasonRHall ♦♦ Feb 23, 2013 at 02:54 PM

Yes this is a really great answer. It's definitely given me some more things to consider when I see strange estimates in other plans.

There are a lot of different reasons that something may or may not be included in Plan Explorer, but I can certainly make sure that this is considered, if it hasn't already been.

avatar image SQLkiwi ♦ Feb 26, 2013 at 05:04 AM

It's hard to see how this sort of detail could be built into a commercial product responsibly. Not only are the required options undocumented, they do not work on all versions. In most cases, as here, the displayed estimates just look a bit odd - they don't affect plan choice, so overall I think it's probably best to leave this sort of thing for the rare circumstances that require it, and for the rare people that can appreciate it :)

10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question

Topics:

x641
x166
x26
x2

asked: Feb 23, 2013 at 09:37 AM

Seen: 2547 times

Last Updated: Feb 26, 2013 at 05:04 AM