Accurate stats yet the histogram doesn't have that value

Dan Holmes 2014-09-11 18:32:11

I have a query that is:

SELECT * FROM table WHERE date IS NULL;

This is the first 4 rows from the histogram of that index. It doesn't have a NULL in the histogram. However, the plan shows that estimated rows and actual rows are the same: 18206. How did the optimizer figure that out? I am going to answer my own question and look for confirmation or rebuke.

The total number of rows in the table is 504974 (The header of stats is below). The sum of all key values from the histogram is 486591 there are two steps with average rows of 53 and 124. If you perform this: 504974 – 486591 – 53 – 126 you get 18206. Seems a coincidence for that to be not how the optimizer got that answer. How does it know though that it means the cardinality of the NULL values?

SELECT * FROM table WHERE date = '2009-09-05'

That value isn't in the histogram and is before the first step yet the optmizer recognizes that there are not any rows with that value and produces an esitmate of zero.

Name    Updated                 Rows    Rows Sampled    Steps   Density Average key length  String Index
date    Sep 11 2014 12:47PM 504974  504974  198 0.01129944  19.71157    NO 
RANGE_HI_KEY            RANGE_ROWS  EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
2009-09-06 00:00:00.000 0           109 0                   1
2009-09-16 00:00:00.000 0           746 0                   1
2009-09-26 00:00:00.000 0           304 0                   1
2009-10-06 00:00:00.000 0           674 0                   1
SQLkiwi 2014-09-12 03:21:20
Can you add the table definition and script the date statistics object please?
Dan Holmes 2014-09-12 11:30:23
This is sql2005. I have a db backup with only this table in it if you want it.

CREATE TABLE #tbltrips_IX_tbltrips_tripdates (id INT NOT NULL , tripdate DATETIME NULL, zeffectiveenddate DATETIME NULL, agencyid SMALLINT NOT NULL, PRIMARY KEY (id, agencyid));
CREATE INDEX tripdates ON #tbltrips_IX_tbltrips_tripdates (tripdate, zeffectiveenddate);
Dan Holmes 2014-09-12 13:05:47
I have found an interesting difference between 2005 and 2008r2. I restored the backup i made from the 2005 db on a 2008r2 instance. Then ran DBCC SHOW_STATISTICS. The NULL value is in that output. I didn't update stats before i did that. Now i am thinking that the null value is always there but the show_statistics command just didn't report it in 2005 but does in 2008+. Add to that, that i have seen histogram outputs with 201 rows and i think a good case can made for a 'reserved' row in the histogram for null values and that that reserved row was only exposed in 2008+.

RANGE_HI_KEY    RANGE_ROWS  EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
NULL    0   18217   0   1
SQLkiwi 2014-09-13 06:20:36
You might be right about the 2005 NULL row thing – it rings a faint bell. I'll fire up my old 2005 VM later to see if I can reproduce. EDIT: Confirmed. SQL Server 2005 DBCC SHOW_STATISTICS does not show the NULL step. Tried the same test on 2008 (not R2) and the NULL step is shown.