why does it have to read 19 billon rows?

Henrik Staun Poulsen 2016-10-13 09:21:43

hi all,

Why is this query doing a Clustered Index Seek, with a 19 billion row estimate?
There is a primary key and the there should only be 2.5 million rows returned.

Best regards,
Henrik

My table looks like this

CREATE TABLE oda.GDATWTGShadow10MinData (
    GDATWTGShadow10MINDATAID                        BIGINT         IDENTITY (1, 1) NOT NULL,
    ODSDataSourceID                                 SMALLINT       NOT NULL,
    EXTExtractJobID                                 INT            NOT NULL,
    LotsOfOtherColumns                              CHAR(200)      not NULL,
    CONSTRAINT PK_GDATWTGShadow10MINDATA PRIMARY KEY CLUSTERED (GDATWTGShadow10MINDATAID ASC, ODSDataSourceID ASC) WITH (DATA_COMPRESSION = ROW) ON PS_ODSDataSourceID ([ODSDataSourceID]),
);
 
CREATE UNIQUE NONCLUSTERED INDEX IX_EXTExtractJobID_TWTGShadow10MINDATAID_ODSDataSourceID
    ON oda.GDATWTGShadow10MinData(EXTExtractJobID ASC, TWTGShadow10MINDATAID ASC, ODSDataSourceID ASC) WITH (DATA_COMPRESSION = PAGE)
    ON PS_ODSDataSourceID (ODSDataSourceID);

and the partition scheme and function:

CREATE PARTITION SCHEME [PS_ODSDataSourceID] AS PARTITION [PF_ODSDataSourceID] TO ([OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData], [OperationalData])
GO
 
CREATE PARTITION FUNCTION [PF_ODSDataSourceID](smallint) AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169)
GO
SQLkiwi 2016-10-13 12:35:09
There is no uniqueness constraint on the GDATWTGShadow10MINDATAID column alone. The primary key specifies that the combination of GDATWTGShadow10MINDATAID and ODSDataSourceID is unique, but that is not the same thing. Also note that GDATWTGShadow10MINDATAID having the identity property does not guarantee uniqueness either (reseeding or inserting with identity insert on will allow duplicates).

Even there were a suitable column uniqueness guarantee, SQL Server will not necessarily infer that a range of whole-number values will limit the rows returned, in the way a human naturally would. Estimates are generally computed from statistics, which may be adjusted for data modifications since they were collected, the sampling level (partitioned indexes no longer automatically get full scan stats from 2012), ascending keys (which is less than perfect in the new CE), and so on.

The original CE had modelling extensions under TF 2301 that would apply whole-number inferences in some cases, but that flag does not apply to the new CE. You could try compiling the query under the original CE (TF 9481) with and without 2301 to see what impact that has.

I would check the statistics and investigate how the new CE ascending key logic might be being applied.

Henrik Staun Poulsen 2016-10-13 12:52:30
Hi SQLkiwi,

Thank you very much for your answer. In the meantime, I ran

UPDATE STATISTICS oda.[GDATWTGShadow10MinData] (pk_GDATWTGShadow10MinData )

and now the query has an estimated cost of 14 (previously 272000), and it completed in 2 minutes (probably most of the time due to SSMS).

I understand that there is not uniqueness constraint on the tableID column alone, but until recently our SQL Server has not made these huge mistakes everywhere. I suspect that a bug has been introduced with CU2?

Either that, or our (Ola Hallengren) index maintenance jobs are taking a nap. 🙂

Best regards,
Henrik

SQLkiwi 2016-10-16 05:12:21
It's hard to say. As I tried to indicate in my answer, I think the root of the problem was SQL Server extrapolating from sampled stats, or applying ascending key adjustments. Overwriting the stats with new ones might solve the problem for now, but it could come back. With the original stats object gone, it's going to be tough to analyze further.