Statistics on non-persisted computed columns

GokhanVarol 2013-06-20 18:20:57

I created some computed columns using a CASE expression with TRY_CAST inside it.
I cannot create statistics on the column manually; I get the error:
"Column 'ID_ISNULL' in table 'dbo.TEST_DATES' cannot be used in an index or statistics or as a partition key because it is non-deterministic."
But when I query the column I see statistics are created on it automatically; unfortunately, when I use the column in the WHERE clause the optimizer does not seem to use those statistics to get a row estimate.
I attached the script that will create the test table in tempdb and populate it with data.

How can I make the computed column deterministic (it actually is deterministic)?
Why don't the row estimates use the existing auto-created statistics?

SQLkiwi 2013-06-20 20:46:34
Why are your code examples always so complicated? ;c)

The issue is documented in the CAST and CONVERT Books Online topic.

Conversion from a string to a date is deterministic only when a deterministic style is specified. To specify a style, use TRY_CONVERT instead of TRY_CAST. You can use the COLUMNPROPERTY function to see if a column is deterministic or not.

Simple code that is careful with types and explicit about conversions is shown below 🙂

CREATE TABLE dbo.Test
(
    ID integer NULL,
    Calc AS 
        CASE 
            WHEN ID NOT BETWEEN 19000000 AND 22000000 THEN NULL
            WHEN TRY_CONVERT(date, CONVERT(varchar(11), ID), 112) IS NOT NULL
                THEN TRY_CONVERT(date, CONVERT(varchar(11), ID), 112)
        END
);
 
-- Returns 1
SELECT COLUMNPROPERTY(OBJECT_ID(N'dbo.Test', N'U'), 'Calc', 'IsDeterministic');
 
-- A test row
INSERT dbo.Test (ID) VALUES (19730101);
 
-- Show the table contents
SELECT ID, Calc FROM dbo.Test;
 
-- Succeeds
CREATE STATISTICS stats_TD_Calc
ON dbo.Test (Calc);

GokhanVarol 2013-06-20 20:48:49
I just rewrote the function using only DATEFROMPARTS and replaced the existing one, but I am still running into the same error for some reason. (I also modified the attachment.)


CASE WHEN ISDATE(ID) = 1 THEN DATEFROMPARTS(ID / 10000, ( ID / 100 ) % 100, ID % 100)
WHEN ID NOT BETWEEN 19000000 AND 22000000 THEN NULL
WHEN ISDATE(ID - ( ID % 100 ) + 1) = 1 THEN DATEFROMPARTS(ID / 10000, ( ID / 100 ) % 100, 1)
WHEN ISDATE(( ID / 10000 ) * 10000 + 100 + ID % 100) = 1 THEN DATEFROMPARTS(ID / 10000, 1, ID % 100)
WHEN ISDATE(( ID / 10000 ) * 10000 + 101) = 1 THEN DATEFROMPARTS(ID / 10000, 1, 1)
END

SQLkiwi 2013-06-20 20:50:51
Read my answer again!
GokhanVarol 2013-06-20 20:53:46
Thank you (DATEFROMPARTS actually worked; I mixed the scripts up during testing).
GokhanVarol 2013-06-20 21:02:36
Now the columns are deterministic and I also created the statistics, but it seems like they are not being used.
SQLkiwi 2013-06-20 21:22:07
In both cases, the table scan is a full table scan, so of course the cardinality is 410001. The Filter applies the computed column predicate, and that is where the statistics are used. The fact that the filter isn't pushed into the scan (as a residual predicate) is a quirk of the implementation. If you create an index on e.g. the ID_CALC_CASE column, the predicate will match, resulting in a seek with a good cardinality estimate and no separate Filter.
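
For example, a minimal sketch assuming the TEST table and ID_CALC_CASE computed column from the attached script (the exact names are assumptions):

-- Index on the non-persisted computed column (the column must be deterministic and precise)
CREATE INDEX IX_TEST_ID_CALC_CASE
ON dbo.TEST (ID_CALC_CASE);
 
-- The predicate matches the computed column, so this can seek on the new index
-- with an estimate taken from the index statistics, and no separate Filter
SELECT COUNT_BIG(*)
FROM dbo.TEST
WHERE ID_CALC_CASE = CONVERT(date, '19730101', 112);
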
GokhanVarol 2013-06-20 23:34:32
Is there any difference between statistics on a non-persisted computed column and statistics on a regular column?

(The date in the database is stored as an int, but it can hold an invalid value like 20130000, 20139900, or 20130199. The logic needed is: if only the month is wrong, substitute 01; if only the day is wrong, substitute 01; if both are wrong, substitute 0101, then try to convert again. Some old mainframe logic needs to be duplicated identically. I also found out, having rewritten this 4 times, that DATEFROMPARTS is faster than TRY_CAST/TRY_CONVERT (by more than 2x) for some reason.)

SQLkiwi 2013-06-21 00:06:11
No difference.
GokhanVarol 2013-06-20 23:54:00
In today's implementation there is a matching view for every table, and the views use an inline table-valued function to calculate the date. People then join these views and filter on conditions like

ISNULL(SaleDtDerived, RecordingDtDerived) BETWEEN CAST('1990/01/01' AS DATE) AND GETDATE()

which produces bad estimates and more (estimates are usually very low, off by a factor of 100 or more). I am hoping these calculated columns can help with that. I am also hoping I can somehow help the ISNULL(SaleDtDerived, RecordingDtDerived) case by creating another calculated column (this was in the sample text file) along the lines of

IsNullCalCol AS ISNULL(full case for col1, full case for col2)

and hoping SQL Server can match ISNULL(SaleDtDerived, RecordingDtDerived) to that somehow. I have yet to test these.

SQLkiwi 2013-06-21 00:07:59
Yes, using a computed column in this way is a standard technique to get better estimates for complex predicates. Beware that computed columns cannot reference other computed columns, and an exact textual match is normally required.
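
As a rough illustration of the technique on a deliberately simplified table (the table and column layout here are made up, not the real schema):

-- Computed column that repeats the problematic predicate expression verbatim
CREATE TABLE dbo.DeedSample
(
    SaleDtDerived      date NULL,
    RecordingDtDerived date NULL,
    EffectiveDt AS ISNULL(SaleDtDerived, RecordingDtDerived)
);
 
-- Statistics on the (non-persisted) computed column
CREATE STATISTICS s_EffectiveDt ON dbo.DeedSample (EffectiveDt);
 
-- The WHERE clause uses the same expression, so the optimizer can match it
-- to EffectiveDt and use its statistics for the cardinality estimate
SELECT COUNT_BIG(*)
FROM dbo.DeedSample
WHERE ISNULL(SaleDtDerived, RecordingDtDerived)
      BETWEEN CONVERT(date, '19900101', 112) AND CONVERT(date, GETDATE());
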
GokhanVarol 2013-06-21 00:13:47
Does "exact textual match" include space / tab / newline / case differences? (I use SQL Prompt, and other free formatters may introduce newline / tab etc. differences.)

And when a user types
ISNULL(SaleDtDerived, RecordingDtDerived)
and I have a computed column which is
ComputeA AS ISNULL(exact text of SaleDtDerived expanded, exact text of RecordingDtDerived expanded)

can the optimizer match to that as well (since I cannot create a computed column that depends on another computed column)?

GokhanVarol 2013-06-21 00:25:25
Ahh, but when I add a computed column SQL Server formats it all to a single line, so they should match regardless of whether formatters change the script?
SQLkiwi 2013-06-21 00:26:43
Not spaces and tabs, no. I mean a computed column c1 = a * b would not match a query with a predicate on b * a. The matching is quite basic, so your ISNULL example would not work, as far as I recall. Try it.
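
A small sketch of the a * b versus b * a point (illustrative table, not from the attached script):

CREATE TABLE dbo.T
(
    a  integer NULL,
    b  integer NULL,
    c1 AS a * b
);
 
-- Written the same way as the computed column definition:
-- the expression can be matched to c1 and its statistics used
SELECT COUNT_BIG(*) FROM dbo.T WHERE a * b = 42;
 
-- Written the other way round: the basic expression matching fails,
-- so the c1 statistics are not used for the estimate
SELECT COUNT_BIG(*) FROM dbo.T WHERE b * a = 42;
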
GokhanVarol 2013-06-21 00:15:47
I guess I can use trace flags 9204 or 9292 to confirm?
SQLkiwi 2013-06-21 00:29:07
I believe so.
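
For reference, a sketch of how those (undocumented) trace flags are typically used at query level; the table and column names are taken from the attached script and are assumptions:

-- 3604 sends the diagnostic output to the client messages tab,
-- 9292 lists statistics headers considered interesting during compilation,
-- 9204 lists statistics actually loaded and used for cardinality estimation
SELECT COUNT_BIG(*)
FROM dbo.TEST
WHERE ID_CALC_CASE = CONVERT(date, '19730101', 112)
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204);
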
GokhanVarol 2013-06-21 13:04:40
… and if I choose not to create indexes on the columns, should I create single-column statistics myself, or should I leave SQL Server to create single-column statistics automatically?
SQLkiwi 2013-06-21 13:49:23
It really doesn't matter too much, though bear in mind the default behaviours of index and auto-statistics regarding full scan and sampling.
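
To make that concrete (a sketch, using the TEST table names from the attached script):

-- Statistics built as a by-product of CREATE INDEX always use a full scan.
-- Auto-created statistics, and CREATE STATISTICS without options, use a
-- default sample; a full scan can be requested explicitly:
CREATE STATISTICS s_ID_CALC_CASE
ON dbo.TEST (ID_CALC_CASE)
WITH FULLSCAN;
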
GokhanVarol 2013-06-21 04:46:58
After creating the table and inserting records into it there were no statistics yet.
I created 2 indexes:

CREATE INDEX ID_CALC_CASE ON test (ID_CALC_CASE);
CREATE INDEX ID_ISNULL ON test (ID_ISNULL);

and that created 5 statistics: 2 use the index id, 4 do not. For the columns I created indexes on there were 2 statistics each; why would that be?

I ran the query that has ISNULL(ID_CALC_CASE, ID2_CALC_CASE) = x, whereas the table has a computed column ID_ISNULL AS ISNULL(extracted compute clause of ID_CALC_CASE, extracted compute clause of ID2_CALC_CASE). It seems like the query plan was able to link ISNULL(ID_CALC_CASE, ID2_CALC_CASE) to ID_ISNULL, or did it?
If it did, this is really cool 🙂

SQLkiwi 2013-06-21 13:51:20
Off the top of my head, I think the 'duplicate' statistics are created during optimization of the CREATE INDEX statement. Parallel index build plans use range partitioning, so the engine needs statistics to choose good range values for that. There may be other subtleties too, but there is nothing to be concerned about.
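
One way to see where each statistics object came from (a sketch against the catalog views, using the TEST table from the earlier comment):

-- Index statistics share the index name and have auto_created = user_created = 0;
-- auto-created column statistics have _WA_Sys_ names and auto_created = 1;
-- explicit CREATE STATISTICS objects have user_created = 1
SELECT
    s.name,
    s.stats_id,
    s.auto_created,
    s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.TEST');
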
GokhanVarol 2013-06-21 04:51:43
When I check the table script, the computed columns are all converted to a single line, with brackets and parentheses in a standard format; maybe that's how it's matching it:

CREATE TABLE [dbo].[TEST] (
    [ID] int NULL,
    [ID2] int NULL,
    [ID_CALC_CASE] AS (case when [ID] IS NULL then NULL when NOT ([ID]>=(19000000) AND [ID]<=(22000000)) then NULL when ([ID]/(100))%(100)=(11) OR ([ID]/(100))%(100)=(9) OR ([ID]/(100))%(100)=(6) OR ([ID]/(100))%(100)=(4) then datefromparts([ID]/(10000),([ID]/(100))%(100),case when [ID]%(100)>=(1) AND [ID]%(100)<=(30) then [ID]%(100) else (1) end) when ([ID]/(100))%(100)=(12) OR ([ID]/(100))%(100)=(10) OR ([ID]/(100))%(100)=(8) OR ([ID]/(100))%(100)=(7) OR ([ID]/(100))%(100)=(5) OR ([ID]/(100))%(100)=(3) OR ([ID]/(100))%(100)=(1) then datefromparts([ID]/(10000),([ID]/(100))%(100),case when [ID]%(100)>=(1) AND [ID]%(100)<=(31) then [ID]%(100) else (1) end) when ([ID]/(100))%(100)=(2) then datefromparts([ID]/(10000),(2),case when ([ID]/(10000))%(4)=(0) then case when [ID]%(100)>=(1) AND [ID]%(100)<=(29) then [ID]%(100) else (1) end else case when [ID]%(100)>=(1) AND [ID]%(100)<=(28) then [ID]%(100) else (1) end end) else datefromparts([ID]/(10000),(1),case when [ID]%(100)>=(1) AND [ID]%(100)<=(31) then [ID]%(100) else (1) end) end),
    [ID2_CALC_CASE] AS (case when [ID2] IS NULL then NULL when NOT ([ID2]>=(19000000) AND [ID2]<=(22000000)) then NULL when ([ID2]/(100))%(100)=(11) OR ([ID2]/(100))%(100)=(9) OR ([ID2]/(100))%(100)=(6) OR ([ID2]/(100))%(100)=(4) then datefromparts([ID2]/(10000),([ID2]/(100))%(100),case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(30) then [ID2]%(100) else (1) end) when ([ID2]/(100))%(100)=(12) OR ([ID2]/(100))%(100)=(10) OR ([ID2]/(100))%(100)=(8) OR ([ID2]/(100))%(100)=(7) OR ([ID2]/(100))%(100)=(5) OR ([ID2]/(100))%(100)=(3) OR ([ID2]/(100))%(100)=(1) then datefromparts([ID2]/(10000),([ID2]/(100))%(100),case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(31) then [ID2]%(100) else (1) end) when ([ID2]/(100))%(100)=(2) then datefromparts([ID2]/(10000),(2),case when ([ID2]/(10000))%(4)=(0) then case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(29) then [ID2]%(100) else (1) end else case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(28) then [ID2]%(100) else (1) end end) else datefromparts([ID2]/(10000),(1),case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(31) then [ID2]%(100) else (1) end) end),
    [ID_ISNULL] AS (isnull(case when [ID] IS NULL then NULL when NOT ([ID]>=(19000000) AND [ID]<=(22000000)) then NULL when ([ID]/(100))%(100)=(11) OR ([ID]/(100))%(100)=(9) OR ([ID]/(100))%(100)=(6) OR ([ID]/(100))%(100)=(4) then datefromparts([ID]/(10000),([ID]/(100))%(100),case when [ID]%(100)>=(1) AND [ID]%(100)<=(30) then [ID]%(100) else (1) end) when ([ID]/(100))%(100)=(12) OR ([ID]/(100))%(100)=(10) OR ([ID]/(100))%(100)=(8) OR ([ID]/(100))%(100)=(7) OR ([ID]/(100))%(100)=(5) OR ([ID]/(100))%(100)=(3) OR ([ID]/(100))%(100)=(1) then datefromparts([ID]/(10000),([ID]/(100))%(100),case when [ID]%(100)>=(1) AND [ID]%(100)<=(31) then [ID]%(100) else (1) end) when ([ID]/(100))%(100)=(2) then datefromparts([ID]/(10000),(2),case when ([ID]/(10000))%(4)=(0) then case when [ID]%(100)>=(1) AND [ID]%(100)<=(29) then [ID]%(100) else (1) end else case when [ID]%(100)>=(1) AND [ID]%(100)<=(28) then [ID]%(100) else (1) end end) else datefromparts([ID]/(10000),(1),case when [ID]%(100)>=(1) AND [ID]%(100)<=(31) then [ID]%(100) else (1) end) end,case when [ID2] IS NULL then NULL when NOT ([ID2]>=(19000000) AND [ID2]<=(22000000)) then NULL when ([ID2]/(100))%(100)=(11) OR ([ID2]/(100))%(100)=(9) OR ([ID2]/(100))%(100)=(6) OR ([ID2]/(100))%(100)=(4) then datefromparts([ID2]/(10000),([ID2]/(100))%(100),case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(30) then [ID2]%(100) else (1) end) when ([ID2]/(100))%(100)=(12) OR ([ID2]/(100))%(100)=(10) OR ([ID2]/(100))%(100)=(8) OR ([ID2]/(100))%(100)=(7) OR ([ID2]/(100))%(100)=(5) OR ([ID2]/(100))%(100)=(3) OR ([ID2]/(100))%(100)=(1) then datefromparts([ID2]/(10000),([ID2]/(100))%(100),case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(31) then [ID2]%(100) else (1) end) when ([ID2]/(100))%(100)=(2) then datefromparts([ID2]/(10000),(2),case when ([ID2]/(10000))%(4)=(0) then case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(29) then [ID2]%(100) else (1) end else case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(28) then [ID2]%(100) else (1) end end) else datefromparts([ID2]/(10000),(1),case when [ID2]%(100)>=(1) AND [ID2]%(100)<=(31) then [ID2]%(100) else (1) end) end)),
    [ID_FIXED] date NULL)
SQLkiwi 2013-06-21 13:54:01
The scripted form is generated from metadata. Matching to the metadata form is done after parsing and processes like normalization; it's not exactly text matching. The deep details don't matter very much, though. I'm not 100% convinced the expressions are matching as you describe, but I haven't had enough spare time to check for myself yet.
GokhanVarol 2013-06-21 04:53:16