I'm trying to understand the XML statistics used.

Mart 2014-02-28 13:47:45

Hi All

I'm trying to understand how the optimizer is working out the stats on the this query. The table valued functions I guess are inbuilt for the purpose of working with XML so I was wondering where they live and how/if I can do anything about these wildly wrong stats.

This is a small part of a larger query that I'm tuing and I've not done too much with XML so any pointers or explanations as to why this looks so wrong and how/if I can make it better would be greatly apprecitated.

Why it's done with XML is something else, but that's a different question, for now I just need to understand.

thanks in advance


P.S. The bit I'm looking at is the table-valued function [XML Reader] that estimates at 180,000 rows yet actually there is one. Also the Table-valued function [XML Reader with XPath filter] which estimates at 752 rows but the actual is 1.

I've searched the internet and can see some internal functions have set statistics but I can't find anything about this and was wondering if there is a way to improve it? thanks.

Mikael Eriksson 2014-03-04 14:10:47
There is no statistics generated for XML columns and there is no statistics for XML variables. The content of the variable is not used at all when SQL Server creates the query plan except for the case when the variable is guaranteed to be null or guaranteed to be not null.

The estimates is guessed based on the expressions used when you query the XML.

I have found nothing that describes the rules used to make the estimated row counts other than a presentation by Michael Rys, Best Practices and Performance Tuning of XML Queries in SQL Server, that says "Base cardinality estimate is always 10'000 rows!".

In your query you can for instance influence the estimated number of rows by specifying that the root node ArrayOfString is only present once.

SELECT COUNT(Tab.c.value('node()[1]', 'NVARCHAR(100)'))
FROM @externalReferences.nodes('/ArrayOfString[1]/string') Tab(c)

The estimated number of rows for the Table-valued function that fetches the values for each node is then 4308.

A simpler way to write your XML query would be to use the function count().

select @externalReferences.value('count(/ArrayOfString/string/text())', 'int')

That will give you a query plan that looks like this showing the estimated values.

alt text

Mart 2014-03-05 00:25:33
Thank you so much Michael, that's great and gives me something to get my teeth into.