I'm trying to understand the XML statistics used.
I'm trying to understand how the optimizer works out the statistics on this query. I guess the table-valued functions are built in for the purpose of working with XML, so I was wondering where they live and how, or if, I can do anything about these wildly wrong estimates.
This is a small part of a larger query that I'm tuning, and I've not done much with XML, so any pointers or explanations as to why this looks so wrong, and how or whether I can make it better, would be greatly appreciated.
Why it's done with XML at all is a different question; for now I just need to understand.
Thanks in advance,
Mart
P.S. The bit I'm looking at is the table-valued function [XML Reader], which estimates 180,000 rows when the actual is 1, and the table-valued function [XML Reader with XPath filter], which estimates 752 rows when the actual is also 1.
I've searched the internet and can see that some internal functions have fixed statistics, but I can't find anything about these, and was wondering if there is a way to improve them. Thanks.
The estimates are guesses based on the expressions used when you query the XML.
I have found nothing that describes the rules used to derive the estimated row counts, other than a presentation by Michael Rys, Best Practices and Performance Tuning of XML Queries in SQL Server, which says "Base cardinality estimate is always 10'000 rows!".
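If I have read the plan right, the query behind those operators presumably has roughly the shape sketched below (the variable name @externalReferences and the path /ArrayOfString/string are taken from your query; the sample XML content is made up). Note that the estimates come from the shape of the expressions alone, not from what the variable actually contains:

DECLARE @externalReferences XML = N'
<ArrayOfString>
  <string>ref-1</string>
  <string>ref-2</string>
</ArrayOfString>';

-- No positional predicate on the root node: the optimizer cannot tell how many
-- ArrayOfString nodes exist, so the internal XML Reader TVFs fall back to
-- fixed guesses regardless of the actual contents of the variable.
SELECT COUNT(Tab.c.value('node()[1]', 'NVARCHAR(100)'))
FROM @externalReferences.nodes('/ArrayOfString/string') Tab(c);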
In your query you can, for instance, influence the estimated number of rows by specifying that the root node ArrayOfString is present only once:
SELECT COUNT(Tab.c.value('node()[1]', 'NVARCHAR(100)'))
FROM @externalReferences.nodes('/ArrayOfString[1]/string') Tab(c)
The estimated number of rows for the table-valued function that fetches the values for each node is then 4308.
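If you want to see the effect for yourself, you can run the two variants side by side and compare estimated versus actual rows in the plans; a minimal, self-contained sketch (the sample XML is made up):

SET STATISTICS XML ON;  -- return the actual execution plan with the results

DECLARE @externalReferences XML =
    N'<ArrayOfString><string>a</string><string>b</string></ArrayOfString>';

-- Variant 1: no predicate on the root node (fixed guesses).
SELECT COUNT(Tab.c.value('node()[1]', 'NVARCHAR(100)'))
FROM @externalReferences.nodes('/ArrayOfString/string') Tab(c);

-- Variant 2: root node constrained to a single occurrence (lower estimate).
SELECT COUNT(Tab.c.value('node()[1]', 'NVARCHAR(100)'))
FROM @externalReferences.nodes('/ArrayOfString[1]/string') Tab(c);

SET STATISTICS XML OFF;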
A simpler way to write your XML query would be to use the XQuery function count():
SELECT @externalReferences.value('count(/ArrayOfString/string/text())', 'int')
That will give you a simpler query plan in which you can see the estimated values.
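For completeness, a self-contained version of the count() rewrite (the sample XML and the variable @stringCount are my own); the result can also be captured directly in a variable:

DECLARE @externalReferences XML =
    N'<ArrayOfString><string>a</string><string>b</string><string>c</string></ArrayOfString>';

-- count() is evaluated by the XQuery engine and value() returns a single scalar.
DECLARE @stringCount INT =
    @externalReferences.value('count(/ArrayOfString/string/text())', 'int');

SELECT @stringCount AS StringCount;  -- 3 for this sample document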