What could be causing this very large constant scan?

JNK 2013-06-14 14:01:46

This is a fairly complicated XML query. All of the JOIN keys have been broken out into static columns, but most of the columns in the SELECT are in an XML field that we pull out using the nodes() and value() methods.

There is a primary XML index on the xml column being queried.

There are absolutely no literal constants in the query, and only about 5k rows of actual data in the source table.

Mikael Eriksson 2013-06-14 15:11:03
It would be really interesting to see the query you have that built that plan. I would guess that the clustered index seek you have against Table7 really is against the internal table for the primary XML index. No idea what the constant scan is about.
JNK 2013-06-14 15:12:33
I think the constant scan is related to the implicit conversions involved. I'm going to try using an XML schema on the column to see if it helps.
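A minimal sketch of that approach, with hypothetical names (dbo.SourceTable, XmlData, IX_Xml and the schema itself are placeholders, since the real ones aren't shown here), might look like this:

-- Hypothetical sketch: bind an XML schema collection so the attributes are typed.
-- All object names and the schema content are placeholders.
create xml schema collection dbo.ItemSchema as
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item">
          <xs:complexType>
            <xs:attribute name="Name1" type="xs:string"/>
            <xs:attribute name="Name2" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'

go

-- XML indexes must be dropped before the column can be retyped, then recreated
drop index IX_Xml on dbo.SourceTable
alter table dbo.SourceTable alter column XmlData xml(dbo.ItemSchema) not null
create primary xml index IX_Xml on dbo.SourceTable(XmlData)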
Mikael Eriksson 2013-06-15 19:51:53
The large constant scan is used to filter rows from the XML index internal table on the hid column to give you the attributes you have specified in the query.

Test data:

create table T
(
  ID int identity primary key,
  XMLCol xml not null
)
 
create primary xml index IX_T on T(XMLCol)
 
go
 
insert into T(XMLCol)
select '<root><item '+
       (
       select ' Name'+cast(number as varchar(10))+'="1"'
       from master..spt_values
       where type = 'P' and
             number between 1 and 200 -- 200 attributes
       for xml path('')
       )+
       ' /></root>'
from master..spt_values
where type = 'P' and
      number between 1 and 100 -- 100 rows
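As a side check (standard catalog views, nothing specific to this repro), you can locate the internal node table that the primary XML index creates; that is the table whose hid column the constant scan filters on:

-- Find the internal node table backing the primary XML index on T
select it.name as internal_table, p.rows
from sys.internal_tables as it
join sys.partitions as p
  on p.object_id = it.object_id and p.index_id in (0, 1)
where it.parent_object_id = object_id('T')
  and it.internal_type_desc = 'XML_INDEX_NODES'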

A query that fetches two attributes from T:

select I.N.value('@Name1', 'varchar(10)'),
       I.N.value('@Name2', 'varchar(10)')
from T
  cross apply T.XMLCol.nodes('/root/item') as I(N)

will give you a query plan that uses a Clustered Index Scan on the internal table for the nodes() part and a Clustered Index Seek to get the attribute values, with a predicate on the seek that picks out the nodes you want.

[Query plan: Clustered Index Scan plus Clustered Index Seek with a seek predicate]

That holds true until you specify 16 or more attributes. With 16 to 64 attributes, the predicate on the seek is moved out to a Filter operator of its own.

[Query plan: predicate moved out to a separate Filter operator]

When you query 65 or more attributes you get the nested loops join to the constant scan.

[Query plan: Nested Loops join to a Constant Scan]

All of the above holds when you query a table that has a fair amount of data and the attributes actually have values. With less data in the table you will get different query plans.

Here is a stored procedure you can use to test the threshold values if you like.

create procedure QueryT
  @NumAtt int
as
 
declare @S nvarchar(max)
 
-- Build a SELECT list with @NumAtt value() calls; stuff() strips the leading comma
set @S = 'select '+stuff((select ',I.N.value(''@Name'+cast(number as varchar(10))+''', ''varchar(10)'')'
                          from master..spt_values
                          where type = 'P' and
                                number between 1 and @NumAtt
                          for xml path('')), 1, 1, '')+'
          from T
          cross apply T.XMLCol.nodes(''/root/item'') as I(N)'
exec (@S) 
 
go
 
exec QueryT 15
exec QueryT 16
exec QueryT 65
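If you want to confirm which of those produce the constant scan without opening each plan by hand, one option (plain DMV queries, my own addition rather than anything from the thread) is to search the cached plan XML:

-- Look for 'Constant Scan' in the cached plans generated by the test queries
select st.text, qp.query_plan
from sys.dm_exec_cached_plans as cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) as st
cross apply sys.dm_exec_query_plan(cp.plan_handle) as qp
where st.text like '%XMLCol.nodes%'
  and cast(qp.query_plan as nvarchar(max)) like '%Constant Scan%'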

The query plans shown above were generated with SQL Server 2012.

I guess this answers your question, but sadly it will not help you improve the performance of your query.

SQLkiwi 2013-06-16 03:22:01
+1 Good answer Mikael. The hard-coded numbers for the transition to a separate filter, and then to the constant scan nested loops, are quite general; they are not just for XML plans. A maximum of 15 filter predicates can be pushed into a scan or seek as a residual predicate, and a maximum of 64 seeking operations can be performed by a single seek operator.

As far as improving performance is concerned, I wonder if a different shredding technique, like OPENXML, would perform better when many attributes are involved?

Mikael Eriksson 2013-06-16 14:07:17
I have done some testing, and using OPENXML can do better, but it depends on the XML stored in the table.

OPENXML only works on a single XML document, so you have to either build one big XML document with FOR XML PATH (extracting the XML column as-is) and run OPENXML on that, or process the table with a cursor, shredding the XML one row at a time and adding the rows to a temporary table.
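A rough sketch of the "big XML" variant against the test table T from above (the root('all') wrapper is just my way of giving OPENXML a single document):

-- Concatenate every row's XML into one document, then shred it once
declare @big xml =
  (select XMLCol as '*' from T for xml path(''), root('all'), type)

declare @h int
exec sp_xml_preparedocument @h output, @big

select Name1, Name2
from openxml(@h, '/all/root/item', 1)
     with (Name1 varchar(10) '@Name1',
           Name2 varchar(10) '@Name2')

exec sp_xml_removedocument @h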

I could not create a case where the "big XML" version actually did better than querying the XML column.

The cursor version did better when the cross apply against nodes() adds quite a few rows to the output. And the more rows you have in the table, the more nodes nodes() has to produce for OPENXML to be faster.
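For completeness, a sketch of the cursor variant against the same test table (the temp table and cursor names are mine):

-- Shred one row at a time, collecting the results in a temp table
create table #shredded (Name1 varchar(10), Name2 varchar(10))

declare @x xml, @h int
declare c cursor local fast_forward for select XMLCol from T
open c
fetch next from c into @x
while @@fetch_status = 0
begin
    exec sp_xml_preparedocument @h output, @x

    insert into #shredded (Name1, Name2)
    select Name1, Name2
    from openxml(@h, '/root/item', 1)
         with (Name1 varchar(10) '@Name1',
               Name2 varchar(10) '@Name2')

    exec sp_xml_removedocument @h
    fetch next from c into @x
end
close c
deallocate c

select * from #shredded

drop table #shredded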