It works but not sure how does it work

GokhanVarol 2013-05-24 04:04:14

I have a slightly expensive clr table valued function which can take many columns as input and can return none or many rows. The input to the function is already stored in a temporary table and I have a piece of code simply applying CLR function to temp table input and inserting into a new temp table. The original query plan screenshot is attached, it did not run in parallel. I was testing to make this query run in parallel and I created a table VTUTest.dbo.ParallelFeed_24031 and inserted 1 value in it (the value is 0) and faked it statistics (UPDATE STATISTICS dbo.ParallelFeed_24031 WITH ROWCOUNT = 10000000) and created a query that looked like had a high cost from a sort on VTUTest.dbo.ParallelFeed_24031 table and getting top 1 value and then inner joining to the input table on some fake join that will return 1 row always ( INNER JOIN VTUTest.dbo.VTUInput_24031 AS v ON SIGN(v.[Row]) -1 = pf.[Row] ) and then faked a variable @This_Is_to_Fake_the_optimizer_to_get_a_parallel_query_plan_Its_Value_Is_0 so that at compile time it filters out the clr function call so that the query runs nice and parallel and scales pretty well, I am happy with it so far.
What I cannot get my head around though is even though the first join has a top 1 in it it still manages to parallelize into 6 threads in equal chunks. At first I thought I had to find out the servers maxdop and put that many records into the temp table and do a join like (INNER JOIN VTUTest.dbo.VTUInput_24031 AS v ON v.[Row] % @maxdop = pf.[Row]) which also worked but has more lines of code and more variables.
Anyhow the code works so far (suggestions always more than welcome, I am trying things out here = I think I know what I am doing but I know in some time I will realize I had no clue ๐Ÿ™‚ ) but how can a 1 row table scan can end up in a nicely threaded query?
Thank you

alt text

DECLARE
@This_Is_to_Fake_the_optimizer_to_get_a_parallel_query_plan_Its_Value_Is_0 INT = 0
SELECT  v.[Row]
...
FROM    ( SELECT TOP ( 1 )
                    [Row]
          FROM      VTUTest.dbo.ParallelFeed_24031
          ORDER BY  [Row] ASC
        ) pf
        INNER JOIN VTUTest.dbo.VTUInput_24031 AS v ON SIGN(v.[Row] ) -1= pf.[Row]
        OUTER APPLY ( SELECT    *
                      FROM      aTrans.ACLRFUNCTION(Inputs) ) vtu
                      WHERE     @This_Is_to_Fake_the_optimizer_to_get_a_parallel_query_plan_Its_Value_Is_0 = 0
                    ) AS vtu
 
OPTION  ( RECOMPILE, OPTIMIZE FOR ( @This_Is_to_Fake_the_optimizer_to_get_a_parallel_query_plan_Its_Value_Is_0 = 1 ) )

link text

SQLkiwi 2013-05-24 04:32:38
> What I cannot get my head around though is even though the first join has a top 1 in it it still manages to parallelize into 6 threads in equal chunks.

The Distribute Streams exchange operator between the Top (1) and Nested Loops Join is running in Broadcast mode. The single row is sent (copied) to all parallel threads. Your plan is an example of a very special situation: the inner side of a nested loops join generally runs serially (per thread) but where there is a guarantee that one row arrives at the outer input, and there are no correlated parameters on the loops join, the query processor can broadcast the row and use parallelism on the inner side of the join. The clustered index scan is a proper parallel scan, hence the nice distribution of work.

GokhanVarol 2013-05-24 12:16:45
Is this a risky code, can this all of a sudden for some reason perform very poorly since it may not parallelize etc?
Thank you
SQLkiwi 2013-05-24 21:47:17
Hard to say for sure – the SQL text is incomplete in the analysis file. I just looked at the plan shape to explain how it works.
GokhanVarol 2013-05-24 22:14:04
Not sure why text is getting cut. I ran snd saved from sentry
Kevin 2013-05-24 19:48:43
Very informative, Paul. I was wondering what was happening inside of this query. Thanks!

-Kevin

SQLkiwi 2013-05-24 21:47:34
No worries ๐Ÿ™‚