Table Insert is given 95% of the estimated cost – why?

Dan Holmes 2013-03-08 15:52:20

This query shows 95% of the cost is the insert into the table variable. Hidden in the query in the left most 'Compute Scalar' node are 3 scalar UDFs. Since the optimizer can't see inside those are they just costed at 0?

alt text

This query also takes 30+ seconds but that duration is only on one server. When i pull the database locally to my laptop the query runs in 2. The difference is the scalar UDF's contents operate in parallel on the production box but serially locally. What could account for that? I have more detail about that on my blog. I will furnish any other information that would help.

http://dnhlmssql.blogspot.com/2013/03/when-query-plan-lies-sorta.html

thanks
danny

Aaron Bertrand 2013-03-08 15:55:22
Paul has some great info on reasons why a plan might not go parallel:

http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

If the parallel plan is worse than a serial plan, it is often because statistics are out of date. But you can override using MAXDOP.

Dan Holmes 2013-03-08 16:44:00
That post doesn't talk about environmental factors. If the db is the same between servers and plan is serial on one but parallel on the other, what runtime factors contribute to the difference?
GregGonzalez 2013-03-08 21:59:01
Danny, can you upload the actual plans generated from Plan Explorer, on both the fast and slow machines? Not sure if you own Plan Explorer PRO, but it can provide more insight in the UDFs. If not, you can evaluate it for 15 days if you haven't already by clicking the PRO toolbar button.
SQLkiwi 2013-03-08 23:03:29
>This query shows 95% of the cost is the insert into the table variable. Hidden in the query in the left most 'Compute Scalar' node are 3 scalar UDFs. Since the optimizer can't see inside those are they just costed at 0?

Not quite zero, but pretty close. Conor Cunningham explains the reasons in his post SQL Server Scalar Operator Costing aka “umm, what costing?”

This query also takes 30+ seconds but that duration is only on one server. When I pull the database locally to my laptop the query runs in 2. The difference is the scalar UDF's contents operate in parallel on the production box but serially locally. What could account for that?

Any and all of the usual reasons, including those listed in my post Forcing a Parallel Execution Plan. The choice between a parallel and serial plan can be quite close, in estimated cost terms. Small differences in statistics (and the samples they were based on) can affect estimated costs, and hence the decision to use parallelism or not. Other factors that influence costing include the number of logical processors available to the SQL Server instance at the time the plan was compiled, the configured cost threshold, and even the maximum size of the buffer pool (various elements of costing are influenced by this). The list goes on.

Also, scalar T-SQL UDFs cache plans and take parameters, so parameter-sensitivity can be an issue too. Consider the following function in Adventure Works:

CREATE FUNCTION dbo.DAUDF (@TID integer)
RETURNS bigint
WITH SCHEMABINDING
AS
BEGIN
    RETURN
    (
        SELECT COUNT_BIG(*) 
        FROM Production.TransactionHistory AS th
        WHERE th.TransactionID > @TID
    );
END

With a zero cost threshold for parallelism, the following invocation uses parallelism inside the UDF:

SELECT dbo.DAUDF(0);

The cached plan uses parallelism. The following invocation returns no rows but reuses the cached plan and again uses parallelism:

SELECT dbo.DAUDF(1000000);

If you clear the plan cache and run the statements in reverse order, a serial plan is cached and reused.

Scalar T-SQL functions are just horrible things, I hope you have a super-stellar excuse for using them 🙂

Definitely check out the PRO version as Greg suggests.

Dan Holmes 2013-03-09 00:30:59
I don't have Plan Explorer Pro but i saved the plans as sqlplan from the production 30 second and my laptops 2 second. I then compared them as text. The only difference is the build number at the top.
Build="10.50.4270.0" for the production system
and
Build="10.50.2500.0" for my laptop

The other difference was the compile time node

only the time values are different.

Since the parallelism is in the UDF i didn't expect the plan to provide a difference.

To make a case to my boss, what would plan explorer pro give in the output that SSMS doesn't nor the free version of plan explore provide?

thx

Aaron Bertrand 2013-03-09 01:00:33
Hi Dan,

Compared to SSMS, the free version of Plan Explorer has a lot of powerful visualizations, exposes more information directly, and highlights key performance aspects of a plan – from color-coding the most expensive nodes on the graphical plan to highlighting key lookups in the grid views and making it clear which column(s) required the lookup – making it easier to assess any potential index changes. I could write a dissertation about this here, but see this albeit dated article:

http://sqlblog.com/blogs/aaron_bertrand/archive/2011/09/13/a-demo-kit-for-sql-sentry-plan-explorer.aspx

Aaron Bertrand 2013-03-09 01:03:43
And as for the features of PRO vs. free, you have the ability to capture actual wait stats for your session, capture the full call stack (so, for example, you could see all the invocations of your UDFs), open deadlock files using our enhanced representation, and the latest builds have a new custom layout feature which Greg just blogged about:

http://greg.blogs.sqlsentry.net/2013/03/plan-explorer-pro-query-plans-your-way.html