Why did SQL Server choose a parallel execution plan for this query?

sam.bishop 2016-10-24 14:48:42

I would really appreciate it if someone could help me understand why SQL Server chose to run this query in parallel instead of serially, and if there is a good way to help it detect that a serial execution would be preferrable.

The SQL has been copied from a stored procedure and simplified slightly. As you can see, a temporary table is created and populated with a list of account IDs. A query is then executed, joining that table to a few persistent tables. The execution plan partitions the work between threads based on account ID, which makes sense. However, in this case (and, in fact, in the common case), the table only contains one account ID. SQL Server reports that it only estimated that there would be one account ID in the table, so why did it choose to run the query in parallel? I thought I read somewhere that SQL Server will sometimes decide to run a cached parallel plan on a single thread, but I don't know the details regarding when and why it decides to do that.

I have included two versions of the query, one with and one without "OPTION (MAXDOP 1)". As you would expect, the serial version runs a little faster.

Vlady Oselsky 2016-10-24 18:59:21
What is the "cost threshold for parallelism" set to on the server? If it is a default of 5 then SQL Server will make a parallel plan for more queries that it should. You should look at making a change from a default, there are plenty of articles describing how to get this tuned. Additionally, you should try to create clustered index on temp table to give SQL Server a chance to get better estimates in row count.
sam.bishop 2016-10-24 19:56:28
Our DBAs have the cost threshold set really high (400), but SQL Server still executes more queries in parallel than we'd like. (The MAXDOP hint is used in a lot of our stored procedures.) That's one of the reasons I've dug so deeply into this query. Perhaps what I learn with this one can be applied to the others.

The temp table has a primary key, and the query plan shows a clustered index scan over it.

SQLkiwi 2016-10-31 10:39:49
The decision to use parallelism or not is taken at a relatively early stage, based on the estimated cost of the best plan found so far at that point in time.

At the end of serial-only search stage 1 optimization, the cost exceeded the threshold to explore a parallel alternative. Search stage 1 is re-run with the requirement to produce a parallel plan. At that point, the best serial and parallel plan costs are compared, and an irreversible decision is made to "go parallel" or not.

Your query has enough tables (5 minimum) to enter search stage 2 of optimizer processing, which is serial-only or parallel-only, depending on the "irreversible decision".

As it turns out, the final (stage 2) serial plan has a lower cost than the final parallel plan, but it is too late by then. Once the optimizer commits to parallelism in search stage 1, no serial alternatives are considered in stage 2.

The table only contains one row at compilation time, but that does not mean it will always be true for future executions that reuse the same cached plan. SQL Server does not directly take that into account though: the decision is made based on cardinality estimates and the cost model as detailed above.

It is unfortunate that SQL Server often builds parallel plans for a single row source, followed by a round robin distribute streams. This gives a plan that executes on a single thread (at least until any later repartitioning, which is not present in your example), but with the extra overhead of the exchange (Parallelism) operators compared to the truly serial alternate.

There's no good way (e.g. constraints) to communicate the single row source in general to the optimizer. You can sometimes use a FAST 1 hint, but if you're going to hint, MAXDOP 1 would often be the better choice.

Certainly the present case is an example where parallelism is not beneficial as the optimizer expected, so a hint (directive) is justified.

The parallelism cost model is not the greatest in many respects, but most often it works in the opposite direction: refusing to parallelize nested loops join plans where this would in fact be beneficial. Your example is somewhat unusual from that point of view.

SQL Server generally only runs a parallel plan on a single thread with exchanges disabled when it encounters resource limitations. See:

Myth: SQL Server Caches a Serial Plan with every Parallel Plan