Why did SQL Server choose a parallel execution plan for this query?
The SQL has been copied from a stored procedure and simplified slightly. As you can see, a temporary table is created and populated with a list of account IDs. A query is then executed, joining that table to a few persistent tables. The execution plan partitions the work between threads based on account ID, which makes sense. However, in this case (and, in fact, in the common case), the table only contains one account ID. SQL Server reports that it only estimated that there would be one account ID in the table, so why did it choose to run the query in parallel? I thought I read somewhere that SQL Server will sometimes decide to run a cached parallel plan on a single thread, but I don't know the details regarding when and why it decides to do that.
I have included two versions of the query, one with and one without "OPTION (MAXDOP 1)". As you would expect, the serial version runs a little faster.
At the end of serial-only search stage 1 optimization, the cost exceeded the threshold to explore a parallel alternative. Search stage 1 is re-run with the requirement to produce a parallel plan. At that point, the best serial and parallel plan costs are compared, and an irreversible decision is made to "go parallel" or not.
Your query has enough tables (5 minimum) to enter search stage 2 of optimizer processing, which is serial-only or parallel-only, depending on the "irreversible decision".
As it turns out, the final (stage 2) serial plan has a lower cost than the final parallel plan, but it is too late by then. Once the optimizer commits to parallelism in search stage 1, no serial alternatives are considered in stage 2.
The table only contains one row at compilation time, but that does not mean it will always be true for future executions that reuse the same cached plan. SQL Server does not directly take that into account though: the decision is made based on cardinality estimates and the cost model as detailed above.
It is unfortunate that SQL Server often builds parallel plans for a single row source, followed by a round robin distribute streams. This gives a plan that executes on a single thread (at least until any later repartitioning, which is not present in your example), but with the extra overhead of the exchange (Parallelism) operators compared to the truly serial alternate.
There's no good way (e.g. constraints) to communicate the single row source in general to the optimizer. You can sometimes use a FAST 1 hint, but if you're going to hint, MAXDOP 1 would often be the better choice.
Certainly the present case is an example where parallelism is not beneficial as the optimizer expected, so a hint (directive) is justified.
The parallelism cost model is not the greatest in many respects, but most often it works in the opposite direction: refusing to parallelize nested loops join plans where this would in fact be beneficial. Your example is somewhat unusual from that point of view.
SQL Server generally only runs a parallel plan on a single thread with exchanges disabled when it encounters resource limitations. See: