repartition streams operator is showing 17% cost in the query plan

PARAG 2016-06-01 15:21:46

I am having a temp table with a clustered index scan operator which is being passed to a hash match join operator

But before being passed to hash match join operator, there is a Parallelism (repartition streams) operator which is consuming 17% cost in the execution plan.

How can I reduce this processing cost in the plan ?


Aaron Bertrand 2016-06-01 15:26:22
One quick thing to try (no guarantee if it will make the query faster) is to eliminate parallelism using OPTION (MAXDOP 1).
PARAG 2016-06-01 18:02:10
we don't want to disable parallel execution completely for the query by including MAXDOP 1 hint

Is there any other way, I can get rid off repartition streams from the query plan ?

Does it have something to do with lack of indexes or how the table has been defined ?

SQLkiwi 2016-06-02 05:49:27
The 17% cost is an estimate, based on the estimated number of rows and their distribution of values, plugged into the optimizer's cost model. This calculation is used to choose between plan alternatives, but the numbers are actually quite unlikely to reflect the actual time spent on that operation on your particular hardware. A post-execution (actual) plan was not provided, so it is hard to say what the impact of that particular operator is.

Note that there is a bitmap filter applied to the Table Scan. This (if effective) may well result in many fewer rows encountering the exchange (Repartition Streams Parallelism operator) than expected. This is particularly true since the Bitmap is a type applied heuristically after optimization is completed – the optimizer knows nothing about it.

In general, exchanges are required on both inputs to a hash or merge join, to distribute rows that will join to the same thread. The exception, for hash join only, is where the complete hash table is copied to all threads (build side partitioning mode Broadcast). Since this consumes more memory, a broadcast hash join is only chosen if the number of build side rows is expected to be small (a few hundred rows or so).

Your other option to avoid exchanges is to persuade the optimizer to choose a nested loops join instead of hash or merge. This would require an index on #tmpInventoryCache (and possibly a hint, because the optimizer tends to over-estimate the cost of parallel nested loops join).

A suitable index would be keyed on InaPrSKU, OptionList and include [Date], InaBlockAddToCart, InaIsBackOrdered, InaQty, InaRgID, Suid, Inventory, WarehouseIncludedInInvQuantityRollup.

If this represents most of the columns of the table, you might choose to make InaPrSKU, OptionList the clustered index on that temporary table, unique if that applies. That might be the best option in any case, but it's hard to be sure about that from here.