About sort or hash in a parallel query plan
Especially for hash if hash is built on table a in 8 threads and table b is pulled into this join how can it be made sure that any data coming from any 8 threads of table b is matched against all data in the hash? If all this is done on 8 threads and pushed into a same memory table then lot's of synchronization would be necessary and how would that (if it's the case) affect the plan/execution?
I am asking this I guess I did not fully understand how a parallel hash or sort work.
Is there an article explaining this?
Hash parallel join may also use broadcast partitioning, where rows from the build input are sent to all probe-side threads. This eliminates the need for an exchange on the probe input, but the duplication of rows means more memory will be needed. The optimizer generally selects broadcast parallel hash join where the probe input has a low cardinality.
Batch mode parallelism over a columnstore works a bit differently – a single hash table is built and 'lock-free' (compare and exchange) techniques are used to synchronize access to the shared hash table.
Parallel sorts also generally use hash partitioning. They may also use range partitioning in index building plans, but the idea is the same – DOP sorts get a fraction of the rows to be sorted each, as determined by the partitioning scheme used by the exchange.