I was reading about parallel queries in postgresql 10.
I came across the property max_parallel_workers_per_gather. It sets the maximum number of workers that can be started by a single Gather or Gather Merge node.
I am not able to understand gather and gather merge node.
With parallel query in picture, there are two new types of nodes introduced in postgresql 10 viz, Gather node and other is Gather-Merge node.
Now, if seq-scan has to be performed in parallel, then gather node comes into picture, which you may think of as a master node for the parallel workers which are performing sequential scan. e.g.
-> Parallel Seq Scan on t (cost=0.00..7.60 rows=102 width=8)
Filter: (i < 50)
(4 rows)
Here, there are two workers which are in parallel scanning the table using seq-scan and gather is the process which collects the result of this data and passes it to the node above it, if any.
Now, just like in seq-scan the order of tuples is not maintained so is the case in gather node. However, for the cases wherein you need to maintain the order, gather-merge node comes into picture. For example when index scan is used in parallel then gather-merge will be above it to collect the results of parallel scans respecting the order of indexes.
What is the difference between max_parallel_workers_per_gather and max_parallel_processes. What is meant by gather node.
As you mentioned above, max_parallel_workers_per_gather is the total number of parallel workers allowed for a node. The parameter max_parallel_processes sets the upper limit of total parallel workers running. As a base rule, the value of max_parallel_workers_per_gather should be less than or equal to max_parallel_processes, otherwise the value of max_parallel_processes will be the maximum allowable workers for a parallel node.