Thread: how does PostgreSQL determine how many parallel processes to start
Hi all,
I know that the number of parallel processes can be limited by max_parallel_workers_per_gather and max_parallel_workers, and that min_parallel_table_scan_size (and min_parallel_index_scan_size) set the threshold for a parallel plan to be considered at all. But I would like to understand: once a table has been considered for a parallel plan, and there is room for more workers, how does PostgreSQL decide to start another process?

Thanks,
Luca
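For reference, the settings mentioned above can be inspected from psql; a minimal sketch using the pg_settings catalog view:

```sql
-- Inspect the parallelism-related settings discussed in this thread.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_parallel_workers',
               'max_parallel_workers_per_gather',
               'min_parallel_table_scan_size',
               'min_parallel_index_scan_size');
```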
On Fri, 2021-02-19 at 10:38 +0100, Luca Ferrari wrote:
> I know that the number of parallel processes can be limited by
> max_parallel_workers_per_gather and max_parallel_workers, and that
> min_parallel_table_scan_size (and min_parallel_index_scan_size) set
> the threshold for a parallel plan to be considered at all. But I would
> like to understand: once a table has been considered for a parallel
> plan, and there is room for more workers, how does PostgreSQL decide
> to start another process?

During planning, PostgreSQL generates both parallel and non-parallel plans and picks the one it estimates to be cheapest. At execution time, it will use as many of the planned workers as are currently available under the max_parallel_workers limit.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
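The planned-versus-launched distinction is visible in EXPLAIN (ANALYZE) output; a sketch, assuming a hypothetical table big_t large enough to qualify for a parallel sequential scan:

```sql
-- big_t is a hypothetical table, assumed large enough that the
-- planner chooses a parallel plan.
EXPLAIN (ANALYZE)
SELECT count(*) FROM big_t;

-- The Gather node in the output reports both numbers, e.g.:
--   Workers Planned: 2     <- decided at planning time
--   Workers Launched: 2    <- can be lower (down to 0) if the
--                             max_parallel_workers pool is already
--                             exhausted at execution time
```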
On Fri, Feb 19, 2021 at 10:43 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> At execution time, PostgreSQL will use as many of the planned workers
> as are currently available (max_parallel_workers).

Thanks, but just to make it clear: assuming I execute two identical, parallelizable queries almost simultaneously, does that mean that the first one will run with the maximum available parallel capacity while the second will "starve" for parallelism and be executed sequentially? Is this correct?
As a consequence, this could also mean that a query over a small table could take more advantage of parallelism than a scan of a larger table issued just a moment later (assuming both tables can be scanned in parallel), right?

Luca
On Fri, 2021-02-19 at 11:21 +0100, Luca Ferrari wrote:
> > At execution time, PostgreSQL will use as many of the planned workers
> > as are currently available (max_parallel_workers).
> 
> Thanks, but just to make it clear: assuming I execute two identical,
> parallelizable queries almost simultaneously, does that mean that the
> first one will run with the maximum available parallel capacity while
> the second will "starve" for parallelism and be executed sequentially?
> Is this correct?
> As a consequence, this could also mean that a query over a small table
> could take more advantage of parallelism than a scan of a larger table
> issued just a moment later (assuming both tables can be scanned in
> parallel), right?

Precisely. That is why you have "max_parallel_workers_per_gather": it limits the number of parallel workers available to a single query, so one query cannot grab the whole pool.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
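To illustrate the point about sharing the pool, a sketch assuming the stock default of max_parallel_workers = 8:

```sql
-- Cap each query at 2 workers (instead of grabbing as many as the
-- planner would like) so that concurrent queries can still share
-- the global max_parallel_workers pool.
SET max_parallel_workers_per_gather = 2;

-- With max_parallel_workers = 8, four such queries can run fully
-- parallel at the same time; a fifth one would launch fewer workers
-- (possibly zero) and do the work in the leader process alone.
```

Note that this only changes how the pool is divided; the total number of background workers system-wide is still bounded by max_parallel_workers (and ultimately max_worker_processes).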