Re: increased max_parallel_workers_per_gather results in fewer workers? - Mailing list pgsql-performance

From Sebastian Dressler
Subject Re: increased max_parallel_workers_per_gather results in fewer workers?
Date
Msg-id 5FFAB96B-056C-423F-A65A-81BA6C09C7EB@swarm64.com
In response to Re: increased max_parallel_workers_per_gather results in fewer workers?  (Philip Semanchuk <philip@americanefficient.com>)
Responses Re: increased max_parallel_workers_per_gather results in fewer workers?  (Philip Semanchuk <philip@americanefficient.com>)
List pgsql-performance
Hi Philip,

On 4. Jun 2020, at 18:41, Philip Semanchuk <philip@americanefficient.com> wrote:
> [...]

>> Also, there are more configuration settings related to parallel queries you might want to look into. Most notably:
>>
>>     parallel_setup_cost
>>     parallel_tuple_cost
>>     min_parallel_table_scan_size
>>
>> Especially the last one is a typical dealbreaker; you can try setting it to 0 to begin with. Good starting points for the others are 500 and 0.1, respectively.

> Aha! By setting min_parallel_table_scan_size = 0, Postgres uses the 6 workers I expect, and the execution time decreases nicely.
>
> I posted a clumsily-anonymized plan for the “bad” scenario here:
> https://gist.github.com/osvenskan/ea00aa71abaa9697ade0ab7c1f3b705b
>
> There are 3 sort nodes in the plan. When I get the “bad” behavior, the sorts have one worker; when I get the good behavior, they have multiple workers (e.g. 6).

I also think what Luis pointed out earlier might be a good option for you, i.e. setting

    parallel_leader_participation = off;

And by the way, with leader participation enabled, this 1 worker actually turns into 2 workers in total.
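If it helps, here is a minimal sketch for trying all of these in one session before changing anything globally. The values are just the starting points mentioned above, and my_big_table is a placeholder for one of your tables:

    -- Starting points; adjust to your workload:
    SET max_parallel_workers_per_gather = 6;
    SET parallel_setup_cost = 500;
    SET parallel_tuple_cost = 0.1;
    SET min_parallel_table_scan_size = 0;
    SET parallel_leader_participation = off;

    -- Compare "Workers Planned" vs. "Workers Launched" on the Gather node:
    EXPLAIN (ANALYZE) SELECT count(*) FROM my_big_table;

If planned and launched differ, that usually means the max_parallel_workers pool was exhausted by other queries at execution time.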

> This brings up a couple of questions:
> 1) I’ve read that this is Postgres’ formula for the max # of workers it will consider for a table:
>
>   max_workers = log3(table size / min_parallel_table_scan_size)
>
> Does that use the raw table size, or does the planner use statistics to estimate the size of the subset of the table that will be read before allocating workers?

"table size" is the number of PSQL pages, i.e. relation-size / 8 kB. This comes from statistics.

> 2) There are 9 tables in this query ranging in size from 72Kb to 17Gb. Does Postgres decide on a table-by-table basis to allocate multiple workers, or…?

AFAIK, it decides this per table initially, but the final result depends on the chosen Gather node.
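Should one of the tables end up with fewer workers than you would like despite that, you can also pin the per-table worker count, which overrides the size-based estimate; a sketch, again with my_big_table as a placeholder:

    -- Override the size-based worker estimate for scans of this table;
    -- the result is still capped by max_parallel_workers_per_gather:
    ALTER TABLE my_big_table SET (parallel_workers = 6);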

> Thank you so much for the suggestion, I feel un-stuck now that I have an idea to experiment with.

You are welcome. We are actually about to publish a blog post with some more suggestions on parallelism.

Cheers,
Sebastian
