cmdv3=# show max_parallel_workers_per_gather; max_parallel_workers_per_gather --------------------------------- 2 (1 row)
It was set to 8.
Il giorno mar 21 gen 2020 alle ore 16:06 Nicola Contu <nicola.contu@gmail.com> ha scritto:
Hey Thomas,
after a few months, we started having this issue again.
So we revert the work_mem parameter to 600MB instead of 2GB.
But the issue is still there. A query went to segmentation fault, the DB went to recovery mode and our app went to read only for a few minutes.
I understand we can increase max_connections so we can have many more segments.
My question is : is there a way to understand the number of segments we reached?
Currently we have 220 max_conn so as your formula is 64 + 2* 220 we have about 500 shared segments.
We would like to increase that number to 300 or 400 but would be great to understand if there is a way to make sure we will solve the issue as it requires a restart of the service.
I know you were also talking about a redesign this part in PostgreSQL. Do you know if anything has changed in any of the newer versions after 11.5?
Thanks a lot,
Nicola
Il giorno gio 12 set 2019 alle ore 01:01 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu <nicola.contu@gmail.com> wrote: > If the error persist I will try to revert the work_mem. > Thanks a lot
Hi Nicola,
It's hard to say exactly what the cause of the problem is in your case and how to avoid it, without knowing what your query plans look like. PostgreSQL allows 64 + 2 * max_connections segments to exist a time, and it needs a number of them that depends on work_mem (in the case of Parallel Hash Join and Parallel Bitmap Index Scan), and also depends on the number of Gather nodes that appear in the plan, which in some unusual cases can result from partitioning.
I've seen people reaching this error by running a lot of parallel queries concurrently. If that's the cause, then you can definitely get some relief by turning work_mem down, or by turning max_connections up (even though you don't want to allow more connections -- because it influences the formula for deciding on the DSM segment limit). We should probably adjust some of the internal constants to give us more slots, to avoid that problem, as discussed here:
I've also seen people reaching this error by somehow coming up with plans that had a very large number of Gather nodes in them, corresponding to partitions; that's probably a bad plan (it'd presumably be better to terminate parallelism higher up in the plan, but these plans do seem to exist in the wild; I don't recall exactly why). I think we need a bit of a redesign so that if there are multiple Gather nodes, they share the same main DSM segment, instead of blowing through this limit.