Re: replication_slots usability issue - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: replication_slots usability issue
Date
Msg-id 20181031021023.GA7862@paquier.xyz
Whole thread Raw
In response to Re: replication_slots usability issue  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Oct 30, 2018 at 10:52:54AM -0700, Andres Freund wrote:
> On 2018-10-30 11:51:09 +0900, Michael Paquier wrote:
>> Er...  At the same time, shouldn't RestoreSlotFromDisk() *not* use PANIC
>> if more slots are found in pg_replslot than max_replication_slots can
>> handle.  A FATAL is fine at startup, PANIC blows up a core file, which
>> is clearly overdoing it if the goal is to give a recommendation at the
>> end.
>
> I can't get particularly excited about this.

This gets only used by the startup process to restore slot information,
so for example if you have an instance with two slots, shut it down,
reduce max_replication_slots to 1, and restart then you generate a core
dump, which is a bit overdoing it to simply let the user know that he
just needs to bump up one parameter and to give him a recommendation to
do so :)

For example:
2018-10-31 11:04:08 JST [9014]: [11-1] db=,user=,app=,client= DEBUG:
starting up replication slots
2018-10-31 11:04:08 JST [9014]: [12-1] db=,user=,app=,client= DEBUG:
restoring replication slot from "pg_replslot/popo3/state"
2018-10-31 11:04:08 JST [9014]: [13-1] db=,user=,app=,client= DEBUG:
restoring replication slot from "pg_replslot/popo2/state"
2018-10-31 11:04:08 JST [9014]: [14-1] db=,user=,app=,client= PANIC:
too many replication slots active before shutdown
2018-10-31 11:04:08 JST [9014]: [15-1] db=,user=,app=,client= HINT:
Increase max_replication_slots and try again.

> I guess we can change it, but I'd only do so in master.

I won't fight hard for a back-patch, now I think that we should
back-patch a fix.  Generating a core file if a state is unexpected is
fine because you expect the system to react so, being able to trigger
one based on a parameter switch just to give a recommendation is not in
my opinion.

For example with the attached the system reacts as follows:
2018-10-31 11:07:23 JST [10063]: [14-1] db=,user=,app=,client= FATAL:
too many replication slots active before shutdown
2018-10-31 11:07:23 JST [10063]: [15-1] db=,user=,app=,client= HINT:
Increase max_replication_slots and try again.
2018-10-31 11:07:23 JST [10061]: [6-1] db=,user=,app=,client= LOG:
startup process (PID 10063) exited with exit code 1
2018-10-31 11:07:23 JST [10061]: [7-1] db=,user=,app=,client= LOG:
aborting startup due to startup process failure
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Should pg 11 use a lot more memory building an spgist index?
Next
From: Amit Langote
Date:
Subject: Re: ToDo: show size of partitioned table