Thread: pgsql: walreceiver uses a temporary replication slot by default

pgsql: walreceiver uses a temporary replication slot by default

From: Peter Eisentraut
walreceiver uses a temporary replication slot by default

If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot.  A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
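A minimal Python sketch of the slot decision the commit message describes, under a simplified model. `choose_slot` and `SlotChoice` are hypothetical names, and the temporary slot name format is an illustrative assumption, not necessarily what walreceiver.c generates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotChoice:
    name: Optional[str]  # slot to stream from, or None for no slot
    temporary: bool      # True if the walreceiver must create it first


def choose_slot(primary_slot_name: str,
                wal_receiver_create_temp_slot: bool,
                pid: int) -> SlotChoice:
    """Hypothetical model of the walreceiver's slot choice (not the real code)."""
    if primary_slot_name:
        # A permanent slot is configured: always use it.
        return SlotChoice(primary_slot_name, temporary=False)
    if wal_receiver_create_temp_slot:
        # No permanent slot: create a temporary one, which the primary drops
        # automatically when the replication connection ends.
        # The name format here is an assumption for illustration only.
        return SlotChoice(f"pg_walreceiver_{pid}", temporary=True)
    # Setting disabled (e.g. the primary is out of slots): stream without a slot.
    return SlotChoice(None, temporary=False)
```

With `primary_slot_name` set, the new setting has no effect; it only matters when no permanent slot is configured.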

Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion:
https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/329730827848f61eb8d353d5addcbd885fa823da

Modified Files
--------------
doc/src/sgml/config.sgml                           | 20 +++++++++++
.../libpqwalreceiver/libpqwalreceiver.c            |  4 +++
src/backend/replication/walreceiver.c              | 41 ++++++++++++++++++++++
src/backend/utils/misc/guc.c                       |  9 +++++
src/backend/utils/misc/postgresql.conf.sample      |  1 +
src/include/replication/walreceiver.h              |  7 ++++
6 files changed, 82 insertions(+)


Re: pgsql: walreceiver uses a temporary replication slot by default

From: Robert Haas
On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> walreceiver uses a temporary replication slot by default
>
> If no permanent replication slot is configured using
> primary_slot_name, the walreceiver now creates and uses a temporary
> replication slot.  A new setting wal_receiver_create_temp_slot can be
> used to disable this behavior, for example, if the remote instance is
> out of replication slots.
>
> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> Discussion:
> https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com

Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.

I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: walreceiver uses a temporary replication slot by default

From: Peter Eisentraut
On 2020-01-23 21:49, Robert Haas wrote:
> On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>> walreceiver uses a temporary replication slot by default
>>
>> If no permanent replication slot is configured using
>> primary_slot_name, the walreceiver now creates and uses a temporary
>> replication slot.  A new setting wal_receiver_create_temp_slot can be
>> used to disable this behavior, for example, if the remote instance is
>> out of replication slots.
>>
>> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
>> Discussion:
>> https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com
> 
> Neither the commit message for this patch nor any of the comments in
> the patch seem to explain why this is a desirable change.
> 
> I assume that's probably discussed on the thread that is linked here,
> but you shouldn't have to dig through the discussion thread to figure
> out what the benefits of a change like this are.

You are right, this has gotten a bit lost in the big thread.

The rationale is basically the same as for client-side tools like
pg_basebackup, which use a temporary slot so that the WAL data they
are interested in doesn't disappear while they are connected.
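A minimal Python sketch of why a slot keeps WAL from disappearing, assuming a simplified retention rule: the primary keeps every segment from the oldest of the checkpoint redo pointer and all slots' `restart_lsn`. It ignores `wal_keep_segments` and archiving, and `oldest_segment_to_keep` is a hypothetical helper, not a PostgreSQL function.

```python
from typing import List

# Default wal_segment_size (16 MB); it can be changed at initdb time.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def oldest_segment_to_keep(redo_lsn: int, slot_restart_lsns: List[int]) -> int:
    """Return the number of the oldest WAL segment the primary must retain.

    Simplified model: WAL is kept from the checkpoint redo pointer or from
    the oldest slot's restart_lsn, whichever is further back.
    """
    keep_lsn = min([redo_lsn] + slot_restart_lsns)
    return keep_lsn // WAL_SEGMENT_SIZE


# A standby has only received up to segment 3 while the redo pointer is in
# segment 10: without a slot, segments 3..9 may be recycled; with one, they stay.
print(oldest_segment_to_keep(10 * WAL_SEGMENT_SIZE, []))                          # 10
print(oldest_segment_to_keep(10 * WAL_SEGMENT_SIZE, [3 * WAL_SEGMENT_SIZE + 5]))  # 3
```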

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: walreceiver uses a temporary replication slot by default

From: Jehan-Guillaume de Rorthais
Hello,

On Mon, 10 Feb 2020 16:37:53 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2020-01-23 21:49, Robert Haas wrote:
> > On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org>
> > wrote:  
> >> walreceiver uses a temporary replication slot by default
> >>
> >> If no permanent replication slot is configured using
> >> primary_slot_name, the walreceiver now creates and uses a temporary
> >> replication slot.  A new setting wal_receiver_create_temp_slot can be
> >> used to disable this behavior, for example, if the remote instance is
> >> out of replication slots.
> >>
> >> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> >> Discussion:
> >> https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com  
> > 
> > Neither the commit message for this patch nor any of the comments in
> > the patch seem to explain why this is a desirable change.
> > 
> > I assume that's probably discussed on the thread that is linked here,
> > but you shouldn't have to dig through the discussion thread to figure
> > out what the benefits of a change like this are.  
> 
> You are right, this has gotten a bit lost in the big thread.
> 
> The rationale is basically the same as why client-side tools like 
> pg_basebackup use a temporary slot: So that the WAL data that they are 
> interested in doesn't disappear while they are connected.

In my humble opinion, I prefer the previous behavior, streaming without a
temporary slot, for one reason: primary availability.

Should the standby lag far behind the primary (whatever the root cause),
it simply got disconnected because the WAL it needed was gone. In the
worst case, we must rebuild it, hopefully from backups. In the best case,
it fetches the missing WAL from the PITR archive. As soon as the latter is
possible in the stack, I consider slots a burden from an operability point
of view. Only if standbys cannot fetch archived WAL from PITR should we
consider slots.

With a temporary slot created by default, a single standby lagging far
behind can make the primary unavailable. We have nothing yet to prevent a
slot from filling the pg_wal partition. How would new users creating their
first cluster react in such a situation? I suppose the original discussion
was mostly targeting them? Recovering from this is far scarier than
rebuilding a standby.
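To make the disk-space concern concrete, here is a back-of-the-envelope Python model with hypothetical numbers: a slot pins every byte the lagging standby has not yet received, so pg_wal grows by the difference between the primary's write rate and the standby's receive rate until the partition is full. `minutes_until_full` is an illustrative helper, not a PostgreSQL tool.

```python
from typing import Optional

MB = 1024 ** 2
GB = 1024 ** 3


def minutes_until_full(partition_bytes: int, base_wal_bytes: int,
                       write_rate: int, receive_rate: int,
                       max_minutes: int = 100_000) -> Optional[int]:
    """Minutes until pg_wal fills while a slot pins WAL for a lagging standby.

    write_rate and receive_rate are in bytes per minute; base_wal_bytes is the
    WAL the primary would keep anyway. Returns None if the partition does not
    fill within max_minutes (i.e. the standby keeps up).
    """
    pinned = 0
    for minute in range(1, max_minutes + 1):
        # Everything written but not yet received stays in pg_wal.
        pinned += max(write_rate - receive_rate, 0)
        if base_wal_bytes + pinned >= partition_bytes:
            return minute
    return None


# 10 GB partition, 1 GB of baseline WAL, primary writes 100 MB/min and the
# standby only receives 40 MB/min: full after about two and a half hours.
print(minutes_until_full(10 * GB, 1 * GB, 100 * MB, 40 * MB))  # 154
```

Without a slot, the same standby would instead be disconnected and, in the best case described above, catch up from the WAL archive; that trade-off is the crux of the objection.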

So the default behavior might not be desirable, and maybe
wal_receiver_create_temp_slot should be off by default?

Note that Kyotaro HORIGUCHI is working on a patch restricting the maximum
number of WAL segments kept by replication slots:

https://www.postgresql.org/message-id/flat/20190627162256.4f4872b8%40firost#6cba1177f766e7ffa5237789e748da38
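Horiguchi-san's proposal caps how much WAL slots may hold back (it later landed in PostgreSQL 13 as max_slot_wal_keep_size). The Python sketch below only models the idea under simplifying assumptions; `max_keep_bytes` is a hypothetical stand-in for such a limit, and neither function is the patch's actual code. Slots whose `restart_lsn` falls behind the cap lose the WAL they need, trading a broken standby for an available primary.

```python
from typing import List, Optional


def oldest_lsn_to_keep(redo_lsn: int, current_lsn: int,
                       slot_restart_lsns: List[int],
                       max_keep_bytes: Optional[int] = None) -> int:
    """Oldest LSN the primary retains, optionally capping what slots may pin.

    WAL from the checkpoint redo pointer onward is always kept (crash recovery
    needs it); slots can only extend retention further back, and with a cap
    (hypothetical max_keep_bytes) no further than current_lsn - max_keep_bytes.
    """
    if not slot_restart_lsns:
        return redo_lsn
    slot_lsn = min(slot_restart_lsns)
    if max_keep_bytes is not None:
        slot_lsn = max(slot_lsn, current_lsn - max_keep_bytes)
    return min(redo_lsn, slot_lsn)


def slots_losing_wal(redo_lsn: int, current_lsn: int,
                     slot_restart_lsns: List[int],
                     max_keep_bytes: Optional[int]) -> List[int]:
    """restart_lsn values of slots whose needed WAL would be removed."""
    keep = oldest_lsn_to_keep(redo_lsn, current_lsn, slot_restart_lsns,
                              max_keep_bytes)
    return [lsn for lsn in slot_restart_lsns if lsn < keep]


# No cap: the lagging slot at LSN 10 pins everything back to 10.
print(oldest_lsn_to_keep(100, 120, [10, 90]))                   # 10
# Cap of 50 bytes behind current_lsn: retention stops at 70; the slot at 10 loses.
print(slots_losing_wal(100, 120, [10, 90], max_keep_bytes=50))  # [10]
```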

Regards,



Re: pgsql: walreceiver uses a temporary replication slot by default

From: Fujii Masao

On 2020/02/12 7:53, Jehan-Guillaume de Rorthais wrote:
> Hello,
> 
> On Mon, 10 Feb 2020 16:37:53 +0100
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> 
>> On 2020-01-23 21:49, Robert Haas wrote:
>>> On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org>
>>> wrote:
>>>> walreceiver uses a temporary replication slot by default
>>>>
>>>> If no permanent replication slot is configured using
>>>> primary_slot_name, the walreceiver now creates and uses a temporary
>>>> replication slot.  A new setting wal_receiver_create_temp_slot can be
>>>> used to disable this behavior, for example, if the remote instance is
>>>> out of replication slots.
>>>>
>>>> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
>>>> Discussion:
>>>> https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com
>>>
>>> Neither the commit message for this patch nor any of the comments in
>>> the patch seem to explain why this is a desirable change.
>>>
>>> I assume that's probably discussed on the thread that is linked here,
>>> but you shouldn't have to dig through the discussion thread to figure
>>> out what the benefits of a change like this are.
>>
>> You are right, this has gotten a bit lost in the big thread.
>>
>> The rationale is basically the same as why client-side tools like
>> pg_basebackup use a temporary slot: So that the WAL data that they are
>> interested in doesn't disappear while they are connected.
> 
> In my humble opinion, I prefer the previous behavior, streaming without a
> temporary slot, for one reason: primary availability.

+1
  
> Should the standby lag far behind the primary (whatever the root cause),
> it simply got disconnected because the WAL it needed was gone. In the
> worst case, we must rebuild it, hopefully from backups. In the best case,
> it fetches the missing WAL from the PITR archive. As soon as the latter is
> possible in the stack, I consider slots a burden from an operability point
> of view. Only if standbys cannot fetch archived WAL from PITR should we
> consider slots.
> 
> With a temporary slot created by default, a single standby lagging far
> behind can make the primary unavailable. We have nothing yet to prevent a
> slot from filling the pg_wal partition. How would new users creating their
> first cluster react in such a situation? I suppose the original discussion
> was mostly targeting them? Recovering from this is far scarier than
> rebuilding a standby.
> 
> So the default behavior might not be desirable, and maybe
> wal_receiver_create_temp_slot should be off by default?
> 
> Note that Kyotaro HORIGUCHI is working on a patch restricting the maximum
> number of WAL segments kept by replication slots:
> 
> https://www.postgresql.org/message-id/flat/20190627162256.4f4872b8%40firost#6cba1177f766e7ffa5237789e748da38

Yeah, I think it's better to disable this option until something like
Horiguchi-san's proposal has been committed, i.e., until the upper
limit on the number (or size) of WAL files retained for slots becomes
configurable.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pgsql: walreceiver uses a temporary replication slot by default

From: Michael Paquier
On Wed, Feb 12, 2020 at 06:11:06PM +0900, Fujii Masao wrote:
> On 2020/02/12 7:53, Jehan-Guillaume de Rorthais wrote:
>> In my humble opinion, I prefer the previous behavior, streaming without a
>> temporary slot, for one reason: primary availability.
>
> +1
>
>> With a temporary slot created by default, a single standby lagging far
>> behind can make the primary unavailable. We have nothing yet to prevent a
>> slot from filling the pg_wal partition. How would new users creating their
>> first cluster react in such a situation? I suppose the original discussion
>> was mostly targeting them? Recovering from this is far scarier than
>> rebuilding a standby.
>>
>> So the default behavior might not be desirable, and maybe
>> wal_receiver_create_temp_slot should be off by default?
>>
>> Note that Kyotaro HORIGUCHI is working on a patch restricting the maximum
>> number of WAL segments kept by replication slots:
>>
>> https://www.postgresql.org/message-id/flat/20190627162256.4f4872b8%40firost#6cba1177f766e7ffa5237789e748da38
>
> Yeah, I think it's better to disable this option until something like
> Horiguchi-san's proposal has been committed, i.e., until the upper
> limit on the number (or size) of WAL files retained for slots becomes
> configurable.

Even with that, are we sure this extra feature would be a sufficient
reason to enable this option by default?  I am not sure about that
either.  My opinion is that this option is useful to have and that it
is not really a problem if you have slot monitoring on the primary (or
on a standby, for cascading).  And I'd like to believe that this has
lately become common practice for base backups, archivers based on
pg_receivewal, or even logical decoding, but it could be surprising
for some users who do not do that yet.  So Jehan-Guillaume's arguments
also sound sensible to me (he also maintains an automatic failover
solution called PAF).
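Slot monitoring of this kind can be as simple as comparing each slot's `restart_lsn` in `pg_replication_slots` with `pg_current_wal_lsn()` on the primary. A minimal Python sketch of that comparison, assuming the rows have already been fetched (for example with `SELECT slot_name, restart_lsn FROM pg_replication_slots`); `slots_over_limit` and the byte threshold are hypothetical. LSNs are parsed from PostgreSQL's `X/Y` text form, where X and Y are the high and low 32 bits in hexadecimal.

```python
from typing import List, Optional, Tuple


def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL text form (e.g. '16/B374D848') to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slots_over_limit(current_lsn: str,
                     slots: List[Tuple[str, Optional[str]]],
                     limit_bytes: int) -> List[Tuple[str, int]]:
    """Return (slot_name, retained_bytes) for slots holding back too much WAL."""
    current = parse_lsn(current_lsn)
    alerts = []
    for name, restart_lsn in slots:
        if restart_lsn is None:
            # restart_lsn is NULL when a slot does not reserve WAL yet.
            continue
        retained = current - parse_lsn(restart_lsn)
        if retained > limit_bytes:
            alerts.append((name, retained))
    return alerts


# Hypothetical rows and a 1 GiB alert threshold.
rows = [("standby_a", "0/3000000"), ("standby_b", "1/0"), ("idle", None)]
print(slots_over_limit("1/1000000", rows, limit_bytes=1024 ** 3))
# [('standby_a', 4261412864)]
```

Temporary slots created by the walreceiver show up in the same view (with `temporary` set to true), so the same check covers them while the standby is connected.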

From what I can see, nobody really likes the current state of things
for this option, and that does not come down only to its default
value.  The default GUC value and the way the parameter is loaded by
the WAL sender are problematic, though still easy enough to fix.  How
do we move on from here?  I could post a patch based on what Sergei
Kornilov has sent around [1], but that's Peter's feature.  Any
opinions?

[1]: https://www.postgresql.org/message-id/20200122055510.GH174860@paquier.xyz
--
Michael


Re: pgsql: walreceiver uses a temporary replication slot by default

From: Kyotaro Horiguchi
At Thu, 13 Feb 2020 16:48:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Wed, Feb 12, 2020 at 06:11:06PM +0900, Fujii Masao wrote:
> > On 2020/02/12 7:53, Jehan-Guillaume de Rorthais wrote:
> >> In my humble opinion, I prefer the previous behavior, streaming without a
> >> temporary slot, for one reason: primary availability.
> > 
> > +1
> >
> >> With a temporary slot created by default, a single standby lagging far
> >> behind can make the primary unavailable. We have nothing yet to prevent a
> >> slot from filling the pg_wal partition. How would new users creating their
> >> first cluster react in such a situation? I suppose the original discussion
> >> was mostly targeting them? Recovering from this is far scarier than
> >> rebuilding a standby.
> >> 
> >> So the default behavior might not be desirable, and maybe
> >> wal_receiver_create_temp_slot should be off by default?
> >> 
> >> Note that Kyotaro HORIGUCHI is working on a patch restricting the maximum
> >> number of WAL segments kept by replication slots:
> >> 
> >> https://www.postgresql.org/message-id/flat/20190627162256.4f4872b8%40firost#6cba1177f766e7ffa5237789e748da38
> > 
> > Yeah, I think it's better to disable this option until something like
> > Horiguchi-san's proposal has been committed, i.e., until the upper
> > limit on the number (or size) of WAL files retained for slots becomes
> > configurable.
> 
> Even with that, are we sure this extra feature would be a sufficient
> reason to enable this option by default?

I think the feature (slot limit) is not going to be a reason to
enable it (the temporary slot). In the first place, I think we cannot
determine a default value that works in general.

> I am not sure about that either.  My opinion is that this option is
> useful to have and that it is not really a problem if you have slot
> monitoring on the primary (or on a standby, for cascading).  And I'd
> like to believe that this has lately become common practice for base
> backups, archivers based on pg_receivewal, or even logical decoding,
> but it could be surprising for some users who do not do that yet.  So
> Jehan-Guillaume's arguments also sound sensible to me (he also
> maintains an automatic failover solution called PAF).
> 
> From what I can see, nobody really likes the current state of things
> for this option, and that does not come down only to its default
> value.  The default GUC value and the way the parameter is loaded by
> the WAL sender are problematic, though still easy enough to fix.  How
> do we move on from here?  I could post a patch based on what Sergei
> Kornilov has sent around [1], but that's Peter's feature.  Any
> opinions?
> 
> [1]: https://www.postgresql.org/message-id/20200122055510.GH174860@paquier.xyz

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center