Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] Restricting maximum keep segments by repslots
Date
Msg-id 20200428162941.GA6196@alvherre.pgsql
Whole thread Raw
In response to Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: [HACKERS] Restricting maximum keep segments by repslots  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On 2020-Apr-28, Kyotaro Horiguchi wrote:

> At Mon, 27 Apr 2020 18:33:42 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> > On 2020-Apr-08, Kyotaro Horiguchi wrote:
> > 
> > > At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 

> > Thanks for the fix!  I propose two changes:
> > 
> > 1. reword the error like this:
> > 
> > ERROR:  replication slot "regression_slot3" cannot be advanced
> > DETAIL:  This slot has never previously reserved WAL, or has been invalidated
> 
> Agreed to describe what is failed rather than the cause.  However,
> logical replications slots are always "previously reserved" at
> creation.

Bah, of course.  I was thinking in making the equivalent messages all
identical in all callsites, but maybe they should be different when
slots are logical.  I'll go over them again.

> > 2. use the same error in one other place, to wit
> >    pg_logical_slot_get_changes() and pg_replication_slot_advance().  I
> >    made the DETAIL part the same in all places, but the ERROR line is
> >    adjusted to what each callsite is doing.
> >    I do think that this change in test_decoding is a bit unpleasant:
> > 
> > -ERROR:  cannot use physical replication slot for logical decoding
> > +ERROR:  cannot get changes from replication slot "repl"
> > 
> >    The test is
> >       -- check that we're detecting a streaming rep slot used for logical decoding
> >       SELECT 'init' FROM pg_create_physical_replication_slot('repl');
> >       SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts',
'1');
> 
> The message may be understood as "No change has been made since
> restart_lsn". Does something like the following work?
> 
> ERROR:  replication slot "repl" is not usable to get changes

That wording seems okay, but my specific point for this error message is
that we were trying to use a physical slot to get logical changes; so
the fact that the slot has been invalidated is secondary and we should
complain about the *type* of slot rather than the restart_lsn.


> By the way there are some other messages that doesn't render the
> symptom but the cause.
> 
> "cannot use physical replication slot for logical decoding"
> "replication slot \"%s\" was not created in this database"
> 
> Don't they need the same amendment?

Maybe, but I don't want to start rewording every single message in uses
of replication slots ... I prefer to only modify the ones related to the
problem at hand.

> > > > On the other hand, physical replication doesn't break by invlidation.
> > > > [...]

> > Anyway I think the current patch can be applied as is -- and if we want
> > physical replication to have some other behavior, we can patch for that
> > afterwards.
> 
> Agreed here. The false-invalidation doesn't lead to any serious
> consequences.

But does it?  What happens, for example, if we have a slot used to get a
pg_basebackup, then time passes before starting to stream from it and is
invalidated?  I think this "works fine" (meaning that once we try to
stream from the slot to replay at the restored base backup, we will
raise an error immediately), but I haven't tried.

The worst situation would be producing a corrupt replica.  I don't think
this is possible.

The ideal behavior I think would be that pg_basebackup aborts
immediately when the slot is invalidated, to avoid wasting more time
producing a doomed backup.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Next
From: Emre Hasegeli
Date:
Subject: Re: Bogus documentation for bogus geometric operators