Re: Non-reserved replication slots and slot advancing - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Non-reserved replication slots and slot advancing
Date
Msg-id 20180705231608.GA2366@paquier.xyz
Whole thread Raw
In response to Re: Non-reserved replication slots and slot advancing  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Non-reserved replication slots and slot advancing  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Thu, Jul 05, 2018 at 12:35:29PM -0400, Alvaro Herrera wrote:
> Do we use the term "reserved" anywhere else?  It just doesn't click for
> me.  Other than "All rights reserved", that is ...

This is used in a couple of places in the docs:
$ git grep -i slot | grep reserved
src/sgml/catalogs.sgml:
if the <literal>LSN</literal> of this slot has never been reserved.
src/sgml/func.sgml:
replication slot be reserved immediately; otherwise
src/sgml/protocol.sgml:
Drops a replication slot, freeing any reserved server-side resources

> As for the patch itself: why is the problem that the slot is "not
> reserved" in the first place?  I think what we should be actually
> checking is that the target LSN is within valid limits, ie. the end
> state of the slot after the operation, rather than the initial state of
> the slot before the operation.
>
> If we made this code check the end state, we could make the error
> message be something like "target LSN is not within the allocated range"
> or something like that.

The code relies on the LSN used within the slot to define if it can be
advanced or not.

/*
 * Check if the slot is not moving backwards.  Physical slots rely simply
 * on restart_lsn as a minimum point, while logical slots have confirmed
 * consumption up to confirmed_lsn, meaning that in both cases data older
 * than that is not available anymore.
 */
if (OidIsValid(MyReplicationSlot->data.database))
    minlsn =  MyReplicationSlot->data.confirmed_flush;
else
    minlsn = MyReplicationSlot->data.restart_lsn;

And then is issues an error only if the LSN to move to is lower than the
minimum found.
if (moveto < minlsn)
{
    ReplicationSlotRelease();
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot move slot to %X/%X, minimum is %X/%X",
                    (uint32) (moveto >> 32), (uint32) moveto,
                    (uint32) (minlsn >> 32), (uint32) minlsn)));
}

For a non-reserved slot, there is nothing smaller than 0, so this error
would not trigger, and even if we enforce it to trigger with minlsn ==
InvalidXLogRecPtr then it would make no sense.  So the root of the
problem is that there is no lower-bound which can be used as a
comparison base.  Well, logically that would be the LSN of the oldest
segment still present in pg_wal, but if you begin to authorize that then
you have race conditions with checkpoints running in parallel which
could recycle the requested position.  So it seems to me that ERRORing
when there is no way to define correctly the bounds where the move can
be done is the safest solution.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: documentation fixes for partition pruning, round three
Next
From: Thomas Munro
Date:
Subject: Re: Cache invalidation after authentication (on-the-fly role creation)