Home > mailing lists

Re: Physical replication slot advance is not persistent - Mailing list pgsql-hackers

From	Alexey Kondratov
Subject	Re: Physical replication slot advance is not persistent
Date	December 25, 2019 13:51:57
Msg-id	ada09c82-df58-8204-9296-283514adeb6e@postgrespro.ru Whole thread Raw
In response to	Re: Physical replication slot advance is not persistent (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses	Re: Physical replication slot advance is not persistent
List	pgsql-hackers

Tree view

On 25.12.2019 07:03, Kyotaro Horiguchi wrote:
> At Tue, 24 Dec 2019 20:12:32 +0300, Alexey Kondratov <a.kondratov@postgrespro.ru> wrote in
>> I dig into the code and it happens because of this if statement:
>>
>>      /* Update the on disk state when lsn was updated. */
>>      if (XLogRecPtrIsInvalid(endlsn))
>>      {
>>          ReplicationSlotMarkDirty();
>>          ReplicationSlotsComputeRequiredXmin(false);
>>          ReplicationSlotsComputeRequiredLSN();
>>          ReplicationSlotSave();
>>      }
> Yes, it seems just broken.
>
>> Attached is a small patch, which fixes this bug. I have tried to
>> stick to the same logic in this 'if (XLogRecPtrIsInvalid(endlsn))'
>> and now pg_logical_replication_slot_advance and
>> pg_physical_replication_slot_advance return InvalidXLogRecPtr if
>> no-op.
>>
>> What do you think?
> I think we shoudn't change the definition of
> pg_*_replication_slot_advance since the result is user-facing.

Yes, that was my main concern too. OK.

> The functions return a invalid value only when the slot had the
> invalid value and failed to move the position. I think that happens
> only for uninitalized slots.
>
> Anyway what we should do there is dirtying the slot when the operation
> can be assumed to have been succeeded.
>
> As the result I think what is needed there is just checking if the
> returned lsn is equal or larger than moveto. Doen't the following
> change work?
>
> -    if (XLogRecPtrIsInvalid(endlsn))
> +    if (moveto <= endlsn)

Yep, it helps with physical replication slot persistence after advance, 
but the whole validation (moveto <= endlsn) does not make sense for me. 
The value of moveto should be >= than minlsn == confirmed_flush / 
restart_lsn, while endlsn == retlsn is also always initialized with 
confirmed_flush / restart_lsn. Thus, your condition seems to be true in 
any case, even if it was no-op one, which we were intended to catch.

Actually, if we do not want to change pg_*_replication_slot_advance, we 
can just add straightforward validation that either confirmed_flush, or 
restart_lsn changed after slot advance guts execution. It will be a 
little bit bulky, but much more clear and will never be affected by 
pg_*_replication_slot_advance logic change.

Another weird part I have found is this assignment inside 
pg_logical_replication_slot_advance:

/* Initialize our return value in case we don't do anything */
retlsn = MyReplicationSlot->data.confirmed_flush;

It looks redundant, since later we do the same assignment, which should 
be reachable in any case.

I will recheck everything again and try to come up with something during 
this week.

Regards

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

pgsql-hackers by date:

From: Abbas Butt
Date: 25 December 2019, 13:02:14
Subject: How to test GSSAPI based encryption support

From: Julien Rouhaud
Date: 25 December 2019, 14:12:42
Subject: Re: Add pg_file_sync() to adminpack

Re: Physical replication slot advance is not persistent - Mailing list pgsql-hackers

Previous

Next