Re: Physical replication slot advance is not persistent - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Physical replication slot advance is not persistent
Date
Msg-id 20200128080114.GB145179@paquier.xyz
Whole thread Raw
In response to Re: Physical replication slot advance is not persistent  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Physical replication slot advance is not persistent  (Craig Ringer <craig@2ndquadrant.com>)
Re: Physical replication slot advance is not persistent  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Jan 21, 2020 at 02:07:30PM +0900, Michael Paquier wrote:
> On Mon, Jan 20, 2020 at 11:00:14AM -0800, Andres Freund wrote:
>> Hm. I'm think testing this with real LSNs is a better idea. What if the
>> node actually already is past FF/FFFFFFFF at this point? Quite unlikely,
>> I know, but still. I.e. why not get the current LSN after the INSERT,
>> and assert that the slot, after the restart, is that?
>
> Sure.  If not disabling autovacuum in the test, we'd just need to make
> sure if that advancing is at least ahead of the INSERT position.

Actually, as the advancing happens only up to this position we just
need to make sure that the LSN reported by the slot is the same as the
position advanced to.  I have switched the test to just do that
instead of using a fake LSN.

> Anyway, I am still not sure if we should got down the road to just
> mark the slot as dirty if any advancing is done and let the follow-up
> checkpoint to the work, if the advancing function should do the slot
> flush, or if we choose one and make the other an optional choice (not
> for back-branches, obviously.  Based on my reading of the code, my
> guess is that a flush should happen at the end of the advancing
> function.

I have been chewing on this point for a couple of days, and as we may
actually crash between the moment the slot is marked as dirty and the
moment the slot information is made consistent, we still have a risk
to have the slot go backwards even if the slot information is saved.
The window is much narrower, but well, the docs of logical decoding
mention that this risk exists.  And the patch becomes much more
simple without changing the actual behavior present since the feature
has been introduced for logical slots.  There could be a point in
having a new option to flush the slot information, or actually a
separate function to flush the slot information, but let's keep that
for a future possibility.

So attached is an updated patch which addresses the problem just by
marking a physical slot as dirty if any advancing is done.  Some
documentation is added about the fact that an advance is persistent
only at the follow-up checkpoint.  And the tests are fixed to not use
a fake LSN but instead advance to the latest LSN position produced.

Any objections?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: Peter Eisentraut
Date:
Subject: Re: Allow to_date() and to_timestamp() to accept localized names