
From: Craig Ringer
Subject: Re: Physical replication slot advance is not persistent
Msg-id: CAMsr+YHNWq+U9EFKTcUN=AB_F0g_ppCo=1zcnymGcPJfW63b1w@mail.gmail.com
In response to: Re: Physical replication slot advance is not persistent (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
On Tue, 21 Jan 2020 at 11:06, Michael Paquier <michael@paquier.xyz> wrote:

> > The replication interface should not immediately flush changes to the
> > slot replay position on advance. It should be marked dirty and left to
> > be flushed by the next checkpoint. Doing otherwise potentially
> > introduces a lot of unnecessary fsync()s and may have an unpleasant
> > impact on performance.
>
> Some portions of the advancing code tell a different story.  It seems
> to me that the intention behind the first implementation of slot
> advancing was to get things flushed if any advancing was done.

Taking a step back here, I have no concerns with the proposed changes
to pg_replication_slot_advance(). Disregard my comments about safety
with the SQL interface for the purposes of this thread; they apply
only to logical slots and are really unrelated to
pg_replication_slot_advance().

Re your comment above: For slot advances in general the flush to disk
is done lazily for performance reasons, but I think you meant
pg_replication_slot_advance() specifically.

pg_replication_slot_advance() doesn't appear to make any promises as
to immediate durability either way. It updates the required LSN
immediately with ReplicationSlotsUpdateRequiredLSN() so it
theoretically marks WAL as removable before it's flushed. But I don't
think we'll ever actually remove any WAL segments until checkpoint, at
which point we'll also flush any dirty slots, so it doesn't really
matter. For logical slots the LSN and xmin are both protected by the
effective/actual tracking logic and can't advance until the slot is
flushed.
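
To make that interaction concrete, a sketch of how one can observe the
bookkeeping from SQL (the slot name is made up; this just restates the
reasoning above in queries):

    -- A slot's restart_lsn is what holds back WAL removal:
    SELECT slot_name, restart_lsn, pg_current_wal_lsn() AS current_lsn
      FROM pg_replication_slots;
    -- WAL older than the minimum restart_lsn across all slots becomes
    -- removable, but removal only actually happens at checkpoint time,
    -- which is also when dirty slots are flushed to disk.
    CHECKPOINT;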

The app might be surprised if the slot goes backwards after a
pg_replication_slot_advance() followed by a server crash, though.
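
Roughly, as a sketch (slot name and LSN are made up):

    SELECT pg_replication_slot_advance('some_slot', '0/3000000');
    -- The view now reports the advanced position...
    SELECT restart_lsn FROM pg_replication_slots
     WHERE slot_name = 'some_slot';
    -- ...but if the server crashes before the next checkpoint flushes
    -- the dirty slot, it can come back at its old, pre-advance
    -- restart_lsn after recovery.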

> The
> check doing that is actually broken from the start, but that's another
> story.  Could you check with Petr what the intention was here, or draw
> his attention to this thread?  He is the original author of the
> feature, so his input would be nice to have.

I'll ask him. He's pretty bogged down at the moment though, and I've
done a lot of work in this area too (see e.g. the changes to send
catalog_xmin in hot standby feedback).

> > The SQL interface advances the slot restart position and marks the
> > slot dirty *before the client has confirmed receipt of the data and
> > flushed it to disk*. So there's a data loss window. If the client
> > disconnects or crashes before all the data from that function call is
> > safely flushed to disk it may lose the data, then be unable to fetch
> > it again from the server because of the restart_lsn position advance.
>
> Well, you have the same class of problems with, for example,
> synchronous replication.  The only state a client can be sure of is
> that it received a confirmation that the operation happened and
> completed.  Any other state tells you that the operation may have
> happened.  Or not.  Now, being sure that the new slot's data has been
> flushed by the time the advancing function returns, once the client
> has confirmation that the work is done, is a property which could be
> interesting to some classes of applications.

That's what we already provide with the streaming interface for slot access.

I agree there's no need to shove a fix to the SQL interface for
physical/logical slots into this same discussion. I'm just trying to
make sure we don't "fix" a "bug" that's actually an important part of
the design by adding a perceived-missing flush to the streaming
interface too. I am not at all confident that the test coverage for
this is sufficient right now: we lack a good way to make postgres
delay various lazy internal activity so that we can reliably examine
intermediate states in a race-free way, so I'm not sure tests would
catch such a regression.

> > Really, we should add a "no_advance_position" option to the SQL
> > interface, then expect the client to call a second function that
> > explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise
> > no SQL interface client can be crash-safe.
>
> Hm.  Could you elaborate on what you mean here?  I am not sure I
> understand.  Note that calling the advance function multiple times has
> no effect, and the result returned is the actual restart_lsn (or
> confirmed_flush_lsn of course).

I've probably confused things a bit here. I don't mind whether or not
pg_replication_slot_advance() forces an immediate flush; I think there
are reasonable arguments in both directions.

In the above I was talking about how pg_logical_slot_get_changes()
presently advances the slot position immediately, so if the client
loses its connection before reading and flushing all the data it may
be unable to recover. And while pg_logical_slot_peek_changes() lets
the app read the data without advancing the slot, it then has to do a
separate pg_replication_slot_advance(), which has to redo the decoding
work. I'd like to improve that, but I didn't intend to confuse or
sidetrack this discussion in the process. Sorry.
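
For concreteness, a sketch of the crash-safe pattern I mean (slot name
and LSN are made up):

    -- Peek first, so the slot position does not move before the
    -- client has safely stored the output:
    SELECT lsn, xid, data
      FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);
    -- The client durably stores the rows, noting the last lsn it
    -- flushed, then advances explicitly (redoing the decoding work):
    SELECT pg_replication_slot_advance('my_slot', '0/3000000');
    -- By contrast, pg_logical_slot_get_changes() consumes and advances
    -- in one step, so a client crash after the call but before its own
    -- flush loses that window of changes.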

We don't have a SQL-level interface for reading physical WAL, so there
are no corresponding concerns there.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


