Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id 20220218222319.yozkbhren7vkjbi5@alap3.anarazel.de
In response to Re: Synchronizing slots from primary to standby  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses RE: Synchronizing slots from primary to standby  ("kato-sho@fujitsu.com" <kato-sho@fujitsu.com>)
Re: Synchronizing slots from primary to standby  (James Coleman <jtc331@gmail.com>)
Re: Synchronizing slots from primary to standby  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
Hi,

On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote:
> On 05.02.22 20:59, Andres Freund wrote:
> > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> > >  From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> > > From: Peter Eisentraut<peter@eisentraut.org>
> > > Date: Mon, 3 Jan 2022 14:43:36 +0100
> > > Subject: [PATCH v3] Synchronize logical replication slots from primary to
> > >   standby
> > I've just skimmed the patch and the related threads. As far as I can tell this
> > cannot be safely used without the conflict handling in [1], is that correct?
> 
> This or similar questions have been asked a few times about this or similar
> patches, but they always come with some doubt.

I'm certain it's a problem - the only reason I phrased it as a question was
that there could have been something clever in the patch preventing problems
that I missed because I just skimmed it.


> If we think so, it would be
> useful perhaps if we could come up with test cases that would demonstrate
> why that other patch/feature is necessary.  (I'm not questioning it
> personally, I'm just throwing out ideas here.)

The patch as-is just breaks one of the fundamental guarantees necessary for
logical decoding, that no row versions can be removed that are still required
for logical decoding (signalled via catalog_xmin). So there needs to be an
explicit mechanism upholding that guarantee, but there is none right now from
what I can see.
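
To make the guarantee concrete, here is a minimal sketch in Python. It is a
simplified model, not PostgreSQL code: xids are plain integers, 32-bit xid
wraparound is ignored, and the function name and parameters are hypothetical.

```python
from typing import Optional


def can_remove_catalog_row(deleter_xid: int, catalog_xmin: Optional[int]) -> bool:
    """Return True if a dead catalog row version may be removed.

    deleter_xid  -- xid of the transaction that deleted/updated the row version
    catalog_xmin -- oldest xid whose catalog changes logical decoding still
                    needs to see, or None if no logical slot exists

    Simplified model: integer comparison, no xid wraparound handling.
    """
    if catalog_xmin is None:
        # Nothing is holding back catalog cleanup.
        return True
    # Row versions deleted by transactions at or after catalog_xmin may still
    # be needed to decode older WAL, so they must be kept.
    return deleter_xid < catalog_xmin
```

With a slot at catalog_xmin = 100, a row version deleted by xid 90 can go,
while one deleted by xid 100 or later has to stay. On a standby nothing
enforces this check today, because removal is decided on the primary and
simply replayed.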

One piece of the referenced patchset is that it adds information about removed
catalog rows to a few WAL records, and then verifies during replay that no
record can be replayed that removes resources that are still needed. If such a
conflict exists it's dealt with as a recovery conflict.
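
A rough sketch of that replay-time check, again as a simplified Python model
rather than the actual patch. The record fields (latest_removed_xid,
removes_catalog_rows) and the function name are assumptions for illustration;
xids are plain integers and wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CleanupRecord:
    """Hypothetical stand-in for a WAL record that removes row versions."""
    latest_removed_xid: int      # newest xid whose deleted rows are removed
    removes_catalog_rows: bool   # the extra information added to the record


def conflicting_slots(record: CleanupRecord,
                      slot_catalog_xmins: Dict[str, int]) -> List[str]:
    """Return names of logical slots that still need rows this record removes."""
    if not record.removes_catalog_rows:
        # Cleanup of non-catalog rows never conflicts with logical decoding.
        return []
    # A slot needs every catalog row version deleted by xid >= its
    # catalog_xmin, so removal up to latest_removed_xid conflicts with any
    # slot whose catalog_xmin is not newer than that.
    return sorted(name for name, xmin in slot_catalog_xmins.items()
                  if xmin <= record.latest_removed_xid)
```

A non-empty result is what would be treated as a recovery conflict for the
listed slots. An empty result means the record can be replayed safely.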

That by itself doesn't prevent removal of required row versions, but it
provides detection. The prevention against removal can then be done using a
physical replication slot with hot standby feedback or some other mechanism
(e.g. the slot syncing mechanism could maintain a "placeholder" slot on the
primary for all sync targets or something like that).
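
The prevention side can be sketched the same way. This is a simplified Python
model under stated assumptions: the standby reports the oldest catalog_xmin of
its logical slots via feedback, the primary's physical slot remembers it, and
the primary's catalog cleanup horizon is held back accordingly. Function names
are hypothetical, xids are plain integers, and wraparound is ignored.

```python
from typing import Iterable, Optional


def feedback_catalog_xmin(standby_slot_catalog_xmins: Iterable[int]) -> Optional[int]:
    """Oldest catalog_xmin across the standby's logical slots, or None."""
    xmins = list(standby_slot_catalog_xmins)
    return min(xmins) if xmins else None


def primary_catalog_horizon(primary_oldest_xmin: int,
                            physical_slot_catalog_xmin: Optional[int]) -> int:
    """Xid below which the primary may remove dead catalog row versions."""
    if physical_slot_catalog_xmin is None:
        # No feedback recorded: only the primary's own needs count.
        return primary_oldest_xmin
    # The physical slot holds the horizon back for the standby's benefit.
    return min(primary_oldest_xmin, physical_slot_catalog_xmin)
```

For example, with standby slots at catalog_xmin 120 and 95 and a primary-side
oldest xmin of 300, the horizon stays at 95, so nothing the standby's slots
need is removed on the primary in the first place.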

Even if that infrastructure existed / was merged, the slot sync stuff would
still need some very careful logic to protect against problems due to
concurrent WAL replay and "synchronized slot" creation. But that's doable.

Greetings,

Andres Freund


