Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Minimal logical decoding on standbys
Msg-id 20230407154757.ywqnldz4nsycap3g@awork3.anarazel.de
In response to Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On 2023-04-07 17:13:13 +0200, Drouvot, Bertrand wrote:
> On 4/7/23 9:50 AM, Andres Freund wrote:
> > I added a check for !invalidated to
> > ReplicationSlotsComputeRequiredLSN() etc.
> > 
> 
> looked at 65-0001 and it looks good to me.
> 
> > Added new patch moving checks for invalid logical slots into
> > CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
> > no sense. As far as I can tell the old message in
> > pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
> > "never previously reserved WAL"
> > 
> 
> looked at 65-0002 and it looks good to me.
> 
> > Split "Handle logical slot conflicts on standby." into two. I'm not sure that
> > should stay that way, but it made it easier to hack on
> > InvalidateObsoleteReplicationSlots.
> > 
> 
> looked at 65-0003 and the others.

Thanks for checking!


> > Todo:
> > - write a test that invalidated logical slots stay invalidated across a restart
> 
> Done in 65-66-0008 attached.

Cool.


> > - write a test that invalidated logical slots do not lead to retaining WAL
> 
> I'm not sure how to do that since pg_switch_wal() and friends can't be executed on
> a standby.

You can do it on the primary and wait for the records to have been applied.
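For context, the property such a test would exercise is that an invalidated slot no longer holds back the oldest LSN that has to be retained, which is what the !invalidated check mentioned above is for. A minimal, self-contained sketch of that idea follows. SlotState and min_required_lsn() are hypothetical stand-ins for illustration, not the actual PostgreSQL structures or functions, and 0 is assumed to mean "no LSN":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a replication slot (not PostgreSQL's). */
typedef struct SlotState
{
    bool     in_use;        /* slot entry is allocated */
    bool     invalidated;   /* slot was invalidated and must be ignored */
    uint64_t restart_lsn;   /* oldest LSN the slot needs; 0 = none reserved */
} SlotState;

/*
 * Return the oldest LSN still required by any usable slot, or 0 if no
 * slot requires WAL. Invalidated slots are skipped, so they cannot
 * cause WAL to be retained.
 */
uint64_t
min_required_lsn(const SlotState *slots, size_t nslots)
{
    uint64_t min_lsn = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        const SlotState *s = &slots[i];

        /* Unused, invalidated, or non-reserving slots do not count. */
        if (!s->in_use || s->invalidated || s->restart_lsn == 0)
            continue;

        if (min_lsn == 0 || s->restart_lsn < min_lsn)
            min_lsn = s->restart_lsn;
    }

    return min_lsn;
}
```

With this shape, an invalidated slot whose restart_lsn is older than every other slot's has no effect on the result, which is the behavior the requested test is meant to confirm on a real standby.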


> > - Further evolve the API of InvalidateObsoleteReplicationSlots()
> >    - pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
> >    - rename xid to snapshotConflictHorizon, that'd be more in line with the
> >      ResolveRecoveryConflictWithSnapshot and easier to understand, I think
> > 
> 
> Done. The new API can be found in v65-66-InvalidateObsoleteReplicationSlots_API.patch
> attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a switch/case
> can now be used.

Integrated. I moved the cause to the first argument; that makes more sense to
me.


> The "default" case does not emit an error since this code runs as part
> of checkpoint.

I made it an error: if that ever happens, it's a programming error, not some
data-level inconsistency.
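To illustrate the shape being discussed (cause as the first argument, a switch over the cause, and a default branch that fails hard), here is a minimal standalone sketch. The InvalCause enum, the SlotSketch struct and slot_conflicts() are hypothetical names invented for this example and are not the code from the patch. It also assumes 0 means "unset" and compares xids with a plain <=, deliberately ignoring wraparound, and it uses abort() where a server would raise an error:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL definitions. */
typedef enum InvalCause
{
    CAUSE_WAL_REMOVED,      /* required WAL has been removed */
    CAUSE_HORIZON,          /* needed rows have been removed */
    CAUSE_WAL_LEVEL         /* wal_level too low for logical decoding */
} InvalCause;

typedef struct SlotSketch
{
    bool     logical;       /* logical (as opposed to physical) slot */
    uint64_t restart_lsn;   /* 0 = no WAL reserved */
    uint32_t catalog_xmin;  /* 0 = not set */
} SlotSketch;

/*
 * Decide whether the slot conflicts for the given cause. The cause comes
 * first; the remaining arguments only matter for some of the causes.
 */
bool
slot_conflicts(InvalCause cause, const SlotSketch *slot,
               uint64_t oldest_lsn, uint32_t snapshot_conflict_horizon)
{
    assert(slot != NULL);

    switch (cause)
    {
        case CAUSE_WAL_REMOVED:
            return slot->restart_lsn != 0 && slot->restart_lsn < oldest_lsn;

        case CAUSE_HORIZON:
            /* Simplification: plain comparison, no xid wraparound logic. */
            return slot->logical && slot->catalog_xmin != 0 &&
                   slot->catalog_xmin <= snapshot_conflict_horizon;

        case CAUSE_WAL_LEVEL:
            return slot->logical;

        default:
            /* An unknown cause is a bug in the caller, so fail loudly. */
            fprintf(stderr, "unexpected invalidation cause: %d\n", (int) cause);
            abort();
    }
}
```

Treating the default branch as fatal means a newly added cause that is not handled here shows up immediately during development instead of being silently ignored.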


> > - The test could stand a bit of cleanup and consolidation
> >    - No need to start 4 psql processes to do 4 updates, just do it in one
> >      safe_psql()
> 
> Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch attached.

> >    - the sequence of drop_logical_slots(), create_logical_slots(),
> >      change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
> >      repeated quite a few times
> 
> grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 attached.
> 
> >    - the stats queries checking for specific conflict counts, including
> >      preceding tests, is pretty painful. I suggest to reset the stats at the
> >      end of the test instead (likely also do the drop_logical_slot() there).
> 
> Good idea, done in 65-66-0008 attached.
> 
> >    - it's hard to correlate postgres log and the tap test, because the slots
> >      are named the same across all tests. Perhaps they could have a per-test
> >      prefix?
> 
> Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset the
> check for invalidation is now done in a single function "check_for_invalidation" that looks
> for invalidation messages in the logfile and in pg_stat_database_conflicts.
> 
> Thanks for the suggestions: the TAP test is now easier to read/understand.

Integrated all of these.


I think pg_log_standby_snapshot() should be added in "Allow logical decoding
on standby", not in the commit adding the tests.


Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.

Greetings,

Andres Freund


