Re: 035_standby_logical_decoding unbounded hang - Mailing list pgsql-hackers

From Noah Misch
Subject Re: 035_standby_logical_decoding unbounded hang
Msg-id 20240215204816.cb.nmisch@google.com
In response to Re: 035_standby_logical_decoding unbounded hang  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: 035_standby_logical_decoding unbounded hang
List pgsql-hackers
On Wed, Feb 14, 2024 at 03:31:16PM +0000, Bertrand Drouvot wrote:
> On Sat, Feb 10, 2024 at 05:02:27PM -0800, Noah Misch wrote:
> > The 035_standby_logical_decoding.pl hang is
> > a race condition arising from an event sequence like this:
> > 
> > - Test script sends CREATE SUBSCRIPTION to subscriber, which loses the CPU.
> > - Test script calls pg_log_standby_snapshot() on primary.  Emits XLOG_RUNNING_XACTS.
> > - checkpoint_timeout makes a primary checkpoint finish.  Emits XLOG_RUNNING_XACTS.
> > - bgwriter executes LOG_SNAPSHOT_INTERVAL_MS logic.  Emits XLOG_RUNNING_XACTS.
> > - CREATE SUBSCRIPTION wakes up and sends CREATE_REPLICATION_SLOT to standby.
> > 
> > Other test code already has a solution for this, so the attached patches add a
> > timeout and copy the existing solution.  I'm also attaching the hack that
> > makes it 100% reproducible.

> I did a few tests and confirm that the proposed solution fixes the corner case.

Thanks for reviewing.

> What about creating a sub, say wait_for_restart_lsn_calculation() in Cluster.pm
> and then make use of it in create_logical_slot_on_standby() and above? (something
> like wait_for_restart_lsn_calculation-v1.patch attached).

Waiting for restart_lsn is just a prerequisite for calling
pg_log_standby_snapshot(), so I wouldn't separate those two.  If we're
extracting a sub, I would move the pg_log_standby_snapshot() call into the sub
and make the API like one of these:

  $standby->wait_for_subscription_starting_point($primary, $slot_name);
  $primary->log_standby_snapshot($standby, $slot_name);
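
For the second form, here is a rough, untested sketch of what the body could look like. It is written as a plain function rather than a Cluster.pm method so it stands alone. It assumes both arguments are PostgreSQL::Test::Cluster nodes and uses only poll_query_until() and safe_psql(); the query text and the error message are illustrative assumptions, not part of any attached patch.

```perl
use strict;
use warnings;

# Hypothetical helper, not a committed API.  $primary and $standby are
# assumed to be PostgreSQL::Test::Cluster nodes; only poll_query_until()
# and safe_psql() are called on them.
sub log_standby_snapshot
{
    my ($primary, $standby, $slot_name) = @_;

    # Step 1: wait until the standby has computed the slot's restart_lsn.
    # A running-xacts record logged before that point would not help the
    # slot, which is the race described earlier in this thread.
    $standby->poll_query_until(
        'postgres', qq[
        SELECT restart_lsn IS NOT NULL
        FROM pg_catalog.pg_replication_slots
        WHERE slot_name = '$slot_name'
    ])
      or die
      "timed out waiting for slot $slot_name to calculate its restart_lsn";

    # Step 2: only now emit the XLOG_RUNNING_XACTS record on the primary.
    $primary->safe_psql('postgres', 'SELECT pg_log_standby_snapshot()');
    return;
}
```

A caller would then write something like log_standby_snapshot($node_primary, $node_standby, 'tap_sub'), where the node variables and the slot name are placeholders.  Keeping the wait and the snapshot call in one sub means no caller can do the second without the first.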

Would you like to finish the patch in such a way?


