RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1 - Mailing list pgsql-bugs

From houzj.fnst@fujitsu.com
Subject RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Date
Msg-id OS0PR01MB571678C898EA980B444BC7CE94C49@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (vignesh C <vignesh21@gmail.com>)
List pgsql-bugs
On Wednesday, January 18, 2023 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Jan 18, 2023 at 1:34 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
> >
> > > There is an analysis of the test
> > > failure in the email [2] which explains the race condition that
> > > leads to test failure. Thinking again about the failure, I feel we
> > > can instead change the failed test (t/004_sync.pl) to either ensure
> > > that both the walsenders (corresponding to sync worker and apply
> > > worker) exits after dropping the subscription and before checking
> > > the remaining slots on publisher or wait for slots to become zero in
> > > the test.
> >
> > How about waiting for the table to start to be synced (and thus the
> > slot to be created) before issuing the drop subscription?
> >
> 
> In this test [1], the initial sync fails due to a unique constraint violation, so
> checking that the sync has started is a bit tricky. We can probably check
> sync_error_count in pg_stat_subscription_stats to ensure that sync has started to
> fail which will ideally ensure that the sync has started. I am not sure this would be
> completely safe. The other possible ways are (a) after creating a subscription,
> wait for two slots to get created in the publisher, and then after dropping
> subscription wait for slots to become zero on the publisher; (b) after dropping
> the subscription, wait for slots to become zero.
> 
> I think one of (a) or (b) will work.

I think in the mentioned test case, the tablesync worker will keep restarting, which
means the table sync slot is also being dropped and re-created ... So (a), waiting for
two slots to get created, might not work, as the slot will get dropped soon. I
think (b), waiting for the slot count to become zero, would be a simpler way to make
the test stable. Here are the patches that try to do this for all affected branches.
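
For illustration only (this is not taken from the patches), approach (b) amounts
to polling the publisher until no replication slots remain. Below is a minimal
Python sketch of that polling loop. The function name wait_for_slot_count, the
run_query callable, and the timeout values are all hypothetical; the only real
object referenced is the pg_replication_slots view.

```python
import time
from typing import Callable

# Query against the real pg_replication_slots system view.
SLOT_COUNT_QUERY = "SELECT count(*) FROM pg_replication_slots"


def wait_for_slot_count(
    run_query: Callable[[str], int],  # hypothetical: runs SQL on the publisher, returns the count
    expected: int = 0,
    timeout_s: float = 180.0,         # assumed timeout, not taken from the test suite
    interval_s: float = 0.1,
) -> bool:
    """Poll until the publisher reports `expected` slots; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if run_query(SLOT_COUNT_QUERY) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Because the loop only checks for the final state (zero slots), it does not care
how many times the tablesync slot was dropped and re-created in between, which
is why (b) avoids the race that (a) would have.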

Best regards,
Hou zj
