Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Msg-id CAA4eK1K+mpN-nz4j2WobyFjJAtLzV2pzjb_QZf0yATjVM6dOtQ@mail.gmail.com
In response to Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Andres Freund <andres@anarazel.de>)
Responses RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1
List pgsql-bugs
On Wed, Jan 18, 2023 at 1:34 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
>
> > There is an analysis of the test
> > failure in the email [2] which explains the race condition that leads
> > to test failure. Thinking again about the failure, I feel we can
> > instead change the failed test (t/004_sync.pl) to either ensure that
> > both the walsenders (corresponding to sync worker and apply worker)
> > exits after dropping the subscription and before checking the
> > remaining slots on publisher or wait for slots to become zero in the
> > test.
>
> How about waiting for the table to start to be synced (and thus the slot to be
> created) before issuing the drop subscription?
>

In this test [1], the initial sync fails due to a unique constraint
violation, so checking that the sync has started is a bit tricky. We
can probably check sync_error_count in pg_stat_subscription_stats to
confirm that the sync has started to fail, which should in turn imply
that the sync has started. However, I am not sure this would be
completely safe. The other possible ways are: (a) after creating the
subscription, wait for two slots to be created on the publisher, and
then, after dropping the subscription, wait for the number of slots on
the publisher to become zero; or (b) after dropping the subscription,
simply wait for the number of slots on the publisher to become zero.

I think one of (a) or (b) will work.
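
To make the difference between (a) and (b) concrete, here is a minimal
sketch of the polling logic. It is written as standalone Python rather
than in the Perl TAP framework so that it can run without a server.
FakePublisher, slot_count, option_a and option_b are hypothetical names
introduced only for illustration. slot_count stands in for running
"SELECT count(*) FROM pg_replication_slots" on the publisher, and the
scripted counts merely imitate slots that appear and disappear
asynchronously.

```python
import time
from typing import Callable, Iterable


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.01) -> bool:
    """Poll condition() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakePublisher:
    """Hypothetical stand-in for the publisher node (not a PostgreSQL API).

    Each call to slot_count() returns the next value of a scripted
    sequence and then keeps returning the last value.
    """

    def __init__(self, counts: Iterable[int]) -> None:
        self._counts = list(counts)
        self._next = 0

    def slot_count(self) -> int:
        index = min(self._next, len(self._counts) - 1)
        self._next += 1
        return self._counts[index]


def option_a(pub: FakePublisher,
             drop_subscription: Callable[[], None],
             timeout: float = 5.0) -> bool:
    """(a): wait for two slots, drop the subscription, wait for zero slots."""
    if not wait_until(lambda: pub.slot_count() == 2, timeout):
        return False  # slots never showed up; do not drop
    drop_subscription()
    return wait_until(lambda: pub.slot_count() == 0, timeout)


def option_b(pub: FakePublisher,
             drop_subscription: Callable[[], None],
             timeout: float = 5.0) -> bool:
    """(b): drop the subscription, then wait for zero slots."""
    drop_subscription()
    return wait_until(lambda: pub.slot_count() == 0, timeout)
```

In both variants the final check is a wait rather than a one-shot
comparison, so a slot that is created late and then cleaned up does not
fail the check as long as the count reaches zero before the timeout.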

[1]
# Table tap_rep already has the same records on both publisher and subscriber
# at this time. Recreate the subscription which will do the initial copy of
# the table again and fails due to unique constraint violation.
$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
PUBLICATION tap_pub"
);
...
...

-- 
With Regards,
Amit Kapila.


