Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 - Mailing list pgsql-bugs

From Andres Freund
Subject Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Date
Msg-id 20230117200432.xaoenn7ni7srb2l2@awork3.anarazel.de
In response to Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
Hi,

On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
> As per my initial analysis, I have added this code to hold/resume
> interrupts during slot creation due to the test failure (in buildfarm)
> reported in the email [1]. It is clearly a wrong fix as per the report
> and discussion in this thread.

Yea. You really can never hold interrupts across something that could block
indefinitely. A HOLD_INTERRUPTS() while doing error recovery (as in
DisableSubscriptionAndExit()) is fine, since that's basically a finite amount
of work. But doing so while issuing SQL commands to another node, or anything
else that could just block indefinitely, isn't.
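
To illustrate the point, here is a minimal standalone sketch, not PostgreSQL
source. It assumes a simplified model in which holding interrupts increments a
counter and a pending cancel request is only serviced while that counter is
zero. All sim_* names are hypothetical stand-ins, and the bounded loop stands
in for a wait on a remote node that never answers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's interrupt bookkeeping. */
static volatile int  sim_holdoff_count = 0;        /* nesting depth of holds */
static volatile bool sim_interrupt_pending = false;

#define SIM_HOLD_INTERRUPTS()   (sim_holdoff_count++)
#define SIM_RESUME_INTERRUPTS() (sim_holdoff_count--)

/* Service a pending interrupt only if nothing is holding interrupts. */
static bool
sim_check_for_interrupts(void)
{
    if (!sim_interrupt_pending || sim_holdoff_count > 0)
        return false;
    sim_interrupt_pending = false;
    return true;
}

/*
 * Model a wait on a remote node that never replies.  A cancel request
 * arrives on the third poll.  Returns the poll number at which the cancel
 * was honoured, or -1 if it was still ignored after max_polls polls.
 */
static int
sim_wait_for_remote(int max_polls, bool hold)
{
    int result = -1;

    sim_interrupt_pending = false;      /* start from a clean state */

    if (hold)
        SIM_HOLD_INTERRUPTS();

    for (int i = 1; i <= max_polls; i++)
    {
        if (i == 3)
            sim_interrupt_pending = true;   /* e.g. DROP DATABASE signals us */

        if (sim_check_for_interrupts())
        {
            result = i;
            break;
        }
    }

    if (hold)
        SIM_RESUME_INTERRUPTS();

    assert(sim_holdoff_count >= 0);     /* holds and resumes must balance */
    return result;
}
```

Without the hold, sim_wait_for_remote(10, false) notices the cancel on the
poll where it arrives. With the hold, sim_wait_for_remote(10, true) never
does, however large max_polls is. In the sketch the loop is bounded, so it
eventually gives up; a real wait on another node has no such bound.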


> There is an analysis of the test
> failure in the email [2] which explains the race condition that leads
> to test failure. Thinking again about the failure, I feel we can
> instead change the failed test (t/004_sync.pl) to either ensure that
> both the walsenders (corresponding to sync worker and apply worker)
> exit after dropping the subscription and before checking the
> remaining slots on publisher or wait for slots to become zero in the
> test.

How about waiting for the table to start to be synced (and thus the slot to be
created) before issuing the drop subscription? If the slot hadn't yet been
created, the test doesn't prove that we successfully clean up...

Greetings,

Andres Freund


