Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 - Mailing list pgsql-bugs

From Andres Freund
Subject Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Msg-id 20230114160201.en5g5bna66x6lnuw@awork3.anarazel.de
In response to DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Lakshmi Narayanan Sreethar <lakshmi@timescale.com>)
Responses Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Andres Freund <andres@anarazel.de>)
Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
Hi,

Good catch.

The problem is here:

On 2023-01-13 20:53:49 +0530, Lakshmi Narayanan Sreethar wrote:
> #7  0x0000559cccbe1e71 in LogicalRepSyncTableStart (origin_startpos=0x7fffb26f7728) at /pg15.1/src/backend/replication/logical/tablesync.c:1353

Because the logical rep code explicitly prevents interrupts:

    /*
     * Create a new permanent logical decoding slot. This slot will be used
     * for the catchup phase after COPY is done, so tell it to use the
     * snapshot to make the final data consistent.
     *
     * Prevent cancel/die interrupts while creating slot here because it is
     * possible that before the server finishes this command, a concurrent
     * drop subscription happens which would complete without removing this
     * slot leading to a dangling slot on the server.
     */
    HOLD_INTERRUPTS();
    walrcv_create_slot(LogRepWorkerWalRcvConn,
                       slotname, false /* permanent */ , false /* two_phase */ ,
                       CRS_USE_SNAPSHOT, origin_startpos);
    RESUME_INTERRUPTS();

Which is just completely, entirely wrong, even independent of this issue. Not
allowing termination for the duration of a command executed over the network?
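
To make the concern concrete, here is a minimal toy model in C of a holdoff counter wrapped around a blocking wait. It is only a sketch: every `toy_` / `TOY_` name is a hypothetical stand-in, not PostgreSQL's real macros or functions. The only behaviour it assumes is that a pending die/cancel request is not acted on while the holdoff count is non-zero. The "network wait" is simulated as a bounded loop so the example terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; hypothetical stand-ins, not PostgreSQL's real variables. */
bool toy_interrupt_pending = false; /* a terminate request has arrived */
int  toy_holdoff_count = 0;         /* > 0 while interrupts are held */
bool toy_terminated_flag = false;   /* the request has been acted on */

#define TOY_HOLD_INTERRUPTS()   (toy_holdoff_count++)
#define TOY_RESUME_INTERRUPTS() \
    do { assert(toy_holdoff_count > 0); toy_holdoff_count--; } while (0)

/* Act on a pending request only when no holdoff is active. */
void toy_check_for_interrupts(void)
{
    if (toy_interrupt_pending && toy_holdoff_count == 0)
    {
        toy_interrupt_pending = false;
        toy_terminated_flag = true;
    }
}

bool toy_terminated(void)
{
    return toy_terminated_flag;
}

/*
 * Simulated blocking wait for a remote reply. A terminate request arrives
 * at step request_at. The reply arrives at step reply_after, or never if
 * reply_after < 0. max_steps bounds the loop so the toy always returns.
 * Returns the number of steps spent waiting.
 */
int toy_wait_for_remote(int reply_after, int request_at, int max_steps)
{
    for (int step = 0; step < max_steps; step++)
    {
        if (step == request_at)
            toy_interrupt_pending = true;
        toy_check_for_interrupts();
        if (toy_terminated_flag)
            return step;            /* terminated promptly */
        if (reply_after >= 0 && step >= reply_after)
            return step;            /* remote side answered */
    }
    return max_steps;               /* still blocked when the toy gave up */
}

/* Run the wait with or without a holdoff; returns steps spent blocked. */
int toy_create_slot(bool hold, int reply_after, int request_at, int max_steps)
{
    int steps;

    toy_interrupt_pending = false;
    toy_holdoff_count = 0;
    toy_terminated_flag = false;

    if (hold)
        TOY_HOLD_INTERRUPTS();
    steps = toy_wait_for_remote(reply_after, request_at, max_steps);
    if (hold)
        TOY_RESUME_INTERRUPTS();

    /* With a holdoff, a pending request is only honoured here. */
    toy_check_for_interrupts();
    return steps;
}
```

In this model, `toy_create_slot(false, -1, 3, 1000)` stops waiting at step 3, when the request arrives, whereas `toy_create_slot(true, -1, 3, 1000)` stays blocked for all 1000 steps: if the remote side never answers, the held worker cannot be terminated until the wait ends on its own.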

This is from:

commit 6b67d72b604cb913e39324b81b61ab194d94cba0
Author: Amit Kapila <akapila@postgresql.org>
Date:   2021-03-17 08:15:12 +0530

    Fix race condition in drop subscription's handling of tablesync slots.

    Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
    Subscription to drop such slots. However, it is possible that before
    tablesync worker could get the acknowledgment of slot creation, drop
    subscription stops it and that can lead to a dangling slot on the
    publisher. Prevent cancel/die interrupts while creating a slot in the
    tablesync worker.

    Reported-by: Thomas Munro as per buildfarm
    Author: Amit Kapila
    Reviewed-by: Vignesh C, Takamichi Osumi
    Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com


But this can't be the right fix.

Greetings,

Andres Freund
