Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK
Date
Msg-id CAHGQGwEgOKJAXKHkiXVs6snQGpWQZo04Y=oLZqQV07vcJmxxPw@mail.gmail.com
Whole thread Raw
In response to [HACKERS] DROP SUBSCRIPTION and ROLLBACK  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Feb 7, 2017 at 9:10 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Hi all,
>
> While testing logical replciation I found that if the transaction
> issued DROP SUBSCRIPTION rollbacks then the logical repliation stops
> and the subscription can never be removed later. The document says
> that the replication worker associated with the subscription will not
> stop until after the transaction that issued this command has
> committed but it doesn't work.

Yeah, this is a bug.

ISTM that CREATE SUBSCRIPTION also has the similar issue. It creates
the replication slot on the publisher side before the transaction has been
committed. Even if the transaction is rollbacked, that replication slot is
not removed. That is, in a transaction block, we should not connect to
the publisher. Instead, the launcher or worker should do.

> The cause of this is that DropSubscription stops the apply worker and
> drops corresponding replication slot on publisher side without waiting
> for commit or rollback. The launcher process launches the apply worker
> again but the launched worker will fail to start logical replication
> because corresponding replication slot is already removed. And the
> orphan subscription can not be removed later.
>
> I think the logical replication should not stop and the corresponding
> replication slot and replication origin should not be removed until
> the transaction commits.

Yes.

> The solution for this I came up with is that the launcher process
> stops the apply worker after DROP SUBSCRIPTION is committed rather
> than DropSubscription does. And the apply worker drops replication
> slot and replication origin before exits. Attached draft patch fixes
> this issue.
>
> Please give me feedback.

The patch failed to apply to HEAD.

+ worker = logicalrep_worker_find(subid);
+ if (worker) {
- heap_close(rel, NoLock);
- return;
+ if (stmt->drop_slot)
+ worker->drop_slot = true;
+ worker->need_to_stop = true;

"drop_slot" and "need_to_stop" seem to be set to true even if the transaction
is rollbacked. This would cause the same problem that you're trying to fix.

I think that we should make the launcher periodically checks pg_subscription
and stop the worker if there is no its corresponding subscription. Then,
if necessary, the worker should remove its replication slot from the publisher.

Regards,

--
Fujii Masao



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [HACKERS] pg_restore is broken on 9.2 version.
Next
From: Pavel Raiskup
Date:
Subject: [HACKERS] [PATCH] configure-time knob to set default ssl ciphers