Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table
Date
Msg-id CAD21AoB6MJfRxMJpLSEvqMyigU9BSP5aMBYG28QpbcW2C1X8FA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Wed, May 10, 2017 at 2:46 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, May 8, 2017 at 8:42 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> On 08/05/17 11:27, Masahiko Sawada wrote:
>>> Hi,
>>>
>>> I encountered a situation where DROP SUBSCRIPTION got stuck when
>>> initial table sync is in progress. In my environment, I created
>>> several tables with some data on publisher. I created subscription on
>>> subscriber and drop subscription immediately after that. It doesn't
>>> always happen but I often encountered it on my environment.
>>>
>>> ps -x command shows the following.
>>>
>>>  96796 ?        Ss     0:00 postgres: masahiko postgres [local] DROP
>>> SUBSCRIPTION
>>>  96801 ?        Ts     0:00 postgres: bgworker: logical replication
>>> worker for subscription 40993    waiting
>>>  96805 ?        Ss     0:07 postgres: bgworker: logical replication
>>> worker for subscription 40993 sync 16418
>>>  96806 ?        Ss     0:01 postgres: wal sender process masahiko [local] idle
>>>  96807 ?        Ss     0:00 postgres: bgworker: logical replication
>>> worker for subscription 40993 sync 16421
>>>  96808 ?        Ss     0:00 postgres: wal sender process masahiko [local] idle
>>>
>>> The DROP SUBSCRIPTION process (pid 96796) is waiting for the apply
>>> worker process (pid 96801) to stop while holding a lock on
>>> pg_subscription_rel. On the other hand the apply worker is waiting for
>>> acquiring a tuple lock on pg_subscription_rel needed for heap_update.
>>> Also table sync workers (pid 96805 and 96807) are waiting for the
>>> apply worker process to change their status.
>>>
>>
>> Looks like we should kill apply before dropping dependencies.
>
> Sorry, after investigated I found out that DROP SUBSCRIPTION process
> is holding AccessExclusiveLock on pg_subscription (, not
> pg_subscription_rel) and apply worker is waiting for acquiring a lock
> on it.

Hmm it seems there are two cases. One is that the apply worker waits
to acquire AccessShareLock on pg_subscription but DropSubscription
already acquired AcessExclusiveLock on it and waits for the apply
worker to finish. Another case is that the apply worker waits to
acquire a tuple lock on pg_subscrption_rel but DropSubscription (maybe
droppoing dependencies) already acquired it.

> So I guess that the dropping dependencies are not relevant with
> this.  It seems to me that the main cause is that DROP SUBSCRIPTION
> waits for apply worker to finish while keeping to hold
> AccessExclusiveLock on pg_subscription. Perhaps we need to contrive
> ways to reduce lock level somehow.
>
>>
>>> Also, even when DROP SUBSCRIPTION is done successfully, the table sync
>>> worker can be orphaned because I guess that the apply worker can exit
>>> before change status of table sync worker.
>>
>> Well the tablesync worker should stop itself if the subscription got
>> removed, but of course again the dependencies are an issue, so we should
>> probably kill those explicitly as well.
>
> Yeah, I think that we should ensure that the apply worker exits after
> killed all involved table sync workers.
>

Barring any objections, I'll add these two issues to open item.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [HACKERS] delta relations in AFTER triggers
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] multi-column range partition constraint