Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table
Date
Msg-id CAD21AoCEhOwP3VTsVvpQK3G=gas28azv6gtFny=Sc3Ee6+-DzQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Responses Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 11/05/17 10:10, Masahiko Sawada wrote:
>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> Barring any objections, I'll add these two issues to open item.
>>>
>>> It seems to me that those open items have not been added yet to the
>>> list. If I am following correctly, they could be defined as follows:
>>> - Dropping subscription may stuck if done during tablesync.
>>> -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker process.
>
> I think the solution to this is to reintroduce the LWLock that was
> removed and replaced with the exclusive lock on catalog [1]. I am afraid
> that correct way of handling this is to do both LWLock and catalog lock
> (first LWLock under which we kill the workers and then catalog lock so
> that something that prevents launcher from restarting them is held till
> the end of transaction).

I agree to reintroduce LWLock and to stop logical rep worker first and
then modify catalog. That way we can reduce catalog lock level (maybe
to RowExclusiveLock) so that apply worker can see it. Also I think
that we need to do more things like in order to prevent that we keep
to hold LWLock until end of transaction, because holding LWLock until
end of transaction is not good idea and could be cause of deadlock. So
for example we can commit the transaction in DropSubscription after
cleaned pg_subscription record and all its dependencies and then start
new transaction for the remaining work. Of course we also need to
disallow DROP SUBSCRIPTION being executed in a user transaction
though.

>
>>> -- Avoid orphaned tablesync worker if apply worker exits before
>>> changing its status.
>>
>
> The behavior question I have about this is if sync workers should die
> when apply worker dies (ie they are tied to apply worker) or if they
> should be tied to the subscription.
>
> I guess taking down all the sync workers when apply worker has exited is
> easier to solve. Of course it means that if apply worker restarts in
> middle of table synchronization, the table synchronization will have to
> start from scratch. That being said, in normal operation apply worker
> should only exit/restart if subscription has changed or has been
> dropped/disabled and I think sync workers want to exit/restart in that
> situation as well.

I agree that sync workers are tied to the apply worker.

>
> So for example having shmem detach hook for an apply worker (or reusing
> the existing one) that searches for all the other workers for same
> subscription and shuts them down as well sounds like solution to this.

Seems reasonable solution.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] [POC] hash partitioning
Next
From: Neha Khatri
Date:
Subject: Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()