Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Date
Msg-id CAHGQGwHqQVHmQ7wM=eLNnp1_oy-GVSSAcaJXWjE4nc2twSqXRg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
>>> Then, the reason for the TRY-CATCH cluase is that I found that
>>> some functions called from there can throw exceptions.
>>
>> Yes, but all LWLocks should be released by normal error recovery.
>> It should not be necessary for this code to clean that up by hand.
>> If it were necessary, there would be TRY-CATCH around every single
>> LWLockAcquire in the backend, and we'd have an unreadable and
>> unmaintainable system.  Please don't add a TRY-CATCH unless it's
>> *necessary* -- and you haven't explained why this one is.

Yes.

> Putting hands into the code and at the problem, I can see that
> dropping a subscription on a node makes it unresponsive in case of a
> stop. And that's just because calls to LWLockRelease are missing as in
> the patch attached. A try/catch problem should not be necessary.

Thanks for the patch!

With the patch, LogicalRepLauncherLock is released at the end of
DropSubscription(). But ISTM that the lock should be released just after
logicalrep_worker_stop() and there is no need to protect the removal of
replication slot with the lock.
   /*   * If we found worker but it does not have proc set it is starting up,   * wait for it to finish and then kill
it.  */   while (worker && !worker->proc)   {
 

ISTM that the above loop in logicalrep_worker_stop() is not necessary
because LogicalRepLauncherLock ensures that the above condition is
always false. Thought? Am I missing something?

If the above condition is true, which means that there is the worker slot
having the "subid" of the worker to kill, but its "proc" has not been set yet.
Without LogicalRepLauncherLock, this situation can happen after "subid"
is set by the launcher and before "proc" is set by the worker. But
LogicalRepLauncherLock protects those operations, so logicalrep_worker_stop()
called while holding the lock should always think the above condition is false.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: [HACKERS] Deadlock in XLogInsert at AIX
Next
From: David Fetter
Date:
Subject: Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATEand DELETE