Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table
Date
Msg-id CAD21AoBZoUxT4a12Djeo=pJOc0fme_ePLzNv3oJ0iR4bed7iDQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table
List pgsql-hackers
On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sun, Jun 25, 2017 at 7:35 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> (was away for a while, got some time now for this again)
>>
>> On 22/06/17 04:43, Peter Eisentraut wrote:
>>> The alternative is that we use the LockSharedObject() approach that was
>>> already alluded to, like in the attached patch.  Both approaches would
>>> work equally fine AFAICT.
>>
>> I agree, but I think we need bigger overhaul of the locking/management
>> in general. So here is patch which does much more changes.
>>
>> The patch does several important things (in no particular order):
>> - Split SetSubscriptionRelState into AddSubscriptionRelState and
>> UpdateSubscriptionRelState for the reasons said upstream, it's cleaner,
>> there is no half-broken upsert logic and it has proper error checking
>> for each action.
>>
>> - Do LockSharedObject in ALTER SUBSCRIPTION, DROP SUBSCRIPTION (this one
>> is preexisting but mentioning it for context), SetSubscriptionRelState,
>> AddSubscriptionRelState, and in the logicalrep_worker_launch. This means
>> we use granular per object locks to deal with concurrency.
>>
>> - Because of above, the AccessExclusiveLock on pg_subscription is no
>> longer needed, just normal RowExlusiveLock is used now.
>>
>> - logicalrep_worker_stop is also simplified due to the proper locking
>>
>> - There is new interface logicalrep_worker_stop_at_commit which is used
>> by ALTER SUBSCRIPTION ... REFRESH PUBLICATION and by transactional
>> variant of DROP SUBSCRIPTION to only kill workers at the end of transaction.
>>
>> - Locking/reading of subscription info is unified between DROP and ALTER
>> SUBSCRIPTION commands.
>>
>> - DROP SUBSCRIPTION will kill all workers associated with subscription,
>> not just apply.
>>
>> - The sync worker checks during startup if the relation is still subscribed.
>>
>> - The sync worker will exit when waiting for apply and apply has shut-down.
>>
>> - The log messages around workers and removed or disabled subscription
>> are now more consistent between startup and normal runtime of the worker.
>>
>> - Some code deduplication and stylistic changes/simplification in
>> related areas.
>>
>> - Fixed catcache's IndexScanOK() handling of the subscription catalog.
>>
>> It's bit bigger patch but solves issues from multiple threads around
>> handling of ALTER/DROP subscription.
>>
>> A lot of the locking that I added is normally done transparently by
>> dependency handling, but subscriptions and subscription relation status
>> do not use that much as it was deemed to bloat pg_depend needlessly
>> during the original patch review (it's also probably why this has
>> slipped through).
>>
>

I've reviewed this patch briefly.

@@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid)}
/*
+ * Request worker to be stopped on commit.
+ */
+void
+logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
+{
+       LogicalRepWorkerId *wid;
+       MemoryContext old;
+
+       old = MemoryContextSwitchTo(TopTransactionContext);
+
+       wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId));
+
+       /*
+       wid = MemoryContextAlloc(TopTransactionContext,
+
sizeof(LogicalRepWorkerId));
+       */
+       wid->subid = subid;
+       wid->relid = relid;
+
+       on_commit_stop_workers = lappend(on_commit_stop_workers, wid);
+
+       MemoryContextSwitchTo(old);
+}

logicalrep_worker_stop_at_commit() has a problem that new_list()
called by lappend() allocates the memory from current memory context.
It should switch the memory context and then allocate the memory for
wid and append it to the list.

--------
@@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void)voidAtEOXact_ApplyLauncher(bool isCommit){
-       if (isCommit && on_commit_launcher_wakeup)
-               ApplyLauncherWakeup();
+       ListCell *lc;
+
+       if (isCommit)
+       {
+               foreach (lc, on_commit_stop_workers)
+               {
+                       LogicalRepWorkerId *wid = lfirst(lc);
+                       logicalrep_worker_stop(wid->subid, wid->relid);
+               }
+
+               if (on_commit_launcher_wakeup)
+                       ApplyLauncherWakeup();

Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't
support the prepared transaction. Since we allocate the list
on_commit_stop_workers in TopTransactionContext the postgres crashes
if we execute any query after prepared transaction that removes
subscription relation state. Also after fixed this issue, we still
need to something: the list of on_commit_stop_workers is not
associated the prepared transaction.  A query next to "preapre
transaction" tries to stop workers at the commit. There was similar
discussion before.

--------
+
+                               ensure_transaction();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+           rstate->relid, rstate->state,
+           rstate->lsn);


Should we commit the transaction if we started a new transaction
before update the subscription relation state, or it could be deadlock
risk.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: [HACKERS] Setting pd_lower in GIN metapage
Next
From: Shubham Barai
Date:
Subject: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index