Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table |
Date | |
Msg-id | CAD21AoBZoUxT4a12Djeo=pJOc0fme_ePLzNv3oJ0iR4bed7iDQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table
|
List | pgsql-hackers |
On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sun, Jun 25, 2017 at 7:35 PM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> (was away for a while, got some time now for this again) >> >> On 22/06/17 04:43, Peter Eisentraut wrote: >>> The alternative is that we use the LockSharedObject() approach that was >>> already alluded to, like in the attached patch. Both approaches would >>> work equally fine AFAICT. >> >> I agree, but I think we need bigger overhaul of the locking/management >> in general. So here is patch which does much more changes. >> >> The patch does several important things (in no particular order): >> - Split SetSubscriptionRelState into AddSubscriptionRelState and >> UpdateSubscriptionRelState for the reasons said upstream, it's cleaner, >> there is no half-broken upsert logic and it has proper error checking >> for each action. >> >> - Do LockSharedObject in ALTER SUBSCRIPTION, DROP SUBSCRIPTION (this one >> is preexisting but mentioning it for context), SetSubscriptionRelState, >> AddSubscriptionRelState, and in the logicalrep_worker_launch. This means >> we use granular per object locks to deal with concurrency. >> >> - Because of above, the AccessExclusiveLock on pg_subscription is no >> longer needed, just normal RowExlusiveLock is used now. >> >> - logicalrep_worker_stop is also simplified due to the proper locking >> >> - There is new interface logicalrep_worker_stop_at_commit which is used >> by ALTER SUBSCRIPTION ... REFRESH PUBLICATION and by transactional >> variant of DROP SUBSCRIPTION to only kill workers at the end of transaction. >> >> - Locking/reading of subscription info is unified between DROP and ALTER >> SUBSCRIPTION commands. >> >> - DROP SUBSCRIPTION will kill all workers associated with subscription, >> not just apply. >> >> - The sync worker checks during startup if the relation is still subscribed. >> >> - The sync worker will exit when waiting for apply and apply has shut-down. >> >> - The log messages around workers and removed or disabled subscription >> are now more consistent between startup and normal runtime of the worker. >> >> - Some code deduplication and stylistic changes/simplification in >> related areas. >> >> - Fixed catcache's IndexScanOK() handling of the subscription catalog. >> >> It's bit bigger patch but solves issues from multiple threads around >> handling of ALTER/DROP subscription. >> >> A lot of the locking that I added is normally done transparently by >> dependency handling, but subscriptions and subscription relation status >> do not use that much as it was deemed to bloat pg_depend needlessly >> during the original patch review (it's also probably why this has >> slipped through). >> > I've reviewed this patch briefly. @@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid)} /* + * Request worker to be stopped on commit. + */ +void +logicalrep_worker_stop_at_commit(Oid subid, Oid relid) +{ + LogicalRepWorkerId *wid; + MemoryContext old; + + old = MemoryContextSwitchTo(TopTransactionContext); + + wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId)); + + /* + wid = MemoryContextAlloc(TopTransactionContext, + sizeof(LogicalRepWorkerId)); + */ + wid->subid = subid; + wid->relid = relid; + + on_commit_stop_workers = lappend(on_commit_stop_workers, wid); + + MemoryContextSwitchTo(old); +} logicalrep_worker_stop_at_commit() has a problem that new_list() called by lappend() allocates the memory from current memory context. It should switch the memory context and then allocate the memory for wid and append it to the list. -------- @@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void)voidAtEOXact_ApplyLauncher(bool isCommit){ - if (isCommit && on_commit_launcher_wakeup) - ApplyLauncherWakeup(); + ListCell *lc; + + if (isCommit) + { + foreach (lc, on_commit_stop_workers) + { + LogicalRepWorkerId *wid = lfirst(lc); + logicalrep_worker_stop(wid->subid, wid->relid); + } + + if (on_commit_launcher_wakeup) + ApplyLauncherWakeup(); Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't support the prepared transaction. Since we allocate the list on_commit_stop_workers in TopTransactionContext the postgres crashes if we execute any query after prepared transaction that removes subscription relation state. Also after fixed this issue, we still need to something: the list of on_commit_stop_workers is not associated the prepared transaction. A query next to "preapre transaction" tries to stop workers at the commit. There was similar discussion before. -------- + + ensure_transaction(); + UpdateSubscriptionRelState(MyLogicalRepWorker->subid, + rstate->relid, rstate->state, + rstate->lsn); Should we commit the transaction if we started a new transaction before update the subscription relation state, or it could be deadlock risk. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
pgsql-hackers by date: