Deadlock between logrep apply worker and tablesync worker - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Deadlock between logrep apply worker and tablesync worker |
Date | |
Msg-id | 1412708.1674417574@sss.pgh.pa.us |
Responses | Re: Deadlock between logrep apply worker and tablesync worker |
| Re: Deadlock between logrep apply worker and tablesync worker |
List | pgsql-hackers |
On my machine, the src/test/subscription/t/002_types.pl test usually takes right about 1.5 seconds:

$ time make check PROVE_FLAGS=--timer PROVE_TESTS=t/002_types.pl
...
[14:22:12] t/002_types.pl .. ok     1550 ms ( 0.00 usr 0.00 sys + 0.70 cusr 0.25 csys = 0.95 CPU)
[14:22:13]

I noticed however that sometimes (at least one try in ten, for me) it takes about 2.5 seconds:

[14:22:16] t/002_types.pl .. ok     2591 ms ( 0.00 usr 0.00 sys + 0.69 cusr 0.28 csys = 0.97 CPU)
[14:22:18]

and I've even seen 3.5 seconds. I dug into this and eventually identified the cause: it's a deadlock between a subscription's apply worker and a tablesync worker that it's spawned.

Sometimes the apply worker calls wait_for_relation_state_change (to wait for the tablesync worker to finish) while it's holding a lock on pg_replication_origin. If that's the case, then when the tablesync worker reaches process_syncing_tables_for_sync it is able to perform UpdateSubscriptionRelState and reach the transaction commit below that; but when it tries to do replorigin_drop_by_name a little further down, it blocks on acquiring ExclusiveLock on pg_replication_origin. So we have an undetected deadlock. We escape that because wait_for_relation_state_change has a 1-second timeout, after which it rechecks GetSubscriptionRelState and is able to see the committed relation state change; so it continues, and eventually releases its transaction and the lock, permitting the tablesync worker to finish.

I've not tracked down the exact circumstances in which the apply worker ends up holding a problematic lock, but it seems likely that it corresponds to cases where its main loop has itself called replorigin_drop_by_name, a bit further up, for some other concurrent tablesync operation. (In all the cases I've traced through, the apply worker is herding multiple tablesync workers when this happens.)

I experimented with having the apply worker release its locks before waiting for the tablesync worker, as attached. This passes check-world and it seems to eliminate the test runtime instability, but I wonder whether it's semantically correct. This whole business of taking table-wide ExclusiveLock on pg_replication_origin looks like a horrid kluge that we should try to get rid of, not least because I don't see any clear documentation of what hazard it's trying to prevent.

Another thing that has a bad smell about it is the fact that process_syncing_tables_for_sync uses two transactions in the first place. There's a comment there claiming that it's for crash safety, but I can't help suspecting it's really because this case becomes a hard deadlock without that mid-function commit. It's not great in any case that the apply worker can move on in the belief that the tablesync worker is done when in fact the latter still has catalog state updates to make. And I wonder what we're doing with having both of them calling replorigin_drop_by_name ... shouldn't that responsibility belong to just one of them?

So I think this whole area deserves a hard look, and I'm not at all sure that what's attached is a good solution.
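To make the ordering concrete, here is the interleaving as I've traced it (a sketch only: the function names are the real ones from tablesync.c and replication/logical/origin.c, but everything around them is elided):

apply worker                              tablesync worker
------------                              ----------------
StartTransactionCommand();
replorigin_drop_by_name(...);
  takes ExclusiveLock on
  pg_replication_origin, held
  until transaction commit
wait_for_relation_state_change(relid,
                               SUBREL_STATE_SYNCDONE);
                                          UpdateSubscriptionRelState(...);
                                          CommitTransactionCommand();
                                          replorigin_drop_by_name(...);
                                            blocks on the apply worker's
                                            ExclusiveLock
WaitLatch's 1-second timeout expires;
GetSubscriptionRelState() now sees
SUBREL_STATE_SYNCDONE, the wait loop
exits, and the apply worker's eventual
commit releases the lock, unblocking
the tablesync worker.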
regards, tom lane

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 4647837b82..bb45c2107f 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -539,15 +539,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 				LWLockRelease(LogicalRepWorkerLock);
 
 				/*
-				 * Enter busy loop and wait for synchronization worker to
-				 * reach expected state (or die trying).
+				 * If we have a transaction, we must commit it to release
+				 * any locks we have (it's quite likely we hold lock on
+				 * pg_replication_origin, which the sync worker will need
+				 * to update).  Then start a new transaction so we can
+				 * examine catalog state.
 				 */
-				if (!started_tx)
+				if (started_tx)
+				{
+					CommitTransactionCommand();
+					StartTransactionCommand();
+				}
+				else
 				{
 					StartTransactionCommand();
 					started_tx = true;
 				}
+
+				/*
+				 * Enter busy loop and wait for synchronization worker to
+				 * reach expected state (or die trying).
+				 */
 				wait_for_relation_state_change(rstate->relid,
 											   SUBREL_STATE_SYNCDONE);
 			}
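For reference, the table-wide lock at issue is the first thing replorigin_drop_by_name takes, and it is deliberately kept until commit. Abridged from src/backend/replication/logical/origin.c (the comments below are mine, summarizing the behavior; error handling elided):

void
replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
{
	RepOriginId	roident;
	Relation	rel;

	/* Table-wide ExclusiveLock on pg_replication_origin, not a row lock */
	rel = table_open(ReplicationOriginRelationId, ExclusiveLock);

	roident = replorigin_by_name(name, missing_ok);
	if (OidIsValid(roident))
		replorigin_drop_guts(rel, roident, nowait);

	/* NoLock: the ExclusiveLock is retained until transaction commit */
	table_close(rel, NoLock);
}

Since both the apply worker and the tablesync worker run this inside transactions that can stay open across other work, whichever one takes the lock first can stall the other for as long as its transaction lasts.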