From 559dda09b35870d3630a65cbca682e50343c6f0f Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Thu, 25 Jul 2019 09:14:58 +0800 Subject: [PATCH] Fix a delay in PostgreSQL shutdown caused by logical replication Due to a race with WAL writing during shutdown, if logical walsenders were running then PostgreSQL's shutdown could be delayed by up to wal_sender_timeout/2 while it waits for the walsenders to shut down. The walsenders wait for new WAL or end-of-wal which won't come until shutdown so there's a deadlock. The walsender timeout eventually breaks the deadlock. The issue was introduced by PostgreSQL 10 commit c6c3334364 "Prevent possibility of panics during shutdown checkpoint". A logical walsender never enters WALSNDSTATE_STOPPING and allows the checkpointer to continue shutdown. Instead it exits when it reads end-of-WAL. But if it reads the last WAL record written before shutdown and that record doesn't generate a client network write, it can mark itself caught up and go to sleep without checking to see if it's been asked to shut down. Fix by making sure the logical walsender always checks if it's been asked to shut down before it allows the walsender main loop to go to sleep. When this issue happens the walsender(s) can be seen to be sleeping in WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The checkpointer will be waiting in WalSndWaitStopping() for the walsenders to enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster will be waiting in ServerLoop for the checkpointer to finish the shutdown checkpoint. --- src/backend/replication/walsender.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index b489c9c27f..c565e208bc 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -2907,18 +2907,29 @@ XLogSendLogical(void) * point, then we're caught up. */ if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) - { WalSndCaughtUp = true; - - /* - * Have WalSndLoop() terminate the connection in an orderly - * manner, after writing out all the pending data. - */ - if (got_STOPPING) - got_SIGUSR2 = true; - } } + /* + * If we've recently sent up to the currently flushed WAL + * and are shutting down, we can safely wrap up by flushing + * buffers and exchanging CopyDone messages. It doesn't matter + * if more WAL may be written before shutdown because no + * WAL written after replication slots are checkpointed + * can result in invocation of logical decoding hooks and + * output to the client. + * + * We could instead WalSndSetState(WALSNDSTATE_STOPPING) + * to allow shutdown to continue and put the walsender + * in a state where any unexpected WAL records Assert. + * But this is safer as it reduces the risk of panics in + * hard-to-reach-and-test code. + */ + if (got_STOPPING && WalSndCaughtUp) + got_SIGUSR2 = true; + /* Update shared memory status */ { WalSnd *walsnd = MyWalSnd; -- 2.21.0 --- src/backend/replication/walsender.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index e7a59b0a92..a5d375f097 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -2858,18 +2858,29 @@ XLogSendLogical(void) * point, then we're caught up. */ if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) - { WalSndCaughtUp = true; - - /* - * Have WalSndLoop() terminate the connection in an orderly - * manner, after writing out all the pending data. - */ - if (got_STOPPING) - got_SIGUSR2 = true; - } } + /* + * If we've recently sent up to the currently flushed WAL + * and are shutting down, we can safely wrap up by flushing + * buffers and exchanging CopyDone messages. It doesn't matter + * if more WAL may be written before shutdown because no + * WAL written after replication slots are checkpointed + * can result in invocation of logical decoding hooks and + * output to the client. + * + * We could instead WalSndSetState(WALSNDSTATE_STOPPING) + * to allow shutdown to continue and put the walsender + * in a state where any unexpected WAL records Assert. + * But the net effect is the same and this is safer to + * backpatch on customer systems. + * + * See RT64864 + */ + if (got_STOPPING && WalSndCaughtUp) + got_SIGUSR2 = true; + /* Update shared memory status */ { WalSnd *walsnd = MyWalSnd; -- 2.21.0