walsender vs. XLogBackgroundFlush during shutdown - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | walsender vs. XLogBackgroundFlush during shutdown |
Date | |
Msg-id | 20190501002845.uyacu4p36zcubbs2@development Whole thread Raw |
Responses |
Re: walsender vs. XLogBackgroundFlush during shutdown
Re: walsender vs. XLogBackgroundFlush during shutdown Re: walsender vs. XLogBackgroundFlush during shutdown |
List | pgsql-hackers |
Hi, I see there's an ongoing discussion about race conditions in walreceiver blocking shutdown, so let me start a new thread about a race condition in walsender during shutdown. The symptoms are rather simple - 'pg_ctl -m fast shutdown' gets stuck, waiting for walsender processes to catch-up and terminate indefinitely. The reason for that is pretty simple - the walsenders are doing logical decoding, and during shutdown they're waiting for WalSndCaughtUp=true. But per XLogSendLogical() this only happens if if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) { WalSndCaughtUp = true; ... } That is, we need to get beyong GetFlushRecPtr(). But that may never happen, because there may be incomplete (and unflushed) page in WAL buffers, not forced out by any transaction. So if there's a WAL record overflowing to the unflushed page, the walsender will never catch up. Now, this situation is apparently expected, because WalSndWaitForWal() does this: /* * If we're shutting down, trigger pending WAL to be written out, * otherwise we'd possibly end up waiting for WAL that never gets * written, because walwriter has shut down already. */ if (got_STOPPING) XLogBackgroundFlush(); but unfortunately that does not actually do anything, because right at the very beginning XLogBackgroundFlush() does this: /* back off to last completed page boundary */ WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ; That is, it intentionally ignores the incomplete page, which means the walsender can't read the record and reach GetFlushRecPtr(). XLogBackgroundFlush() does this since (at least) 2007, so how come we never had issues with this before? Firstly, walsenders used for physical replication don't have this issue, because they don't need to decode the WAL record overflowing to the incomplete/unflushed page. So it seems only walsenders used for logical decoding are vulnerable to this. Secondly, this seems to happen only when a bit of WAL is generated just at the right moment during shutdown. I have initially ran into this issue with the failover slots (i.e. the patch that was committed and reverted in the 9.6 cycle, IIRC), which is doing exactly this - it's writing the slot positions into WAL during shutdown. Which made it pretty trivial to trigger this issue. But I don't think we're safe without the failover slots patch, because any output plugin can do something similar - say, LogLogicalMessage() or something like that. I'm not aware of a plugin doing such things, but I don't think it's illegal / prohibited either. (Of course, plugins that generate WAL won't be useful for decoding on standby in the future.) So what I think we should do is to tweak XLogBackgroundFlush() so that during shutdown it skips the back-off to page boundary, and flushes even the last piece of WAL. There are only two callers anyway, so something like XLogBackgroundFlush(bool) would be simple enough. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: