Re: walsender vs. XLogBackgroundFlush during shutdown - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: walsender vs. XLogBackgroundFlush during shutdown |
Date | |
Msg-id | 20190501164620.4jjue75tswqdst3l@development Whole thread Raw |
In response to | Re: walsender vs. XLogBackgroundFlush during shutdown (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Wed, May 01, 2019 at 08:53:15AM -0700, Andres Freund wrote: >Hi, > >On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote: >> The reason for that is pretty simple - the walsenders are doing logical >> decoding, and during shutdown they're waiting for WalSndCaughtUp=true. >> But per XLogSendLogical() this only happens if >> >> if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) >> { >> WalSndCaughtUp = true; >> ... >> } >> >> That is, we need to get beyong GetFlushRecPtr(). But that may never >> happen, because there may be incomplete (and unflushed) page in WAL >> buffers, not forced out by any transaction. So if there's a WAL record >> overflowing to the unflushed page, the walsender will never catch up. >> >> Now, this situation is apparently expected, because WalSndWaitForWal() >> does this: >> >> /* >> * If we're shutting down, trigger pending WAL to be written out, >> * otherwise we'd possibly end up waiting for WAL that never gets >> * written, because walwriter has shut down already. >> */ >> if (got_STOPPING) >> XLogBackgroundFlush(); >> >> but unfortunately that does not actually do anything, because right at >> the very beginning XLogBackgroundFlush() does this: >> >> /* back off to last completed page boundary */ >> WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ; > >> That is, it intentionally ignores the incomplete page, which means the >> walsender can't read the record and reach GetFlushRecPtr(). >> >> XLogBackgroundFlush() does this since (at least) 2007, so how come we >> never had issues with this before? > >I assume that's because of the following logic: > /* if we have already flushed that far, consider async commit records */ > if (WriteRqst.Write <= LogwrtResult.Flush) > { > SpinLockAcquire(&XLogCtl->info_lck); > WriteRqst.Write = XLogCtl->asyncXactLSN; > SpinLockRelease(&XLogCtl->info_lck); > flexible = false; /* ensure it all gets written */ > } > >and various pieces of the code doing XLogSetAsyncXactLSN() to force >flushing. I wonder if the issue is that we're better at avoiding >unnecessary WAL to be written due to >6ef2eba3f57f17960b7cd4958e18aa79e357de2f > I don't think so, because (a) there are no async commits involved, and (b) we originally ran into the issue on 9.6 and 6ef2eba3f57f1 is only in 10+. > >> But I don't think we're safe without the failover slots patch, because >> any output plugin can do something similar - say, LogLogicalMessage() or >> something like that. I'm not aware of a plugin doing such things, but I >> don't think it's illegal / prohibited either. (Of course, plugins that >> generate WAL won't be useful for decoding on standby in the future.) > >FWIW, I'd consider such an output plugin outright broken. > Why? Is that prohibited somewhere, either explicitly or implicitly? I do see obvious issues with generating WAL from plugin (infinite cycle and so on), but I suppose that's more a regular coding issue than something that would make all plugins doing that broken. FWIW I don't see any particular need to generate WAL from output plugins, I mentioned it mostly jsut as a convenient way to trigger the issue. I suppose there are other ways to generate a bit of WAL during shutdown. > >> So what I think we should do is to tweak XLogBackgroundFlush() so that >> during shutdown it skips the back-off to page boundary, and flushes even >> the last piece of WAL. There are only two callers anyway, so something >> like XLogBackgroundFlush(bool) would be simple enough. > >I think it then just ought to be a normal XLogFlush(). I.e. something >along the lines of: > > /* > * If we're shutting down, trigger pending WAL to be written out, > * otherwise we'd possibly end up waiting for WAL that never gets > * written, because walwriter has shut down already. > */ > if (got_STOPPING && !RecoveryInProgress()) > XLogFlush(GetXLogInsertRecPtr()); > > /* Update our idea of the currently flushed position. */ > if (!RecoveryInProgress()) > RecentFlushPtr = GetFlushRecPtr(); > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); > Perhaps. That would work too, I guess. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: