Re: walsender vs. XLogBackgroundFlush during shutdown - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: walsender vs. XLogBackgroundFlush during shutdown
Date
Msg-id 20190501164620.4jjue75tswqdst3l@development
Whole thread Raw
In response to Re: walsender vs. XLogBackgroundFlush during shutdown  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, May 01, 2019 at 08:53:15AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote:
>> The reason for that is pretty simple - the walsenders are doing logical
>> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
>> But per XLogSendLogical() this only happens if
>>
>>    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>>    {
>>        WalSndCaughtUp = true;
>>        ...
>>    }
>>
>> That is, we need to get beyong GetFlushRecPtr(). But that may never
>> happen, because there may be incomplete (and unflushed) page in WAL
>> buffers, not forced out by any transaction. So if there's a WAL record
>> overflowing to the unflushed page, the walsender will never catch up.
>>
>> Now, this situation is apparently expected, because WalSndWaitForWal()
>> does this:
>>
>>    /*
>>     * If we're shutting down, trigger pending WAL to be written out,
>>     * otherwise we'd possibly end up waiting for WAL that never gets
>>     * written, because walwriter has shut down already.
>>     */
>>    if (got_STOPPING)
>>        XLogBackgroundFlush();
>>
>> but unfortunately that does not actually do anything, because right at
>> the very beginning XLogBackgroundFlush() does this:
>>
>>    /* back off to last completed page boundary */
>>    WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
>
>> That is, it intentionally ignores the incomplete page, which means the
>> walsender can't read the record and reach GetFlushRecPtr().
>>
>> XLogBackgroundFlush() does this since (at least) 2007, so how come we
>> never had issues with this before?
>
>I assume that's because of the following logic:
>    /* if we have already flushed that far, consider async commit records */
>    if (WriteRqst.Write <= LogwrtResult.Flush)
>    {
>        SpinLockAcquire(&XLogCtl->info_lck);
>        WriteRqst.Write = XLogCtl->asyncXactLSN;
>        SpinLockRelease(&XLogCtl->info_lck);
>        flexible = false;        /* ensure it all gets written */
>    }
>
>and various pieces of the code doing XLogSetAsyncXactLSN() to force
>flushing.  I wonder if the issue is that we're better at avoiding
>unnecessary WAL to be written due to
>6ef2eba3f57f17960b7cd4958e18aa79e357de2f
>

I don't think so, because (a) there are no async commits involved, and (b)
we originally ran into the issue on 9.6 and 6ef2eba3f57f1 is only in 10+.

>
>> But I don't think we're safe without the failover slots patch, because
>> any output plugin can do something similar - say, LogLogicalMessage() or
>> something like that. I'm not aware of a plugin doing such things, but I
>> don't think it's illegal / prohibited either. (Of course, plugins that
>> generate WAL won't be useful for decoding on standby in the future.)
>
>FWIW, I'd consider such an output plugin outright broken.
>

Why? Is that prohibited somewhere, either explicitly or implicitly? I do
see obvious issues with generating WAL from plugin (infinite cycle and so
on), but I suppose that's more a regular coding issue than something that
would make all plugins doing that broken.

FWIW I don't see any particular need to generate WAL from output plugins,
I mentioned it mostly jsut as a convenient way to trigger the issue. I
suppose there are other ways to generate a bit of WAL during shutdown.

>
>> So what I think we should do is to tweak XLogBackgroundFlush() so that
>> during shutdown it skips the back-off to page boundary, and flushes even
>> the last piece of WAL. There are only two callers anyway, so something
>> like XLogBackgroundFlush(bool) would be simple enough.
>
>I think it then just ought to be a normal XLogFlush(). I.e. something
>along the lines of:
>
>        /*
>         * If we're shutting down, trigger pending WAL to be written out,
>         * otherwise we'd possibly end up waiting for WAL that never gets
>         * written, because walwriter has shut down already.
>         */
>        if (got_STOPPING && !RecoveryInProgress())
>            XLogFlush(GetXLogInsertRecPtr());
>
>        /* Update our idea of the currently flushed position. */
>        if (!RecoveryInProgress())
>            RecentFlushPtr = GetFlushRecPtr();
>        else
>            RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>

Perhaps. That would work too, I guess.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: New vacuum option to do only freezing
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Compute XID horizon for page level index vacuum on primary.