
Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From	Bossart, Nathan
Subject	Re: archive status ".ready" files may be created too early
Date	July 30, 2021 23:25:19
Msg-id	DA71434B-7340-4984-9B91-F085BC47A778@amazon.com
In response to	Re: archive status ".ready" files may be created too early (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses	Re: archive status ".ready" files may be created too early Re: archive status ".ready" files may be created too early
List	pgsql-hackers


On 7/30/21, 11:34 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> Hmm ... I'm not sure we're prepared to backpatch this kind of change.
> It seems a bit too disruptive to how replay works.  I think we
> should be focusing solely on patch 0001 to surgically fix the precise
> bug you see.  Does patch 0002 exist because you think that a system with
> only 0001 will not correctly deal with a crash at the right time?

Yes, that was what I was worried about.  However, I just performed a
variety of tests with just 0001 applied, and I am beginning to suspect
my concerns were unfounded.  With wal_buffers set very high,
synchronous_commit set to off, and a long sleep at the end of
XLogWrite(), I can reliably cause the archive status files to lag far
behind the current open WAL segment.  However, even if I crash at this
time, the .ready files are created when the server restarts (albeit
out of order).  This appears to be due to the call to
XLogArchiveCheckDone() in RemoveOldXlogFiles().  Therefore, we can
likely abandon 0002.
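
To illustrate the restart behavior described above, here is a minimal
toy model in C.  It is a sketch only, not PostgreSQL source: SegStatus,
toy_archive_check_done() and toy_scan_old_segments() are hypothetical
names, and the in-memory flags stand in for the .ready/.done files in
archive_status.  The assumption being modeled is that the check made
for each old segment creates a missing .ready file when no .done file
exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the archive status of one WAL segment. */
typedef struct
{
    const char *name;       /* segment file name */
    bool        has_ready;  /* <name>.ready exists */
    bool        has_done;   /* <name>.done exists */
} SegStatus;

/*
 * Toy model of the per-segment check: returns true if the segment is
 * already archived.  Otherwise it makes sure a .ready marker exists
 * (creating it if it was never written) and returns false.
 */
static bool
toy_archive_check_done(SegStatus *seg)
{
    assert(seg != NULL);

    if (seg->has_done)
        return true;
    if (!seg->has_ready)
        seg->has_ready = true;  /* late creation of the .ready marker */
    return false;
}

/*
 * Toy model of the scan over old segments after a restart.  Returns the
 * number of .ready markers that had to be created during the scan.
 */
static int
toy_scan_old_segments(SegStatus *segs, size_t nsegs)
{
    int created = 0;

    for (size_t i = 0; i < nsegs; i++)
    {
        bool had_ready = segs[i].has_ready;

        if (!toy_archive_check_done(&segs[i]) && !had_ready)
            created++;
    }
    return created;
}
```

In this model, a segment that had neither marker at crash time gains a
.ready marker on the first scan, and a second scan creates nothing
further.  Markers appear in scan order rather than in the order the
segments were completed, which is consistent with the out-of-order
creation noted above.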

> Now, the reason I'm looking at this patch series is that we're seeing a
> related problem with walsender/walreceiver, which apparently are capable
> of creating a file in the replica that ends up not existing in the
> primary after a crash, for a reason closely related to what you
> describe for WAL archival.  I'm not sure what is going on just yet, so
> I'm not going to try and explain because I'm likely to get it wrong.

I've suspected that this is due to the use of the flushed location for
the send pointer, which AFAICT needn't align with a WAL record
boundary.

                /*
                 * Streaming the current timeline on a primary.
                 *
                 * Attempt to send all data that's already been written out and
                 * fsync'd to disk.  We cannot go further than what's been written out
                 * given the current implementation of WALRead().  And in any case
                 * it's unsafe to send WAL that is not securely down to disk on the
                 * primary: if the primary subsequently crashes and restarts, standbys
                 * must not have applied any WAL that got lost on the primary.
                 */
                SendRqstPtr = GetFlushRecPtr();
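
As a rough illustration of the alignment concern, the following toy C
sketch checks whether a pointer falls on a record boundary.  It is not
taken from the PostgreSQL tree and is not a proposed fix: ToyRecPtr,
toy_is_record_boundary() and toy_last_boundary_at_or_before() are
hypothetical names, and the caller is assumed to supply an array of
known record boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (byte offset). */
typedef uint64_t ToyRecPtr;

/* Returns true if ptr coincides with one of the known record boundaries. */
static bool
toy_is_record_boundary(const ToyRecPtr *bounds, size_t n, ToyRecPtr ptr)
{
    assert(bounds != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (bounds[i] == ptr)
            return true;
    }
    return false;
}

/*
 * Returns the largest known boundary that is <= ptr, or 0 if there is
 * none.  Anything between that boundary and ptr is a partial record.
 */
static ToyRecPtr
toy_last_boundary_at_or_before(const ToyRecPtr *bounds, size_t n,
                               ToyRecPtr ptr)
{
    ToyRecPtr best = 0;

    assert(bounds != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (bounds[i] <= ptr && bounds[i] > best)
            best = bounds[i];
    }
    return best;
}
```

For example, with boundaries at 100, 180 and 8300 and a flushed
position of 8192, the flushed position is not a boundary and the last
complete record ends at 180, so a sender that streams up to 8192 would
ship the first part of a record that is not yet fully flushed.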

Nathan
