Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id C6234E05-6391-485F-92C3-10F7EE4D9612@amazon.com
Whole thread Raw
In response to Re: archive status ".ready" files may be created too early  (Andres Freund <andres@anarazel.de>)
Responses Re: archive status ".ready" files may be created too early
List pgsql-hackers
On 8/30/21, 2:06 PM, "Andres Freund" <andres@anarazel.de> wrote:
> When were you thinking of doing the latch sets? Adding a latch set for every
> XLogWrite() wouldn't be cheap either. Both because syscalls under a lock
> aren't free and because waking up walsender even more often isn't free (we
> already have a few threads about reducing the signalling frequency).
>
> There's also the question of what to do with single user mode. We shouldn't
> just skip creating .ready files there...

Good point.

> Although, the more I think about, the more I am confused about the trailing
>         if (XLogArchivingActive())
>                 NotifySegmentsReadyForArchive(LogwrtResult.Flush);
>
> in XLogWrite(). Shouldn't that at the very least be inside the "If asked to
> flush, do so" branch? Outside that and the finishing_seg branch
> LogwrtResult.Flush won't have moved, right? So the call to
> NotifySegmentsReadyForArchive() can't do anything, no?

The registration logic looks like this:
        1. Register boundary
        2. Get flush location from shared memory
        3. If flush location >= our just-registered boundary, nudge
           the WAL writer to create .ready files if needed

If we called NotifySegmentsReadyForArchive() before we updated the
flush location in shared memory, we might skip nudging the WAL writer
even though we should.

> Nor does it seem like we'd ever need to call NotifySegmentsReadyForArchive()
> if we started writing on the current page - flushRecPtr can't move across a
> segment boundary in that case.

I think there is a chance that we've crossed one of our recorded
segment boundaries anytime the flush pointer moves.

> When s1 is fully written out, NotifySegmentsReadyForArchive() will set
> earliestSegBoundary = latestSegBoundary = s4 and create .reaady for .s1 -
> ok. But when when s2 is flushed, we'll afaict happily create .ready files
> for s2, s3 despite s4 not yet being written, because earliestSegBoundary is
> now s4.

In this case, the .ready files for s2 and s3 wouldn't be created until
s4 is flushed to disk.

> The whole approach here of delaying .ready creation for these types of
> segments seems wrong to me. Doesn't the exact same problem also exist for
> streaming rep - which one can also use to maintain a PITR archive? walsender
> sends up to the flush location, and pg_receivewal's FindStreamingStart() will
> afaict just continue receiving from after that point.

The problem with streaming replication is being discussed in a new
thread [0].

Nathan

[0] https://postgr.es/m/202108232252.dh7uxf6oxwcy%40alvherre.pgsql


pgsql-hackers by date:

Previous
From: Zhihong Yu
Date:
Subject: Re: Use extended statistics to estimate (Var op Var) clauses
Next
From: "Bossart, Nathan"
Date:
Subject: Re: archive status ".ready" files may be created too early