Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Andres Freund
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id 20210831023839.7brycllwyyapbrfv@alap3.anarazel.de
Whole thread Raw
In response to Re: archive status ".ready" files may be created too early  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: archive status ".ready" files may be created too early
List pgsql-hackers
Hi,

On 2021-08-30 22:39:04 +0000, Bossart, Nathan wrote:
> On 8/30/21, 2:06 PM, "Andres Freund" <andres@anarazel.de> wrote:
> > When were you thinking of doing the latch sets? Adding a latch set for every
> > XLogWrite() wouldn't be cheap either. Both because syscalls under a lock
> > aren't free and because waking up walsender even more often isn't free (we
> > already have a few threads about reducing the signalling frequency).
> >
> > There's also the question of what to do with single user mode. We shouldn't
> > just skip creating .ready files there...
> 
> Good point.
> 
> > Although, the more I think about, the more I am confused about the trailing
> >         if (XLogArchivingActive())
> >                 NotifySegmentsReadyForArchive(LogwrtResult.Flush);
> >
> > in XLogWrite(). Shouldn't that at the very least be inside the "If asked to
> > flush, do so" branch? Outside that and the finishing_seg branch
> > LogwrtResult.Flush won't have moved, right? So the call to
> > NotifySegmentsReadyForArchive() can't do anything, no?
> 
> The registration logic looks like this:
>         1. Register boundary
>         2. Get flush location from shared memory
>         3. If flush location >= our just-registered boundary, nudge
>            the WAL writer to create .ready files if needed
> 
> If we called NotifySegmentsReadyForArchive() before we updated the
> flush location in shared memory, we might skip nudging the WAL writer
> even though we should.

That's trivial to address - just have a local variable saying whether we need
to call NotifySegmentsReadyForArchive().

Note that the finishing_seg path currently calls
NotifySegmentsReadyForArchive() before the shared memory flush location is
updated.


> > When s1 is fully written out, NotifySegmentsReadyForArchive() will set
> > earliestSegBoundary = latestSegBoundary = s4 and create .reaady for .s1 -
> > ok. But when when s2 is flushed, we'll afaict happily create .ready files
> > for s2, s3 despite s4 not yet being written, because earliestSegBoundary is
> > now s4.
> 
> In this case, the .ready files for s2 and s3 wouldn't be created until
> s4 is flushed to disk.

I don't think that's true as the code stands today? The
NotifySegmentsReadyForArchive() for s2 will update earliestSegBoundary to s4,
because latestSegBoundary = 4 and earliestSegBoundary = 1, triggering the
keep_latest branch. Any subsequent NotifySegmentsReadyForArchive() with a
segment < 4 will then be able to flush s2 and s3?


> > The whole approach here of delaying .ready creation for these types of
> > segments seems wrong to me. Doesn't the exact same problem also exist for
> > streaming rep - which one can also use to maintain a PITR archive? walsender
> > sends up to the flush location, and pg_receivewal's FindStreamingStart() will
> > afaict just continue receiving from after that point.
> 
> The problem with streaming replication is being discussed in a new
> thread [0].

I don't think it's sensible to fix these separately. It'd be one thing to do
that for HEAD, but on the back branches? And that this patch hasn't gotten any
performance testing is scary.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Replication slot drop message is sent after pgstats shutdown.
Next
From: vignesh C
Date:
Subject: Re: Added missing invalidations for all tables publication