Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: archive status ".ready" files may be created too early |
Date | |
Msg-id | 20210831023839.7brycllwyyapbrfv@alap3.anarazel.de Whole thread Raw |
In response to | Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>) |
Responses |
Re: archive status ".ready" files may be created too early
|
List | pgsql-hackers |
Hi, On 2021-08-30 22:39:04 +0000, Bossart, Nathan wrote: > On 8/30/21, 2:06 PM, "Andres Freund" <andres@anarazel.de> wrote: > > When were you thinking of doing the latch sets? Adding a latch set for every > > XLogWrite() wouldn't be cheap either. Both because syscalls under a lock > > aren't free and because waking up walsender even more often isn't free (we > > already have a few threads about reducing the signalling frequency). > > > > There's also the question of what to do with single user mode. We shouldn't > > just skip creating .ready files there... > > Good point. > > > Although, the more I think about, the more I am confused about the trailing > > if (XLogArchivingActive()) > > NotifySegmentsReadyForArchive(LogwrtResult.Flush); > > > > in XLogWrite(). Shouldn't that at the very least be inside the "If asked to > > flush, do so" branch? Outside that and the finishing_seg branch > > LogwrtResult.Flush won't have moved, right? So the call to > > NotifySegmentsReadyForArchive() can't do anything, no? > > The registration logic looks like this: > 1. Register boundary > 2. Get flush location from shared memory > 3. If flush location >= our just-registered boundary, nudge > the WAL writer to create .ready files if needed > > If we called NotifySegmentsReadyForArchive() before we updated the > flush location in shared memory, we might skip nudging the WAL writer > even though we should. That's trivial to address - just have a local variable saying whether we need to call NotifySegmentsReadyForArchive(). Note that the finishing_seg path currently calls NotifySegmentsReadyForArchive() before the shared memory flush location is updated. > > When s1 is fully written out, NotifySegmentsReadyForArchive() will set > > earliestSegBoundary = latestSegBoundary = s4 and create .reaady for .s1 - > > ok. But when when s2 is flushed, we'll afaict happily create .ready files > > for s2, s3 despite s4 not yet being written, because earliestSegBoundary is > > now s4. > > In this case, the .ready files for s2 and s3 wouldn't be created until > s4 is flushed to disk. I don't think that's true as the code stands today? The NotifySegmentsReadyForArchive() for s2 will update earliestSegBoundary to s4, because latestSegBoundary = 4 and earliestSegBoundary = 1, triggering the keep_latest branch. Any subsequent NotifySegmentsReadyForArchive() with a segment < 4 will then be able to flush s2 and s3? > > The whole approach here of delaying .ready creation for these types of > > segments seems wrong to me. Doesn't the exact same problem also exist for > > streaming rep - which one can also use to maintain a PITR archive? walsender > > sends up to the flush location, and pg_receivewal's FindStreamingStart() will > > afaict just continue receiving from after that point. > > The problem with streaming replication is being discussed in a new > thread [0]. I don't think it's sensible to fix these separately. It'd be one thing to do that for HEAD, but on the back branches? And that this patch hasn't gotten any performance testing is scary. Greetings, Andres Freund
pgsql-hackers by date: