Re: Incorrect logic in XLogNeedsFlush() - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Incorrect logic in XLogNeedsFlush() |
Date | |
Msg-id | CAAKRu_ZEcWNbWAf_bBpSugHh=GTS=EYmb_8npSztU=vLcn=H+g@mail.gmail.com Whole thread Raw |
In response to | Re: Incorrect logic in XLogNeedsFlush() (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Incorrect logic in XLogNeedsFlush()
Re: Incorrect logic in XLogNeedsFlush() |
List | pgsql-hackers |
On Sun, Sep 14, 2025 at 4:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Sep 12, 2025 at 10:02 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > Assuming I'm right so far, then I agree with changing the test in > > XLogNeedsFlush() to !XLogInsertAllowed(), as the patch does. > > > > The comment in the patch is describing what's happening, but lost the > > "why". Perhaps something like: > > > > "During recovery, we don't flush WAL but update minRecoveryPoint > > instead. So "needs flush" is taken to mean whether minRecoveryPoint > > would need to be updated. The end-of-recovery checkpoint still must > > flush WAL like normal, though, so check !XLogInsertAllowed(). This > > check must be consistent with XLogFlush()." > > Updated as per suggestion I like the idea of adding an assert to keep the two in sync. However, two things 1) Since XLogNeedsFlush() refreshes LogwrtResult, is it possible for this assert to fail because the flush pointer was updated? I realize that that is the case with my original patch as well. If this is true, perhaps there is not a way to use XLogNeedsFlush() after XLogFlush() -- only before as a check to see if we must call XLogFlush(). 2) You've added the assert at the bottom which will only cover the flush pointer case and not the min recovery point case. Maybe that's good enough though... > > One loose end: there was some discussion about the times when > > LocalMinRecoveryPoint is valid/correct. I'm not sure I entirely > > understood -- is that no longer a concern? > > IMHO during crash recovery LocalMinRecoveryPoint and > ControlFile->minRecoveryPoint can not initialized until we replay all > WAL so those should never get accessed. However if XLogNeedsFlush() > were called during the end of the recovery checkpoint it was accessing > this which was wrong although it was not a live bug and this patch is > making that more resilient. OTOH if we are creating a restartpoint > during standby replay or archive recovery then we do access > LocalMinRecoveryPoint and ControlFile->minRecoveryPoint because by > that time we have already done the minimum recovery require for > reaching to the consistent state and we would have initialized > ControlFile->minRecoveryPoint. > > IMHO during crash recovery, the variables LocalMinRecoveryPoint and > ControlFile->minRecoveryPoint remain uninitialized until all required > WAL has been replayed. Consequently, they must not be accessed during > this phase. Previously, calling XLogNeedsFlush() during the "end of > recovery checkpoint" incorrectly accessed these uninitialized values. > While this was not a live bug, this patch hardens the code against > this type of race condition. > > OTOH, when creating a RestartPoint during standby replay or doing > archive recovery, access to both LocalMinRecoveryPoint and > ControlFile->minRecoveryPoint is correct. By this stage, it has > completed the minimum recovery required to achieve a consistent state, > and ControlFile->minRecoveryPoint would have been properly > initialized. My confusion was two things: how the startup process is getting a valid minimum recovery point during crash recovery on a standby and how bgwriter and checkpointer avoid reading the control file during crash recovery on the standby. IIUC, the minimum recovery point on the primary will always be InvalidXLogRecPtr. On the standby, during "normal" recovery, all backends (regular backends running read-only queries, checkpointer, etc) update the minimum recovery point so it is ready in case of a crash. During recovery from a crash on the standby, no regular backends will be running read-only queries because those are forbidden until crash recovery finishes. Crash recovery on the standby is finished when WAL has been replayed through the minimum recovery point -- or, at least I thought that was the point of the minimum recovery point. But, the startup process needs to replay WAL through the minimum recovery point. So, where does it get this value from? The control file? And if the startup process gets the minimum recovery point from the ControlFile, then how are bgwriter and checkpointer protected from reading it since they should only see InvalidXLogRecPtr during crash recovery? Looking at this logic in XLogFlush() -> UpdateMinRecoveryPoint() if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery) { updateMinRecoveryPoint = false; return; } Only the startup process will pass this test. Then other processes will continue to read the control file. This only works if ControlFile->minRecoveryPoint is InvalidXLogRecPtr during crash recovery. As for your patch, I think we should also update the comment at the top of XLogNeedsFlush(). Something like this when combined with your comment update and code. /* - * Test whether XLOG data has been flushed up to (at least) the given position. + * Test whether XLOG data has been flushed up to (at least) the given position + * or whether the minimum recovery point is updated past the given position. * - * Returns true if a flush is still needed. (It may be that someone else - * is already in process of flushing that far, however.) + * Returns true if a flush is still needed or if the minimum recovery point + * must be updated. + * + * It is possible that someone else is already in the process of flushing that + * far or updating the minimum recovery point that far. */ bool XLogNeedsFlush(XLogRecPtr record) { /* - * During recovery, we don't flush WAL but update minRecoveryPoint - * instead. So "needs flush" is taken to mean whether minRecoveryPoint - * would need to be updated. + * During recovery, when WAL inserts are forbidden, "needs flush" is + * really "minimum recovery point needs updating". The end-of-recovery + * checkpoint still must flush WAL like normal, though, so check + * !XLogInsertAllowed(). This check must be consistent with XLogFlush(). */ - if (RecoveryInProgress()) + if (!XLogInsertAllowed()) { + Assert(RecoveryInProgress()); And I don't know if WAL inserts could ever be forbidden outside of recovery (maybe if a read-only database mode is added) -- but maybe it is worth adding an assertion that recovery is in progress? - Melanie
pgsql-hackers by date: