Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CA+Tgmoa_gvxP++gm7tK6wc95Zn=TtOPhHQtXRGrHFcNUDXHXOg@mail.gmail.com Whole thread Raw |
In response to | Re: [Patch] ALTER SYSTEM READ ONLY (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
|
List | pgsql-hackers |
On Fri, Jul 23, 2021 at 4:03 PM Robert Haas <robertmhaas@gmail.com> wrote: > My 0003 is where I see some lingering problems. It creates > XLogAcceptWrites(), moves the appropriate stuff there, and doesn't > need the xlogreader. But it doesn't really solve the problem of how > checkpointer.c would be able to call this function with proper > arguments. It is at least better in not needing two arguments to > decide what to do, but how is checkpointer.c supposed to know what to > pass for xlogaction? Worse yet, how is checkpointer.c supposed to know > what to pass for EndOfLogTLI and EndOfLog? On further study, I found another problem: the way my patch set leaves things, XLogAcceptWrites() depends on ArchiveRecoveryRequested, which will not be correctly initialized in any process other than the startup process. So CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog) would just be skipped. Your 0001 seems to have the same problem. You added Assert(AmStartupProcess()) to the inside of the if (ArchiveRecoveryRequested) block, but that doesn't fix anything. Outside the startup process, ArchiveRecoveryRequested will always be false, but the point is that the associated stuff should be done if ArchiveRecoveryRequested would have been true in the startup process. Both of our patch sets leave things in a state where that would never happen, which is not good. Unless I'm missing something, it seems like maybe you didn't test your patches to verify that, when the XLogAcceptWrites() call comes from the checkpointer, all the same things happen that would have happened had it been called from the startup process. That would be a really good thing to have tested before posting your patches. As far as EndOfLogTLI is concerned, there are, somewhat annoyingly, several TLIs stored in XLogCtl. None of them seem to be precisely the same thing as EndLogTLI, but I am hoping that replayEndTLI is close enough. I found out pretty quickly through testing that replayEndTLI isn't always valid -- it ends up 0 if we don't enter recovery. That's not really a problem, though, because we only need it to be valid if ArchiveRecoveryRequested. The code that initializes and updates it seems to run whenever InRecovery = true, and ArchiveRecoveryRequested = true will force InRecovery = true. So it looks to me like replayEndTLI will always be initialized in the cases where we need a value. It's not yet entirely clear to me if it has to have the same value as EndOfLogTLI. I find this code comment quite mysterious: /* * EndOfLogTLI is the TLI in the filename of the XLOG segment containing * the end-of-log. It could be different from the timeline that EndOfLog * nominally belongs to, if there was a timeline switch in that segment, * and we were reading the old WAL from a segment belonging to a higher * timeline. */ EndOfLogTLI = xlogreader->seg.ws_tli; The thing is, if we were reading old WAL from a segment belonging to a higher timeline, wouldn't we have switched to that new timeline? Suppose we want WAL segment 246 from TLI 1, but we don't have that segment on TLI 1, only TLI 2. Well, as far as I know, for us to use the TLI 2 version, we'd need to have TLI 2 in the history of the recovery_target_timeline. And if that is the case, then we would have to replay through the record where the timeline changes. And if we do that, then the discrepancy postulated by the comment cannot still exist by the time we reach this code, because this code is only reached after we finish WAL redo. So I'm baffled as to how this can happen, but considering how many cases there are in this code, I sure can't promise that it doesn't. The fact that we have few tests for any of this doesn't help either. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: