Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Amul Sul |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CAAJ_b97vxaTA-GZK4AyCJvrpzUsAhGLugXaYrE7mybb4FmOarw@mail.gmail.com Whole thread Raw |
In response to | Re: [Patch] ALTER SYSTEM READ ONLY (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
|
List | pgsql-hackers |
, On Sat, Jul 24, 2021 at 1:33 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jun 17, 2021 at 1:23 AM Amul Sul <sulamul@gmail.com> wrote: > > Attached is rebase for the latest master head. Also, I added one more > > refactoring code that deduplicates the code setting database state in the > > control file. The same code set the database state is also needed for this > > feature. > > I started studying 0001 today and found that it rearranged the order > of operations in StartupXLOG() more than I was expecting. It does, as > per previous discussions, move a bunch of things to the place where we > now call XLogParamters(). But, unsatisfyingly, InRecovery = false and > XLogReaderFree() then have to move down even further. Since the goal > here is to get to a situation where we sometimes XLogAcceptWrites() > after InRecovery = false, it didn't seem nice for this refactoring > patch to still end up with a situation where this stuff happens while > InRecovery = true. In fact, with the patch, the amount of code that > runs with InRecovery = true actually *increases*, which is not what I > think should be happening here. That's why the patch ends up having to > adjust SetMultiXactIdLimit to not Assert(!InRecovery). > > And then I started to wonder how this was ever going to work as part > of the larger patch set, because as you have it here, > XLogAcceptWrites() takes arguments XLogReaderState *xlogreader, > XLogRecPtr EndOfLog, and TimeLineID EndOfLogTLI and if the > checkpointer is calling that at a later time after the user issues > pg_prohibit_wal(false), it's going to have none of those things. So I > had a quick look at that part of the code and found this in > checkpointer.c: > > XLogAcceptWrites(true, NULL, InvalidXLogRecPtr, 0); > > For those following along from home, the additional "true" is a bool > needChkpt argument added to XLogAcceptWrites() by 0003. Well, none of > this is very satisfying. The whole purpose of passing the xlogreader > is so we can figure out whether we need a checkpoint (never mind the > question of whether the existing algorithm for determining that is > really sensible) but now we need a second argument that basically > serves the same purpose since one of the two callers to this function > won't have an xlogreader. And then we're passing the EndOfLog and > EndOfLogTLI as dummy values which seems like it's probably just > totally wrong, but if for some reason it works correctly there sure > don't seem to be any comments explaining why. > > So I started doing a bit of hacking myself and ended up with the > attached, which I think is not completely the right thing yet but I > think it's better than your version. I split this into three parts. > 0001 splits up the logic that currently decides whether to write an > end-of-recovery record or a checkpoint record and if the latter how > the checkpoint ought to be performed into two functions. > DetermineRecoveryXlogAction() figures out what we want to do, and > PerformRecoveryXlogAction() does it. It also moves the code to run > recovery_end_command and related stuff into a new function > CleanupAfterArchiveRecovery(). 0002 then builds on this by postponing > UpdateFullPageWrites(), PerformRecoveryXLogAction(), and > CleanupAfterArchiveRecovery() to just before we > XLogReportParameters(). Because of the refactoring done by 0001, this > is only a small amount of code movement. Because of the separation > between DetermineRecoveryXlogAction() and PerformRecoveryXlogAction(), > the latter doesn't need the xlogreader. So we can do > DetermineRecoveryXlogAction() at the same time as now, while the > xlogreader is available, and then we don't need it later when we > PerformRecoveryXlogAction(), because we already know what we need to > know. I think this is all fine as far as it goes. > > My 0003 is where I see some lingering problems. It creates > XLogAcceptWrites(), moves the appropriate stuff there, and doesn't > need the xlogreader. But it doesn't really solve the problem of how > checkpointer.c would be able to call this function with proper > arguments. It is at least better in not needing two arguments to > decide what to do, but how is checkpointer.c supposed to know what to > pass for xlogaction? Worse yet, how is checkpointer.c supposed to know > what to pass for EndOfLogTLI and EndOfLog? Actually, EndOfLog doesn't > seem too problematic, because that value has been stored in four (!) > places inside XLogCtl by this code: > > LogwrtResult.Write = LogwrtResult.Flush = EndOfLog; > > XLogCtl->LogwrtResult = LogwrtResult; > > XLogCtl->LogwrtRqst.Write = EndOfLog; > XLogCtl->LogwrtRqst.Flush = EndOfLog; > > Presumably we could relatively easily change things around so that we > finish one of those values ... probably one of the "write" values .. > back out of XLogCtl instead of passing it as a parameter. That would > work just as well from the checkpointer as from the startup process, > and there seems to be no way for the value to change until after > XLogAcceptWrites() has been called, so it seems fine. But that doesn't > help for the other arguments. What I'm thinking is that we should just > arrange to store EndOfLogTLI and xlogaction into XLogCtl also, and > then XLogAcceptWrites() can fish those values out of there as well, > which should be enough to make it work and do the same thing > regardless of which process is calling it. But I have run out of time > for today so have not explored coding that up. > I have spent some time thinking about making XLogAcceptWrites() independent, and for that, we need to get rid of its arguments which are available only in the startup process. The first argument xlogaction deduced by the DetermineRecoveryXlogAction(). If we are able to make this function logic independent and can deduce that xlogaction in any process, we can skip xlogaction argument passing. DetermineRecoveryXlogAction() function depends on a few global variables, valid only in the startup process are InRecovery, ArchiveRecoveryRequested & LocalPromoteIsTriggered. Out of three LocalPromoteIsTriggered's value is already available in shared memory and that can be fetched by calling LocalPromoteIsTriggered(). InRecovery's value can be guessed by as long as DBState in the control file doesn't get changed unexpectedly until XLogAcceptWrites() executes. If the DBState was not a clean shutdown, then surely the server has gone through recovery. If we could rely on DBState in the control file then we are good to go. For the last one, ArchiveRecoveryRequested, I don't see any existing and appropriate shared memory or control file information, so that can be identified if the archive recovery was requested or not. Initially, I thought to use SharedRecoveryState which is always set to RECOVERY_STATE_ARCHIVE, if the archive recovery requested. But there is another case where SharedRecoveryState could be RECOVERY_STATE_ARCHIVE irrespective of ArchiveRecoveryRequested value, that is the presence of a backup label file. If we want to use SharedRecoveryState, we need one more state which could differentiate between ArchiveRecoveryRequested and the backup label file presence case. To move ahead, I have copied ArchiveRecoveryRequested into shared memory and it will be cleared once archive cleanup is finished. With all these changes, we could get rid of xlogaction argument and DetermineRecoveryXlogAction() function. Could move its logic to PerformRecoveryXLogAction() directly. Now, the remaining two arguments of XLogAcceptWrites() are required for the CleanupAfterArchiveRecovery() function. Along with these two arguments, this function requires ArchiveRecoveryRequested and ThisTimeLineID which are again global variables. With the previous changes, we have got ArchiveRecoveryRequested into shared memory. And for ThisTimeLineID, I don't think we need to do anything since this value is available with all the backend as per the following comment: " /* * ThisTimeLineID will be same in all backends --- it identifies current * WAL timeline for the database system. */ TimeLineID ThisTimeLineID = 0; " In addition to the four places that Robert has pointed for EndOfLog, XLogCtl->lastSegSwitchLSN also holds EndOfLog value and that doesn't seem to change until WAL write is enabled. For EndOfLogTLI, I think we can safely use XLogCtl->replayEndTLI. Currently, The EndOfLogTLI is the timeline ID of the last record that xlogreader reads, but this xlogreader was simply re-fetching the last record which we have replied in redo loop if it was in recovery, if not in recovery, we don't need to worry since this value is needed only in case of ArchiveRecoveryRequested = true, which implicitly forces redo and sets replayEndTLI value. With all the above changes XLogAcceptWrites() can be called from other processes but I haven't tested that. This finding is still not complete and not too clean, perhaps, posting the patches with aforesaid changes just to confirm the direction and forward the discussion, thanks. Regards, Amul
Attachment
pgsql-hackers by date: