Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Amul Sul |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CAAJ_b96G-oBxDC3C7Y72ER09bsheGHOxBK1HXHVOyHNXjTDmcA@mail.gmail.com Whole thread Raw |
In response to | Re: [Patch] ALTER SYSTEM READ ONLY (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
|
List | pgsql-hackers |
On Wed, Sep 15, 2021 at 9:38 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Sep 15, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Putting these changes into 0001 seems to make no sense. It seems like > > they should be part of 0003, or maybe a new 0004 patch. > > After looking at this a little bit more, I think it's really necessary > to separate out all of your changes into separate patches at least for > initial review. It's particularly important to separate code movement > changes from other kinds of changes. 0001 was just moving code before, > and so was 0002, but now both are making other changes, which is not > easy to see from looking at the 'git diff' output. For that reason > it's not so easy to understand exactly what you've changed here and > analyze it. > Ok, understood, I have separated my changes into 0001 and 0002 patch, and the refactoring patches start from 0003. In the 0001 patch, I have copied ArchiveRecoveryRequested to shared memory as said previously. Coping ArchiveRecoveryRequested value to shared memory is not really interesting, and I think somehow we should reuse existing variable, (perhaps, with some modification of the information it can store, e.g. adding one more enum value for SharedRecoveryState or something else), thinking on the same. In addition to that, I tried to turn down the scope of ArchiveRecoveryRequested global variable. Now, this is a static variable, and the scope is limited to xlog.c file like LocalXLogInsertAllowed and can be accessed through the newly added function ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let me know what you think about the approach. In 0002 patch is a mixed one where I tried to remove the dependencies on global variables and local variables belonging to StartupXLOG(). I am still worried about the InRecovery value that needs to be deduced afterward inside XLogAcceptWrites(). Currently, relying on ControlFile->state != DB_SHUTDOWNED check but I think that will not be good for ASRO where we plan to skip XLogAcceptWrites() work only and let the StartupXLOG() do the rest of the work as it is where it will going to update ControlFile's DBState to DB_IN_PRODUCTION, then we might need some ugly kludge to call PerformRecoveryXLogAction() in checkpointer irrespective of DBState, which makes me a bit uncomfortable. > I poked around a little bit at these patches, looking for > perhaps-interesting global variables upon which the code called from > XLogAcceptWrites() would depend with your patches applied. The most > interesting ones seem to be (1) ThisTimeLineID, which you mentioned > and which may be fine but I'm not totally convinced yet, (2) > LocalXLogInsertAllowed, which is probably not broken but I'm thinking > we may want to redesign that mechanism somehow to make it cleaner, and Thanks for the off-list detailed explanation on this. For somebody else who might be reading this, the concern here is (not really a concern, it is a good thing to improve) the LocalSetXLogInsertAllowed() function call, is a kind of hack that enables wal writes irrespective of RecoveryInProgress() for a shorter period. E.g. see following code in StartupXLOG: " LocalSetXLogInsertAllowed(); UpdateFullPageWrites(); LocalXLogInsertAllowed = -1; .... .... /* * If any of the critical GUCs have changed, log them before we allow * backends to write WAL. */ LocalSetXLogInsertAllowed(); XLogReportParameters(); " Instead of explicitly enabling wal insert, somehow that implicitly allowed for the startup process and/or the checkpointer doing the first checkpoint and/or wal writes after the recovery. Well, the current LocalSetXLogInsertAllowed() mechanism is not really harming anything or bad and does not necessarily need to change but it would be nice if we were able to come up with something much cleaner, bug-free, and 100% perfect enough design. (Hope I am not missing anything from the discussion). > (3) CheckpointStats, which is called from RemoveXlogFile which is > called from RemoveNonParentXlogFiles which is called from > CleanupAfterArchiveRecovery which is called from XLogAcceptWrites. > This last one is actually pretty weird already in the existing code. > It sort of looks like RemoveXlogFile() only expects to be called from > the checkpointer (or a standalone backend) so that it can update > CheckpointStats and have that just work, but actually it's also called > from the startup process when a timeline switch happens. I don't know > whether the fact that the increments to ckpt_segs_recycled get lost in > that case should be considered an intentional behavior that should be > preserved or an inadvertent mistake. > Maybe I could be wrong, but I think that is intentional. It removes pre-allocated or bogus files of the old timeline which are not supposed to be considered in stats. The comments for CheckpointStatsData might not be clear but comment at the calling RemoveNonParentXlogFiles() place inside StartupXLOG hints the same: " /* * Before we continue on the new timeline, clean up any * (possibly bogus) future WAL segments on the old * timeline. */ RemoveNonParentXlogFiles(EndRecPtr, ThisTimeLineID); .... .... * We switched to a new timeline. Clean up segments on the old * timeline. * * If there are any higher-numbered segments on the old timeline, * remove them. They might contain valid WAL, but they might also be * pre-allocated files containing garbage. In any case, they are not * part of the new timeline's history so we don't need them. */ RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); " > So I think you've covered most of the necessary things here, with > probably some more discussion needed on whether you've done the right > things... > Thanks, Robert, for your time. Regards, Amul Sul
Attachment
- v35-0005-Create-XLogAcceptWrites-function-with-code-from-.patch
- v35-0002-miscellaneous-remove-dependency-on-global-and-lo.patch
- v35-0004-Postpone-some-end-of-recovery-operations-relatin.patch
- v35-0001-Store-ArchiveRecoveryRequested-in-shared-memory-.patch
- v35-0003-Refactor-some-end-of-recovery-code-out-of-Startu.patch
pgsql-hackers by date: