Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CA+TgmobppXiL598ytRivAX6-MoRxDxNV4np1h+ybbJdxf2uxgw@mail.gmail.com Whole thread Raw |
In response to | Re: [Patch] ALTER SYSTEM READ ONLY (Amul Sul <sulamul@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
|
List | pgsql-hackers |
On Tue, Oct 12, 2021 at 8:18 AM Amul Sul <sulamul@gmail.com> wrote: > In the attached version I have fixed this issue by restoring missingContrecPtr. > > To handle abortedRecPtr and missingContrecPtr newly added global > variables thought the commit # ff9f111bce24, we don't need to store > them in the shared memory separately, instead, we need a flag that > indicates a broken record found previously, at the end of recovery, so > that we can overwrite contrecord. > > The missingContrecPtr is assigned to the EndOfLog, and we have handled > EndOfLog previously in the 0004 patch, and the abortedRecPtr is the > same as the lastReplayedEndRecPtr, AFAICS. I have added an assert to > ensure that the lastReplayedEndRecPtr value is the same as the > abortedRecPtr, but I think that is not needed, we can go ahead and > write an overwrite-contrecord starting at lastReplayedEndRecPtr. I thought that it made sense to commit 0001 and 0002 at this point, so I have done that. I think that the treatment of missingContrecPtr and abortedRecPtr may need more thought yet, so at least for that reason I don't think it's a good idea to proceed with 0004 yet. 0003 is just code movement so I guess that can be committed whenever we're confident that we know exactly which things we want to end up inside XLogAcceptWrites(). I do have a few ideas after studying this a bit more: - I wonder whether, in addition to moving a few things later as 0002 did, we also ought to think about moving one thing earlier, specifically XLogReportParameters(). Right now, we have, I believe, four things that write WAL at the end of recovery: CreateOverwriteContrecordRecord(), UpdateFullPageWrites(), PerformRecoveryXLogAction(), and XLogReportParameters(). As the code is structured now, we do the first three of those things, and then do a bunch of other stuff inside CleanupAfterArchiveRecovery() like running recovery_end_command, and removing non-parent xlog files, and archiving the partial segment, and then come back and do the fourth one. Is there any good reason for that? If not, I think doing them all together would be cleaner, and would propose to reverse the order of CleanupAfterArchiveRecovery() and XLogReportParameters(). - If we did that, then I would further propose to adjust things so that we remove the call to LocalSetXLogInsertAllowed() and the assignment LocalXLogInsertAllowed = -1 from inside CreateEndOfRecoveryRecord(), the LocalXLogInsertAllowed = -1 from just after UpdateFullPageWrites(), and the call to LocalSetXLogInsertAllowed() just before XLogReportParameters(). Instead, just let the call to LocalSetXLogInsertAllowed() right before CreateOverwriteContrecordRecord() remain in effect. There doesn't seem to be much point in flipping that switch off and on again, and the fact that we have been doing so is in my view just evidence that StartupXLOG() doesn't do a very good job of getting related code all into one place. - It seems really tempting to invent a fourth RecoveryState value that indicates that we are done with REDO but not yet in production, and maybe also to rename RecoveryState to something like WALState. I'm thinking of something like WAL_STATE_CRASH_RECOVERY, WAL_STATE_ARCHIVE_RECOVERY, WAL_STATE_REDO_COMPLETE, and WAL_STATE_PRODUCTION. Then, instead of having LocalSetXLogInsertAllowed(), we could teach XLogInsertAllowed() that the startup process and the checkpointer are allowed to insert WAL when the state is WAL_STATE_REDO_COMPLETE, but other processes only once we reach WAL_STATE_PRODUCTION. We would set WAL_STATE_REDO_COMPLETE where we now call LocalSetXLogInsertAllowed(). It's necessary to include the checkpointer, or at least I think it is, because PerformRecoveryXLogAction() might call RequestCheckpoint(), and that's got to work. If we did this, then I think it would also solve another problem which the overall patch set has to address somehow. Say that we eventually move responsibility for the to-be-created XLogAcceptWrites() function from the startup process to the checkpointer, as proposed. The checkpointer needs to know when to call it ... and the answer with this change is simple: when we reach WAL_STATE_REDO_COMPLETE, it's time! But this idea is not completely problem-free. I spent some time poking at it and I think it's a little hard to come up with a satisfying way to code XLogInsertAllowed(). Right now that function calls RecoveryInProgress(), and if RecoveryInProgress() decides that recovery is no longer in progress, it calls InitXLOGAccess(). However, that presumes that the only reason you'd call RecoveryInProgress() is to figure out whether you should write WAL, which I don't think is really true, and it also means that, when the wal state is WAL_STATE_REDO_COMPLETE, RecoveryInProgress() would need to return true in the checkpointer and startup process and false everywhere else, which does not sound like a great idea. It seems fine to say that xlog insertion is allowed in some processes but not others, because not all processes are necessarily equally privileged, but whether or not we're in recovery is supposed to be something about which everyone agrees, so answering that question differently in different processes doesn't seem nice. XLogInsertAllowed() could be rewritten to check the state directly and make its own determination, without relying on RecoveryInProgress(), and I think that might be the right way to go here. But that isn't entirely problem-free either, because there's a lot of code that uses RecoveryInProgress() to answer the question "should I write WAL?" and therefore it's not great if RecoveryInProgress() is returning an answer that is inconsistent with XLogInsertAllowed(). MarkBufferDirtyHint() and heap_page_prune_opt() are examples of this kind of coding. It probably wouldn't break in practice right away, because most of that code never runs in the startup process or the checkpointer and would therefore never notice the difference in behavior between those two functions, but if in the future we get the read-only feature that this thread is supposed to be about, we'd have problems. Not all RecoveryInProgress() calls have this sense - e.g. sendDir() in basebackup.c is trying to figure out whether recovery ended during the backup, not whether we can write WAL. But perhaps this is a good time to go and replace RecoveryInProgress() checks that are intending to decide whether or not it's OK to write WAL with XLogInsertAllowed() checks (noting that the return value is reversed). If we did that, then I think RecoveryInProgress() could also NOT call InitXLOGAccess(), and that could be done only by XLogInsertAllowed(), which seems like it might be better. But I haven't really tried to code all of this up, so I'm not really sure how it all works out. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: