Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CA+TgmoZf2by_6kbY0JntGEmrSkj3DxUTNZDXORNasGdsSmkJjA@mail.gmail.com Whole thread Raw |
In response to | Re: [Patch] ALTER SYSTEM READ ONLY (Amul Sul <sulamul@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
|
List | pgsql-hackers |
On Thu, Jan 14, 2021 at 6:29 AM Amul Sul <sulamul@gmail.com> wrote: > To move development, testing, and the review forward, I have commented out the > code acquiring CheckpointLock from CreateCheckPoint() in the 0003 patch and > added the changes for the checkpointer so that system read-write state change > request can be processed as soon as possible, as suggested by Robert[1]. I am extremely doubtful about SetWALProhibitState()'s claim that "The final state can only be requested by the checkpointer or by the single-user so that there will be no chance that the server is already in the desired final state." It seems like there is an obvious race condition: CompleteWALProhibitChange() is called with a cur_state_gen argument which embeds the last state we saw, but there's nothing to keep it from changing between the time we saw it and the time that function calls SetWALProhibitState(), is there? We aren't holding any lock. It seems to me that SetWALProhibitState() needs to be rewritten to avoid this assumption. On a related note, SetWALProhibitState() has only two callers. One passes is_final_state as true, and the other as false: it's never a variable. The two cases are handled mostly differently. This doesn't seem good. A lot of the logic in this function should probably be moved to the calling sites, especially because it's almost certainly wrong for this function to be basing what it does on the *current* WAL prohibit state rather than the WAL prohibit state that was in effect at the time we made the decision to call this function in the first place. As I mentioned in the previous paragraph, that's a built-in race condition. To put that another way, this function should NOT feel free to call GetWALProhibitStateGen(). I don't really see why we should have both an SQL callable function pg_alter_wal_prohibit_state() and also a DDL command for this. If we're going to go with a functional interface, and I guess the idea of that is to make it so GRANT EXECUTE works, then why not just get rid of the DDL? RequestWALProhibitChange() doesn't look very nice. It seems like it's basically the second half of pg_alter_wal_prohibit_state(), not being called from anywhere else. It doesn't seem to add anything to separate it out like this; the interface between the two is not especially clean. It seems odd that ProcessWALProhibitStateChangeRequest() returns without doing anything if !AmCheckpointerProcess(), rather than having that be an Assert(). Why is it like that? I think WALProhibitStateShmemInit() would probably look more similar to other functions if it did if (found) { stuff; } rather than if (!found) return; stuff; -- but I might be wrong about the existing precedent. The SetLastCheckPointSkipped() and LastCheckPointIsSkipped() stuff seems confusingly-named, because we have other reasons for skipping a checkpoint that are not what we're talking about here. I think this is talking about whether we've performed a checkpoint after recovery, and the naming should reflect that. But I think there's something else wrong with the design, too: why is this protected by a spinlock? I have questions in both directions. On the one hand, I wonder why we need any kind of lock at all. On the other hand, if we do need a lock, I wonder why a spinlock that protects only the setting and clearing of the flag and nothing else is sufficient. There are zero comments explaining what the idea behind this locking regime is, and I can't understand why it should be correct. In fact, I think this area needs a broader rethink. Like, the way you integrated that stuff into StartupXLog(), it sure looks to me like we might skip the checkpoint but still try to write other WAL records. Before we reach the offending segment of code, we call UpdateFullPageWrites(). Afterwards, we call XLogReportParameters(). Both of those are going to potentially write WAL. I guess you could argue that's OK, on the grounds that neither function is necessarily going to log anything, but I don't think I believe that. If I make my server read only, take the OS down, change some GUCs, and then start it again, I don't expect it to PANIC. Also, I doubt that it's OK to skip the checkpoint as this code does and then go ahead and execute recovery_end_command and update the control file anyway. It sure looks like the existing code is written with the assumption that the checkpoint happens before those other things. One idea I just had was: suppose that, if the system is READ ONLY, we don't actually exit recovery right away, and the startup process doesn't exit. Instead we just sit there and wait for the system to be made read-write again before doing anything else. But then if hot_standby=false, there's no way for someone to execute a ALTER SYSTEM READ WRITE and/or pg_alter_wal_prohibit_state(), which seems bad. So perhaps we need to let in regular connections *as if* the system were read-write while postponing not just the end-of-recovery checkpoint but also the other associated things like UpdateFullPageWrites(), XLogReportParameters(), recovery_end_command, control file update, etc. until the end of recovery. Or maybe that's not the right idea either, but regardless of what we do here it needs clear comments justifying it. The current version of the patch does not have any. I think that you've mis-positioned the check in autovacuum.c. Note that the comment right afterwards says: "a worker finished, or postmaster signaled failure to start a worker". Those are things we should still check for even when the system is R/O. What we don't want to do in that case is start new workers. I would suggest revising the comment that starts with "There are some conditions that..." to mention three conditions. The new one would be that the system is in a read-only state. I'd mention that first, making the existing ones #2 and #3, and then add the code to "continue;" in that case right after that comment, before setting current_time. SendsSignalToCheckpointer() has multiple problems. As far as the name, it should at least be "Send" rather than "Sends" but the corresponding functions elsewhere have names like SendPostmasterSignal() not SendSignalToPostmaster(). Also, why is it OK for it to use elog() rather than ereport()? Also, why is it an error if the checkpointer's not running, rather than just having the next checkpointer do it when it's relaunched? Also, why pass SIGINT as an argument if there's only one caller? A related thing that's also odd is that sending SIGINT calls ReqCheckpointHandler() not anything specific to prohibiting WAL. That is probably OK because that function now just sets the latch. But then we could stop sending SIGINT to the checkpointer at all and just send SIGUSR1, which would also set the latch, without using up a signal. I wonder if we should make that change as a separate preparatory patch. It seems like that would clear things up; it would remove the oddity that this patch is invoking a handler called ReqCheckpointerHandler() with no intention of requesting a checkpoint, because ReqCheckpointerHandler() would be gone. That problem could also be fixed by renaming ReqCheckpointerHandler() to something clearer, but that seems inferior. This is probably not a complete list of problems. Review from others would be appreciated. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: