Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Amul Sul |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CAAJ_b95MUL4cmTDqKbgkXVFx5ewTkW60JUTLj8b6W1UyXVqDRg@mail.gmail.com Whole thread Raw |
In response to | Re: [Patch] ALTER SYSTEM READ ONLY (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
|
List | pgsql-hackers |
On Sat, Aug 29, 2020 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 19, 2020 at 6:28 AM Amul Sul <sulamul@gmail.com> wrote: > > Attached is a rebased on top of the latest master head (# 3e98c0bafb2). > > Does anyone, especially anyone named Andres Freund, have comments on > 0001? That work is somewhat independent of the rest of this patch set > from a theoretical point of view, and it seems like if nobody sees a > problem with the line of attack there, it would make sense to go ahead > and commit that part. Considering that this global barrier stuff is > new and that I'm not sure how well we really understand the problems > yet, there's a possibility that we might end up revising these details > again. I understand that most people, including me, are somewhat > reluctant to see experimental code get committed, in this case that > ship has basically sailed already, since neither of the patches that > we thought would use the barrier mechanism end up making it into v13. > I don't think it's really making things any worse to try to improve > the mechanism. > > 0002 isn't separately committable, but I don't see anything wrong with it. > > Regarding 0003: > > I don't understand why ProcessBarrierWALProhibit() can safely assert > that the WALPROHIBIT_STATE_READ_ONLY is set. > IF blocks entered to kill a transaction have valid XID & this happens only in case of system state changing to READ_ONLY. > + errhint("Cannot continue a > transaction if it has performed writes while system is read only."))); > > This sentence is bad because it makes it sound like the current > transaction successfully performed a write after the system had > already become read-only. I think something like errdetail("Sessions > with open write transactions must be terminated.") would be better. > Ok, changed as suggested in the attached version. > I think SetWALProhibitState() could be in walprohibit.c rather than > xlog.c. Also, this function appears to have obvious race conditions. > It fetches the current state, then thinks things over while holding no > lock, and then unconditionally updates the current state. What happens > if somebody else has changed the state in the meantime? I had sort of > imagined that we'd use something like pg_atomic_uint32 for this and > manipulate it using compare-and-swap operations. Using some kind of > lock is probably fine, too, but you have to hold it long enough that > the variable can't change under you while you're still deciding > whether it's OK to modify it, or else recheck after reacquiring the > lock that the value doesn't differ from what you expect. > > I think the choice to use info_lck to synchronize > SharedWALProhibitState is very strange -- what is the justification > for that? I thought the idea might be that we frequently need to check > SharedWALProhibitState at times when we'd be holding info_lck anyway, > but it looks to me like you always do separate acquisitions of > info_lck just for this, in which case I don't see why we should use it > here instead of a separate lock. For that matter, why does this need > to be part of XLogCtlData rather than a separate shared memory area > that is private to walprohibit.c? > In the attached patch I added a separate shared memory structure for WAL prohibit state. SharedWALProhibitState is now pg_atomic_uint32 and part of that structure instead of XLogCtlData. The shared state will be changed using a compare-and-swap operation. I hope that should be enough to avoid said race conditions. > - else > + /* > + * Can't perform checkpoint or xlog rotation without writing WAL. > + */ > + else if (XLogInsertAllowed()) > > Not project style. > Corrected. > + case WAIT_EVENT_SYSTEM_WALPROHIBIT_STATE_CHANGE: > > Can we drop the word SYSTEM here to make this shorter, or would that > break some convention? > No issue, removed SYSTEM. > +/* > + * NB: The return string should be the same as the _ShowOption() for boolean > + * type. > + */ > + static const char * > + show_system_is_read_only(void) > +{ > Fixed. > I'm not sure the comment is appropriate here, but I'm very sure the > extra spaces before "static" and "show" are not per style. > > + /* We'll be done once in-progress flag bit is cleared */ > > Another whitespace mistake. > Fixed. > + elog(DEBUG1, "WALProhibitRequest: Waiting for checkpointer"); > + elog(DEBUG1, "Done WALProhibitRequest"); > > I think these should be removed. > Removed. > Can WALProhibitRequest() and performWALProhibitStateChange() be moved > to walprohibit.c, just to bring more of the code for this feature > together in one place? Maybe we could also rename them to > RequestWALProhibitChange() and CompleteWALProhibitChange()? > Yes, I have moved these functions to walprohibit.c and renamed as suggested. For this, I needed to add few helper functions to send a signal to checkpointer and update Control File, as send_signal_to_checkpointer & SetControlFileWALProhibitFlag() respectively, since checkpointer_pid or ControlFile are not directly accessible from walprohibit.c > - * think it should leave the child state in place. > + * think it should leave the child state in place. Note that the upper > + * transaction will be a force to ready-only irrespective of > its previous > + * status if the server state is WAL prohibited. > */ > - XactReadOnly = s->prevXactReadOnly; > + XactReadOnly = s->prevXactReadOnly || !XLogInsertAllowed(); > > Both instances of this pattern seem sketchy to me. You don't expect > that reverting the state to a previous state will instead change to a > different state that doesn't match up with what you had before. What > is the bad thing that would happen if we did not make this change? > We can drop these changes now since we are simply terminating sessions for those who have performed or expected to perform write operations. > - * Else, must check to see if we're still in recovery. > + * Else, must check to see if we're still in recovery > > Spurious change. > Fixed. > + /* Request checkpoint */ > + RequestCheckpoint(CHECKPOINT_IMMEDIATE); > + ereport(LOG, (errmsg("system is now read write"))); > > This does not seem right. Perhaps the intention here was that the > system should perform a checkpoint when it switches to read-write > state after having skipped the startup checkpoint. But why would we do > this unconditionally in all cases where we just went to a read-write > state? > You are correct since this could be expensive if the system changes to read-only for a shorter period. For the initial version, I did this unconditionally to avoid additional shared-memory variables in XLogCtlData but now WAL prohibits state got its own shared-memory structure so that I have added the required variable to it. Now, doing this checkpoint conditionally with CHECKPOINT_END_OF_RECOVERY & CHECKPOINT_IMMEDIATE flag what we do in the startup process. Note that to mark end-of-recovery checkpoint has been skipped from the startup process I have added helper function as MarkCheckPointSkippedInWalProhibitState(), I am not sure the name that I have chosen is the best fit. > There's probably quite a bit more to say about 0003 but I think I'm > running too low on mental energy to say more now. > Thanks for your time and suggestions. Regards, Amul
Attachment
pgsql-hackers by date: