Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Amul Sul
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CAAJ_b95MUL4cmTDqKbgkXVFx5ewTkW60JUTLj8b6W1UyXVqDRg@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY
List pgsql-hackers
On Sat, Aug 29, 2020 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 6:28 AM Amul Sul <sulamul@gmail.com> wrote:
> > Attached is a rebased on top of the latest master head (# 3e98c0bafb2).
>
> Does anyone, especially anyone named Andres Freund, have comments on
> 0001? That work is somewhat independent of the rest of this patch set
> from a theoretical point of view, and it seems like if nobody sees a
> problem with the line of attack there, it would make sense to go ahead
> and commit that part. Considering that this global barrier stuff is
> new and that I'm not sure how well we really understand the problems
> yet, there's a possibility that we might end up revising these details
> again. I understand that most people, including me, are somewhat
> reluctant to see experimental code get committed, in this case that
> ship has basically sailed already, since neither of the patches that
> we thought would use the barrier mechanism end up making it into v13.
> I don't think it's really making things any worse to try to improve
> the mechanism.
>
> 0002 isn't separately committable, but I don't see anything wrong with it.
>
> Regarding 0003:
>
> I don't understand why ProcessBarrierWALProhibit() can safely assert
> that the WALPROHIBIT_STATE_READ_ONLY is set.
>

IF blocks entered to kill a transaction have valid XID & this happens only in
case of system state changing to READ_ONLY.

> +                                errhint("Cannot continue a
> transaction if it has performed writes while system is read only.")));
>
> This sentence is bad because it makes it sound like the current
> transaction successfully performed a write after the system had
> already become read-only. I think something like errdetail("Sessions
> with open write transactions must be terminated.") would be better.
>

Ok, changed as suggested in the attached version.

> I think SetWALProhibitState() could be in walprohibit.c rather than
> xlog.c. Also, this function appears to have obvious race conditions.
> It fetches the current state, then thinks things over while holding no
> lock, and then unconditionally updates the current state. What happens
> if somebody else has changed the state in the meantime? I had sort of
> imagined that we'd use something like pg_atomic_uint32 for this and
> manipulate it using compare-and-swap operations. Using some kind of
> lock is probably fine, too, but you have to hold it long enough that
> the variable can't change under you while you're still deciding
> whether it's OK to modify it, or else recheck after reacquiring the
> lock that the value doesn't differ from what you expect.
>
> I think the choice to use info_lck to synchronize
> SharedWALProhibitState is very strange -- what is the justification
> for that? I thought the idea might be that we frequently need to check
> SharedWALProhibitState at times when we'd be holding info_lck anyway,
> but it looks to me like you always do separate acquisitions of
> info_lck just for this, in which case I don't see why we should use it
> here instead of a separate lock. For that matter, why does this need
> to be part of XLogCtlData rather than a separate shared memory area
> that is private to walprohibit.c?
>

In the attached patch I added a separate shared memory structure for WAL
prohibit state. SharedWALProhibitState is now pg_atomic_uint32 and part of that
structure instead of XLogCtlData. The shared state will be changed using a
compare-and-swap operation.

I hope that should be enough to avoid said race conditions.

> -       else
> +       /*
> +        * Can't perform checkpoint or xlog rotation without writing WAL.
> +        */
> +       else if (XLogInsertAllowed())
>
> Not project style.
>

Corrected.

> +               case WAIT_EVENT_SYSTEM_WALPROHIBIT_STATE_CHANGE:
>
> Can we drop the word SYSTEM here to make this shorter, or would that
> break some convention?
>

No issue, removed SYSTEM.

> +/*
> + * NB: The return string should be the same as the _ShowOption() for boolean
> + * type.
> + */
> + static const char *
> + show_system_is_read_only(void)
> +{
>

Fixed.

> I'm not sure the comment is appropriate here, but I'm very sure the
> extra spaces before "static" and "show" are not per style.
>
> +               /*  We'll be done once in-progress flag bit is cleared */
>
> Another whitespace mistake.
>

Fixed.

> +               elog(DEBUG1, "WALProhibitRequest: Waiting for checkpointer");
> +       elog(DEBUG1, "Done WALProhibitRequest");
>
> I think these should be removed.
>

Removed.

> Can WALProhibitRequest() and performWALProhibitStateChange() be moved
> to walprohibit.c, just to bring more of the code for this feature
> together in one place? Maybe we could also rename them to
> RequestWALProhibitChange() and CompleteWALProhibitChange()?
>

Yes, I have moved these functions to walprohibit.c and renamed as suggested.
For this, I needed to add few helper functions to send a signal to checkpointer
and update Control File, as send_signal_to_checkpointer &
SetControlFileWALProhibitFlag() respectively, since checkpointer_pid
or ControlFile are not directly accessible from walprohibit.c

> -        * think it should leave the child state in place.
> +        * think it should leave the child state in place.  Note that the upper
> +        * transaction will be a force to ready-only irrespective of
> its previous
> +        * status if the server state is WAL prohibited.
>          */
> -       XactReadOnly = s->prevXactReadOnly;
> +       XactReadOnly = s->prevXactReadOnly || !XLogInsertAllowed();
>
> Both instances of this pattern seem sketchy to me. You don't expect
> that reverting the state to a previous state will instead change to a
> different state that doesn't match up with what you had before. What
> is the bad thing that would happen if we did not make this change?
>

We can drop these changes now since we are simply terminating sessions for those
who have performed or expected to perform write operations.

> -        * Else, must check to see if we're still in recovery.
> +        * Else, must check to see if we're still in recovery
>
> Spurious change.
>

Fixed.

> +                       /* Request checkpoint */
> +                       RequestCheckpoint(CHECKPOINT_IMMEDIATE);
> +                       ereport(LOG, (errmsg("system is now read write")));
>
> This does not seem right. Perhaps the intention here was that the
> system should perform a checkpoint when it switches to read-write
> state after having skipped the startup checkpoint. But why would we do
> this unconditionally in all cases where we just went to a read-write
> state?
>

You are correct since this could be expensive if the system changes to read-only
for a shorter period. For the initial version, I did this unconditionally to
avoid additional shared-memory variables in XLogCtlData but now WAL prohibits
state got its own shared-memory structure so that I have added the required
variable to it.  Now, doing this checkpoint conditionally with
 CHECKPOINT_END_OF_RECOVERY & CHECKPOINT_IMMEDIATE flag what we do in the
startup process. Note that to mark end-of-recovery checkpoint has been skipped
from the startup process I have added helper function as
MarkCheckPointSkippedInWalProhibitState(), I am not sure the name that I have
chosen is the best fit.

> There's probably quite a bit more to say about 0003 but I think I'm
> running too low on mental energy to say more now.
>

Thanks for your time and suggestions.

Regards,
Amul

Attachment

pgsql-hackers by date:

Previous
From: Georgios Kokolatos
Date:
Subject: Re: New default role- 'pg_read_all_data'
Next
From: Dmitry Dolgov
Date:
Subject: Group by reordering optimization