Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Amul Sul
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CAAJ_b94bXooXcp7GBTq_nbpc9wCzNPMgwm6umjBz1E1B-urOeg@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
List pgsql-hackers
 On Tue, Sep 15, 2020 at 2:35 PM Amul Sul <sulamul@gmail.com> wrote:
>
> Hi Andres,
>
> The attached patch has fixed the issue that you have raised & I have confirmed
> in my previous email.  Also, I tried to improve some of the things that you have
> pointed but for those changes, I am a little unsure and looking forward to the
> inputs/suggestions/confirmation on that, therefore 0003 patch is marked WIP.
>
> Please have a look at my inline reply below for the things that are changes in
> the attached version and need inputs:
>
> On Sat, Sep 12, 2020 at 10:52 AM Amul Sul <sulamul@gmail.com> wrote:
> >
> > On Thu, Sep 10, 2020 at 2:33 AM Andres Freund <andres@anarazel.de> wrote:
> > >
[... Skipped ....]
> > >
> > >
> > > > +/*
> > > > + * RequestWALProhibitChange()
> > > > + *
> > > > + * Request checkpointer to make the WALProhibitState to read-only.
> > > > + */
> > > > +static void
> > > > +RequestWALProhibitChange(void)
> > > > +{
> > > > +     /* Must not be called from checkpointer */
> > > > +     Assert(!AmCheckpointerProcess());
> > > > +     Assert(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > > > +
> > > > +     /*
> > > > +      * If in a standalone backend, just do it ourselves.
> > > > +      */
> > > > +     if (!IsPostmasterEnvironment)
> > > > +     {
> > > > +             CompleteWALProhibitChange(GetWALProhibitState());
> > > > +             return;
> > > > +     }
> > > > +
> > > > +     send_signal_to_checkpointer(SIGINT);
> > > > +
> > > > +     /* Wait for the state to change to read-only */
> > > > +     ConditionVariablePrepareToSleep(&WALProhibitState->walprohibit_cv);
> > > > +     for (;;)
> > > > +     {
> > > > +             /* We'll be done once in-progress flag bit is cleared */
> > > > +             if (!(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > > +                     break;
> > > > +
> > > > +             ConditionVariableSleep(&WALProhibitState->walprohibit_cv,
> > > > +                                                        WAIT_EVENT_WALPROHIBIT_STATE_CHANGE);
> > > > +     }
> > > > +     ConditionVariableCancelSleep();
> > >
> > > What if somebody concurrently changes the state back to READ WRITE?
> > > Won't we unnecessarily wait here?
> > >
> >
> > Yes, there will be wait.
> >
> > > That's probably fine, because we would just wait until that transition
> > > is complete too. But at least a comment about that would be
> > > good. Alternatively a "ASRO transitions completed counter" or such might
> > > be a better idea?
> > >
> >
> > Ok, will add comments but could you please elaborate little a bit about "ASRO
> > transitions completed counter"  and is there any existing counter I can refer
> > to?
> >

In an off-list discussion, Robert had explained to me this counter thing and
its requirement.

I tried to add the same as "shared WAL prohibited state generation" in the
attached version. The implementation is quite similar to the generation counter
in the super barrier. In the attached version, when a backend makes a request
for the WAL prohibit state changes then a generation number will be given to
that backend to wait on and that wait will be ended when the shared generation
counter changes.

> > >
[... Skipped ....]
> > > > +/*
> > > > + * SetWALProhibitState()
> > > > + *
> > > > + * Change current WAL prohibit state to the input state.
> > > > + *
> > > > + * If the server is already completely moved to the requested WAL prohibit
> > > > + * state, or if the desired state is same as the current state, return false,
> > > > + * indicating that the server state did not change. Else return true.
> > > > + */
> > > > +bool
> > > > +SetWALProhibitState(uint32 new_state)
> > > > +{
> > > > +     bool            state_updated = false;
> > > > +     uint32          cur_state;
> > > > +
> > > > +     cur_state = GetWALProhibitState();
> > > > +
> > > > +     /* Server is already in requested state */
> > > > +     if (new_state == cur_state ||
> > > > +             new_state == (cur_state | WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > > +             return false;
> > > > +
> > > > +     /* Prevent concurrent contrary in progress transition state setting */
> > > > +     if ((new_state & WALPROHIBIT_TRANSITION_IN_PROGRESS) &&
> > > > +             (cur_state & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > > +     {
> > > > +             if (cur_state & WALPROHIBIT_STATE_READ_ONLY)
> > > > +                     ereport(ERROR,
> > > > +                                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > +                                      errmsg("system state transition to read only is already in progress"),
> > > > +                                      errhint("Try after sometime again.")));
> > > > +             else
> > > > +                     ereport(ERROR,
> > > > +                                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > +                                      errmsg("system state transition to read write is already in progress"),
> > > > +                                      errhint("Try after sometime again.")));
> > > > +     }
> > > > +
> > > > +     /* Update new state in share memory */
> > > > +     state_updated =
> > > > +             pg_atomic_compare_exchange_u32(&WALProhibitState->SharedWALProhibitState,
> > > > +                                                                        &cur_state, new_state);
> > > > +
> > > > +     if (!state_updated)
> > > > +             ereport(ERROR,
> > > > +                             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > +                              errmsg("system read write state concurrently changed"),
> > > > +                              errhint("Try after sometime again.")));
> > > > +
> > >
> > > I don't think it's safe to use pg_atomic_compare_exchange_u32() outside
> > > of a loop. I think there's platforms (basically all load-linked /
> > > store-conditional architectures) where than can fail spuriously.
> > >
> > > Also, there's no memory barrier around GetWALProhibitState, so there's
> > > no guarantee it's not an out-of-date value you're starting with.
> > >
> >
> > How about having some kind of lock instead what Robert have suggested
> > previously[3] ?
> >
>
> I would like to discuss this point more. In the attached version I have added
> WALProhibitLock to protect shared walprohibit state updates.  I was a little
> unsure do we want another spinlock what XLogCtlData has which is mostly used to
> read the shared variable and for the update, both are used e.g. LogwrtResult.
>
> Right now I haven't added and shared walprohibit state was fetch using a
> volatile pointer. Do we need a spinlock there, I am not sure why? Thoughts?
>

I reverted this WALProhibitLock implementation since with changes in the
attached version I don't think we need that locking.

Regards,
Amul

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Next
From: Peter Eisentraut
Date:
Subject: Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path