Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Amul Sul
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id CAAJ_b96-zeiZKQj5cqc1fx7TNLv726eT+KH6S5J9JesGHvcwdA@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY
List pgsql-hackers
Hi Andres,

The attached patch has fixed the issue that you have raised & I have confirmed
in my previous email.  Also, I tried to improve some of the things that you have
pointed but for those changes, I am a little unsure and looking forward to the
inputs/suggestions/confirmation on that, therefore 0003 patch is marked WIP.

Please have a look at my inline reply below for the things that are changes in
the attached version and need inputs:

On Sat, Sep 12, 2020 at 10:52 AM Amul Sul <sulamul@gmail.com> wrote:
>
> On Thu, Sep 10, 2020 at 2:33 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
>
> Thanks for your time.
>
> >
> > Thomas, there's one point below that could be relevant for you. You can
> > search for your name and/or checkpoint...
> >
> >
> > On 2020-09-01 16:43:10 +0530, Amul Sul wrote:
> > > diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
> > > index 42050ab7195..0ac826d3c2f 100644
> > > --- a/src/backend/nodes/readfuncs.c
> > > +++ b/src/backend/nodes/readfuncs.c
> > > @@ -2552,6 +2552,19 @@ _readAlternativeSubPlan(void)
> > >       READ_DONE();
> > >  }
> > >
> > > +/*
> > > + * _readAlterSystemWALProhibitState
> > > + */
> > > +static AlterSystemWALProhibitState *
> > > +_readAlterSystemWALProhibitState(void)
> > > +{
> > > +     READ_LOCALS(AlterSystemWALProhibitState);
> > > +
> > > +     READ_BOOL_FIELD(WALProhibited);
> > > +
> > > +     READ_DONE();
> > > +}
> > > +
> >
> > Why do we need readfuncs support for this?
> >
>
> I thought we need that from your previous comment[1].
>
> > > +
> > > +/*
> > > + * AlterSystemSetWALProhibitState
> > > + *
> > > + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> > > + */
> > > +static void
> > > +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> > > +{
> > > +     /* some code */
> > > +     elog(INFO, "AlterSystemSetWALProhibitState() called");
> > > +}
> >
> > As long as it's not implemented it seems better to return an ERROR.
> >
>
> Ok, will add an error in the next version.
>
> > > @@ -3195,6 +3195,16 @@ typedef struct AlterSystemStmt
> > >       VariableSetStmt *setstmt;       /* SET subcommand */
> > >  } AlterSystemStmt;
> > >
> > > +/* ----------------------
> > > + *           Alter System Read Statement
> > > + * ----------------------
> > > + */
> > > +typedef struct AlterSystemWALProhibitState
> > > +{
> > > +     NodeTag         type;
> > > +     bool            WALProhibited;
> > > +} AlterSystemWALProhibitState;
> > > +
> >
> > All the nearby fields use under_score_style names.
> >
>
> I am not sure which nearby fields having the underscore that you are referring
> to. Probably "WALProhibited" needs to be renamed to  "walprohibited" to be
> inline with the nearby fields.
>
> >
> > > From f59329e4a7285c5b132ca74473fe88e5ba537254 Mon Sep 17 00:00:00 2001
> > > From: Amul Sul <amul.sul@enterprisedb.com>
> > > Date: Fri, 19 Jun 2020 06:29:36 -0400
> > > Subject: [PATCH v6 3/5] Implement ALTER SYSTEM READ ONLY using global barrier.
> > >
> > > Implementation:
> > >
> > >  1. When a user tried to change server state to WAL-Prohibited using
> > >     ALTER SYSTEM READ ONLY command; AlterSystemSetWALProhibitState()
> > >     raises request to checkpointer by marking current state to inprogress in
> > >     shared memory.  Checkpointer, noticing that the current state is has
> >
> > "is has"
> >
> > >     WALPROHIBIT_TRANSITION_IN_PROGRESS flag set, does the barrier request, and
> > >     then acknowledges back to the backend who requested the state change once
> > >     the transition has been completed.  Final state will be updated in control
> > >     file to make it persistent across the system restarts.
> >
> > What makes checkpointer the right backend to do this work?
> >
>
> Once we've initiated the change to a read-only state, we probably want to
> always either finish that change or go back to read-write, even if the process
> that initiated the change is interrupted. Leaving the system in a
> half-way-in-between state long term seems bad. Maybe we would have put some
> background process, but choose the checkpointer in charge of making the state
> change and to avoid the new background process to keep the first version patch
> simple.  The checkpointer isn't likely to get killed, but if it does, it will
> be relaunched and the new one can clean things up.  Probably later we might want
> such a background worker that will be isn't likely to get killed.
>
> >
> > >  2. When a backend receives the WAL-Prohibited barrier, at that moment if
> > >     it is already in a transaction and the transaction already assigned XID,
> > >     then the backend will be killed by throwing FATAL(XXX: need more discussion
> > >     on this)
> >
> >
> > >  3. Otherwise, if that backend running transaction which yet to get XID
> > >     assigned we don't need to do anything special
> >
> > Somewhat garbled sentence...
> >
> >
> > >  4. A new transaction (from existing or new backend) starts as a read-only
> > >     transaction.
> >
> > Maybe "(in an existing or in a new backend)"?
> >
> >
> > >  5. Autovacuum launcher as well as checkpointer will don't do anything in
> > >     WAL-Prohibited server state until someone wakes us up.  E.g. a backend
> > >     might later on request us to put the system back to read-write.
> >
> > "will don't do anything", "might later on request us"
> >
>
> Ok, I'll fix all of this. I usually don't much focus on the commit message text
> but I try to make it as much as possible sane enough.
>
> >
> > >  6. At shutdown in WAL-Prohibited mode, we'll skip shutdown checkpoint
> > >     and xlog rotation. Starting up again will perform crash recovery(XXX:
> > >     need some discussion on this as well) but the end of recovery checkpoint
> > >     will be skipped and it will be performed when the system changed to
> > >     WAL-Permitted mode.
> >
> > Hm, this has some interesting interactions with some of Thomas' recent
> > hacking.
> >
>
> I would be so thankful for the help.
>
> >
> > >  8. Only super user can toggle WAL-Prohibit state.
> >
> > Hm. I don't quite agree with this. We try to avoid if (superuser())
> > style checks these days, because they can't be granted to other
> > users. Look at how e.g. pg_promote() - an operation of similar severity
> > - is handled. We just revoke the permission from public in
> > system_views.sql:
> > REVOKE EXECUTE ON FUNCTION pg_promote(boolean, integer) FROM public;
> >
>
> Ok, currently we don't have SQL callable function to change the system
> read-write state.  Do you want me to add that? If so, any naming suggesting? How
> about pg_make_system_read_only(bool)  or have two function as
> pg_make_system_read_only(void) & pg_make_system_read_write(void).
>

In the attached version I added SQL callable function as
pg_alter_wal_prohibit_state(bool), and another suggestion for the naming is
welcome.

For the permission denied error for ASRO READ-ONLY/READ-WRITE, I have added
ereport() in AlterSystemSetWALProhibitState() instead of aclcheck_error() and
the hint is added. Any suggestions?

> >
> > >  9. Add system_is_read_only GUC show the system state -- will true when system
> > >     is wal prohibited or in recovery.
> >
> > *shows the system state. There's also some oddity in the second part of
> > the sentence.
> >
> > Is it really correct to show system_is_read_only as true during
> > recovery? For one, recovery could end soon after, putting the system
> > into r/w mode, if it wasn't actually ALTER SYSTEM READ ONLY'd. But also,
> > during recovery the database state actually changes if there are changes
> > to replay.  ISTM it would not be a good idea to mix ASRO and
> > pg_is_in_recovery() into one GUC.
> >
>
> Well, whether the system is in recovery or wal prohibited state it is read-only
> for the user perspective, isn't it?
>
> >
> > > --- /dev/null
> > > +++ b/src/backend/access/transam/walprohibit.c
> > > @@ -0,0 +1,321 @@
> > > +/*-------------------------------------------------------------------------
> > > + *
> > > + * walprohibit.c
> > > + *           PostgreSQL write-ahead log prohibit states
> > > + *
> > > + *
> > > + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
> > > + *
> > > + * src/backend/access/transam/walprohibit.c
> > > + *
> > > + *-------------------------------------------------------------------------
> > > + */
> > > +#include "postgres.h"
> > > +
> > > +#include "access/walprohibit.h"
> > > +#include "pgstat.h"
> > > +#include "port/atomics.h"
> > > +#include "postmaster/bgwriter.h"
> > > +#include "storage/condition_variable.h"
> > > +#include "storage/procsignal.h"
> > > +#include "storage/shmem.h"
> > > +
> > > +/*
> > > + * Shared-memory WAL prohibit state
> > > + */
> > > +typedef struct WALProhibitStateData
> > > +{
> > > +     /* Indicates current WAL prohibit state */
> > > +     pg_atomic_uint32 SharedWALProhibitState;
> > > +
> > > +     /* Startup checkpoint pending */
> > > +     bool            checkpointPending;
> > > +
> > > +     /* Signaled when requested WAL prohibit state changes */
> > > +     ConditionVariable walprohibit_cv;
> >
> > You're using three different naming styles for as many members.
> >
>
> Ill fix in the next version.
>
> >
> > > +/*
> > > + * ProcessBarrierWALProhibit()
> > > + *
> > > + * Handle WAL prohibit state change request.
> > > + */
> > > +bool
> > > +ProcessBarrierWALProhibit(void)
> > > +{
> > > +     /*
> > > +      * Kill off any transactions that have an XID *before* allowing the system
> > > +      * to go WAL prohibit state.
> > > +      */
> > > +     if (FullTransactionIdIsValid(GetTopFullTransactionIdIfAny()))
> >
> > Hm. I wonder if this check is good enough. If you look at
> > RecordTransactionCommit() we also WAL log in some cases where no xid was
> > assigned.  This is particularly true of (auto-)vacuum, but also for HOT
> > pruning.
> >
> > I think it'd be good to put the logic of this check into xlog.c and
> > mirror the logic in RecordTransactionCommit(). And add cross-referencing
> > comments to RecordTransactionCommit and the new function, reminding our
> > futures selves that both places need to be modified.
> >
>
> I am not sure I have understood this, here is the snip from the implementation
> detail from the first post[2]:
>
> "Open transactions that don't have an XID are not killed, but will get an ERROR
> if they try to acquire an XID later, or if they try to write WAL without
> acquiring an XID (e.g. VACUUM).  To make that happen, the patch adds a new
> coding rule: a critical section that will write WAL must be preceded by a call
> to CheckWALPermitted(), AssertWALPermitted(), or AssertWALPermitted_HaveXID().
> The latter variants are used when we know for certain that inserting WAL here
> must be OK, either because we have an XID (we would have been killed by a change
> to read-only if one had occurred) or for some other reason."
>
> Do let me  know if you want further clarification.
>
> >
> > > +     {
> > > +             /* Should be here only for the WAL prohibit state. */
> > > +             Assert(GetWALProhibitState() & WALPROHIBIT_STATE_READ_ONLY);
> >
> > There are no races where an ASRO READ ONLY is quickly followed by ASRO
> > READ WRITE where this could be reached?
> >
>
> No, right now SetWALProhibitState() doesn't allow two transient wal prohibit
> states at a time.
>
> >
> > > +/*
> > > + * AlterSystemSetWALProhibitState()
> > > + *
> > > + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> > > + */
> > > +void
> > > +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> > > +{
> > > +     uint32          state;
> > > +
> > > +     if (!superuser())
> > > +             ereport(ERROR,
> > > +                             (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > > +                              errmsg("must be superuser to execute ALTER SYSTEM command")));
> >
> > See comments about this above.
> >
> >
> > > +     /* Alter WAL prohibit state not allowed during recovery */
> > > +     PreventCommandDuringRecovery("ALTER SYSTEM");
> > > +
> > > +     /* Requested state */
> > > +     state = stmt->WALProhibited ?
> > > +             WALPROHIBIT_STATE_READ_ONLY : WALPROHIBIT_STATE_READ_WRITE;
> > > +
> > > +     /*
> > > +      * Since we yet to convey this WAL prohibit state to all backend mark it
> > > +      * in-progress.
> > > +      */
> > > +     state |= WALPROHIBIT_TRANSITION_IN_PROGRESS;
> > > +
> > > +     if (!SetWALProhibitState(state))
> > > +             return;                                 /* server is already in the desired state */
> > > +
> >
> > This use of bitmasks seems unnecessary to me. I'd rather have one param
> > for WALPROHIBIT_STATE_READ_ONLY / WALPROHIBIT_STATE_READ_WRITE and one
> > for WALPROHIBIT_TRANSITION_IN_PROGRESS
> >
>
> Ok.
>
> How about the new version of  SetWALProhibitState function as :
> SetWALProhibitState(bool wal_prohibited, bool is_final_state)  ?
>

I have added the same.

> >
> >
> > > +/*
> > > + * RequestWALProhibitChange()
> > > + *
> > > + * Request checkpointer to make the WALProhibitState to read-only.
> > > + */
> > > +static void
> > > +RequestWALProhibitChange(void)
> > > +{
> > > +     /* Must not be called from checkpointer */
> > > +     Assert(!AmCheckpointerProcess());
> > > +     Assert(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > > +
> > > +     /*
> > > +      * If in a standalone backend, just do it ourselves.
> > > +      */
> > > +     if (!IsPostmasterEnvironment)
> > > +     {
> > > +             CompleteWALProhibitChange(GetWALProhibitState());
> > > +             return;
> > > +     }
> > > +
> > > +     send_signal_to_checkpointer(SIGINT);
> > > +
> > > +     /* Wait for the state to change to read-only */
> > > +     ConditionVariablePrepareToSleep(&WALProhibitState->walprohibit_cv);
> > > +     for (;;)
> > > +     {
> > > +             /* We'll be done once in-progress flag bit is cleared */
> > > +             if (!(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > +                     break;
> > > +
> > > +             ConditionVariableSleep(&WALProhibitState->walprohibit_cv,
> > > +                                                        WAIT_EVENT_WALPROHIBIT_STATE_CHANGE);
> > > +     }
> > > +     ConditionVariableCancelSleep();
> >
> > What if somebody concurrently changes the state back to READ WRITE?
> > Won't we unnecessarily wait here?
> >
>
> Yes, there will be wait.
>
> > That's probably fine, because we would just wait until that transition
> > is complete too. But at least a comment about that would be
> > good. Alternatively a "ASRO transitions completed counter" or such might
> > be a better idea?
> >
>
> Ok, will add comments but could you please elaborate little a bit about "ASRO
> transitions completed counter"  and is there any existing counter I can refer
> to?
>
> >
> > > +/*
> > > + * CompleteWALProhibitChange()
> > > + *
> > > + * Checkpointer will call this to complete the requested WAL prohibit state
> > > + * transition.
> > > + */
> > > +void
> > > +CompleteWALProhibitChange(uint32 wal_state)
> > > +{
> > > +     uint64          barrierGeneration;
> > > +
> > > +     /*
> > > +      * Must be called from checkpointer. Otherwise, it must be single-user
> > > +      * backend.
> > > +      */
> > > +     Assert(AmCheckpointerProcess() || !IsPostmasterEnvironment);
> > > +     Assert(wal_state & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > > +
> > > +     /*
> > > +      * WAL prohibit state change is initiated. We need to complete the state
> > > +      * transition by setting requested WAL prohibit state in all backends.
> > > +      */
> > > +     elog(DEBUG1, "waiting for backends to adopt requested WAL prohibit state");
> > > +
> > > +     /* Emit global barrier */
> > > +     barrierGeneration = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_WALPROHIBIT);
> > > +     WaitForProcSignalBarrier(barrierGeneration);
> > > +
> > > +     /* And flush all writes. */
> > > +     XLogFlush(GetXLogWriteRecPtr());
> >
> > Hm, maybe I'm missing something, but why is the write pointer the right
> > thing to flush? That won't include records that haven't been written to
> > disk yet... We also need to trigger writing out all WAL that is as of
> > yet unwritten, no?  Without having thought a lot about it, it seems that
> > GetXLogInsertRecPtr() would be the right thing to flush?
> >
>
> TBH, I am not an expert in this area.  I wants to flush the latest record
> pointer that needs to be flushed, I think  GetXLogInsertRecPtr() would be fine
> if is the latest one. Note that wal flushes are not blocked in read-only mode.
>

Used GetXLogInsertRecPtr().

> >
> > > +     /* Set final state by clearing in-progress flag bit */
> > > +     if (SetWALProhibitState(wal_state & ~(WALPROHIBIT_TRANSITION_IN_PROGRESS)))
> > > +     {
> > > +             bool            wal_prohibited;
> > > +
> > > +             wal_prohibited = (wal_state & WALPROHIBIT_STATE_READ_ONLY) != 0;
> > > +
> > > +             /* Update the control file to make state persistent */
> > > +             SetControlFileWALProhibitFlag(wal_prohibited);
> >
> > Hm. Is there an issue with not WAL logging the control file change? Is
> > there a scenario where we a crash + recovery would end up overwriting
> > this?
> >
>
> I am not sure. If the system crash before update this that means we haven't
> acknowledged the system state change. And the server will be restarted with the
> previous state.
>
> Could you please explain what bothering you.
>
> >
> > > +             if (wal_prohibited)
> > > +                     ereport(LOG, (errmsg("system is now read only")));
> > > +             else
> > > +             {
> > > +                     /*
> > > +                      * Request checkpoint if the end-of-recovery checkpoint has been
> > > +                      * skipped previously.
> > > +                      */
> > > +                     if (WALProhibitState->checkpointPending)
> > > +                     {
> > > +                             RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
> > > +                                                               CHECKPOINT_IMMEDIATE);
> > > +                             WALProhibitState->checkpointPending = false;
> > > +                     }
> > > +                     ereport(LOG, (errmsg("system is now read write")));
> > > +             }
> > > +     }
> > > +
> > > +     /* Wake up the backend who requested the state change */
> > > +     ConditionVariableBroadcast(&WALProhibitState->walprohibit_cv);
> >
> > Could be multiple backends, right?
> >
>
> Yes, you are correct, will fix that.
>
> >
> > > +}
> > > +
> > > +/*
> > > + * GetWALProhibitState()
> > > + *
> > > + * Atomically return the current server WAL prohibited state
> > > + */
> > > +uint32
> > > +GetWALProhibitState(void)
> > > +{
> > > +     return pg_atomic_read_u32(&WALProhibitState->SharedWALProhibitState);
> > > +}
> >
> > Is there an issue with needing memory barriers here?
> >
> >
> > > +/*
> > > + * SetWALProhibitState()
> > > + *
> > > + * Change current WAL prohibit state to the input state.
> > > + *
> > > + * If the server is already completely moved to the requested WAL prohibit
> > > + * state, or if the desired state is same as the current state, return false,
> > > + * indicating that the server state did not change. Else return true.
> > > + */
> > > +bool
> > > +SetWALProhibitState(uint32 new_state)
> > > +{
> > > +     bool            state_updated = false;
> > > +     uint32          cur_state;
> > > +
> > > +     cur_state = GetWALProhibitState();
> > > +
> > > +     /* Server is already in requested state */
> > > +     if (new_state == cur_state ||
> > > +             new_state == (cur_state | WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > +             return false;
> > > +
> > > +     /* Prevent concurrent contrary in progress transition state setting */
> > > +     if ((new_state & WALPROHIBIT_TRANSITION_IN_PROGRESS) &&
> > > +             (cur_state & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > +     {
> > > +             if (cur_state & WALPROHIBIT_STATE_READ_ONLY)
> > > +                     ereport(ERROR,
> > > +                                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > +                                      errmsg("system state transition to read only is already in progress"),
> > > +                                      errhint("Try after sometime again.")));
> > > +             else
> > > +                     ereport(ERROR,
> > > +                                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > +                                      errmsg("system state transition to read write is already in progress"),
> > > +                                      errhint("Try after sometime again.")));
> > > +     }
> > > +
> > > +     /* Update new state in share memory */
> > > +     state_updated =
> > > +             pg_atomic_compare_exchange_u32(&WALProhibitState->SharedWALProhibitState,
> > > +                                                                        &cur_state, new_state);
> > > +
> > > +     if (!state_updated)
> > > +             ereport(ERROR,
> > > +                             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > +                              errmsg("system read write state concurrently changed"),
> > > +                              errhint("Try after sometime again.")));
> > > +
> >
> > I don't think it's safe to use pg_atomic_compare_exchange_u32() outside
> > of a loop. I think there's platforms (basically all load-linked /
> > store-conditional architectures) where than can fail spuriously.
> >
> > Also, there's no memory barrier around GetWALProhibitState, so there's
> > no guarantee it's not an out-of-date value you're starting with.
> >
>
> How about having some kind of lock instead what Robert have suggested
> previously[3] ?
>

I would like to discuss this point more. In the attached version I have added
WALProhibitLock to protect shared walprohibit state updates.  I was a little
unsure do we want another spinlock what XLogCtlData has which is mostly used to
read the shared variable and for the update, both are used e.g. LogwrtResult.

Right now I haven't added and shared walprohibit state was fetch using a
volatile pointer. Do we need a spinlock there, I am not sure why? Thoughts?

> >
> > > +/
> > > + * MarkCheckPointSkippedInWalProhibitState()
> > > + *
> > > + * Sets checkpoint pending flag so that it can be performed next time while
> > > + * changing system state to WAL permitted.
> > > + */
> > > +void
> > > +MarkCheckPointSkippedInWalProhibitState(void)
> > > +{
> > > +     WALProhibitState->checkpointPending = true;
> > > +}
> >
> > I don't *at all* like this living outside of xlog.c. I think this should
> > be moved there, and merged with deferring checkpoints in other cases
> > (promotions, not immediately performing a checkpoint after recovery).
>
> Here we want to perform the checkpoint sometime quite later when the
> system state changes to read-write. For that, I think we need some flag
> if we want this in xlog.c then we can have that flag in XLogCtl.
>

Right now I have added a new variable to XLogCtlData and moved this code to
xlog.c.

>
> > There's state in ControlFile *and* here for essentially the same thing.
> >
>
> I am sorry to trouble you much, but I haven't understood this too.
>
> >
> >
> > > +      * If it is not currently possible to insert write-ahead log records,
> > > +      * either because we are still in recovery or because ALTER SYSTEM READ
> > > +      * ONLY has been executed, force this to be a read-only transaction.
> > > +      * We have lower level defences in XLogBeginInsert() and elsewhere to stop
> > > +      * us from modifying data during recovery when !XLogInsertAllowed(), but
> > > +      * this gives the normal indication to the user that the transaction is
> > > +      * read-only.
> > > +      *
> > > +      * On the other hand, we only need to set the startedInRecovery flag when
> > > +      * the transaction started during recovery, and not when WAL is otherwise
> > > +      * prohibited. This information is used by RelationGetIndexScan() to
> > > +      * decide whether to permit (1) relying on existing killed-tuple markings
> > > +      * and (2) further killing of index tuples. Even when WAL is prohibited
> > > +      * on the master, it's still the master, so the former is OK; and since
> > > +      * killing index tuples doesn't generate WAL, the latter is also OK.
> > > +      * See comments in RelationGetIndexScan() and MarkBufferDirtyHint().
> > > +      */
> > > +     XactReadOnly = DefaultXactReadOnly || !XLogInsertAllowed();
> > > +     s->startedInRecovery = RecoveryInProgress();
> >
> > It's somewhat ugly that we call RecoveryInProgress() once in
> > XLogInsertAllowed() and then again directly here... It's probably fine
> > runtime cost wise, but...
> >
> >
> > >  /*
> > >   * Subroutine to try to fetch and validate a prior checkpoint record.
> > >   *
> > > @@ -8508,9 +8564,13 @@ ShutdownXLOG(int code, Datum arg)
> > >        */
> > >       WalSndWaitStopping();
> > >
> > > +     /*
> > > +      * The restartpoint, checkpoint, or xlog rotation will be performed if the
> > > +      * WAL writing is permitted.
> > > +      */
> > >       if (RecoveryInProgress())
> > >               CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
> > > -     else
> > > +     else if (XLogInsertAllowed())
> >
> > Not sure I like going via XLogInsertAllowed(), that seems like a
> > confusing indirection here. And it encompasses things we atually don't
> > want to check for - it's fragile to also look at LocalXLogInsertAllowed
> > here imo.
> >
> >
> > >       ShutdownCLOG();
> > >       ShutdownCommitTs();
> > >       ShutdownSUBTRANS();
> > > diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> > > index 1b8cd7bacd4..aa4cdd57ec1 100644
> > > --- a/src/backend/postmaster/autovacuum.c
> > > +++ b/src/backend/postmaster/autovacuum.c
> > > @@ -652,6 +652,10 @@ AutoVacLauncherMain(int argc, char *argv[])
> > >
> > >               HandleAutoVacLauncherInterrupts();
> > >
> > > +             /* If the server is read only just go back to sleep. */
> > > +             if (!XLogInsertAllowed())
> > > +                     continue;
> > > +
> >
> > I think we really should have a different functions for places like
> > this. We don't want to generally hide bugs like e.g. starting the
> > autovac launcher in recovery, but this would.
> >
>
> So, we need a separate function like XLogInsertAllowed() and a global variable
> like LocalXLogInsertAllowed for the caching wal prohibit state.
>
> >
> > > @@ -342,6 +344,28 @@ CheckpointerMain(void)
> > >               AbsorbSyncRequests();
> > >               HandleCheckpointerInterrupts();
> > >
> > > +             wal_state = GetWALProhibitState();
> > > +
> > > +             if (wal_state & WALPROHIBIT_TRANSITION_IN_PROGRESS)
> > > +             {
> > > +                     /* Complete WAL prohibit state change request */
> > > +                     CompleteWALProhibitChange(wal_state);
> > > +                     continue;
> > > +             }
> > > +             else if (wal_state & WALPROHIBIT_STATE_READ_ONLY)
> > > +             {
> > > +                     /*
> > > +                      * Don't do anything until someone wakes us up.  For example a
> > > +                      * backend might later on request us to put the system back to
> > > +                      * read-write wal prohibit sate.
> > > +                      */
> > > +                     (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
> > > +                                                      WAIT_EVENT_CHECKPOINTER_MAIN);
> > > +                     continue;
> > > +             }
> > > +             Assert(wal_state == WALPROHIBIT_STATE_READ_WRITE);
> > > +
> > >               /*
> > >                * Detect a pending checkpoint request by checking whether the flags
> > >                * word in shared memory is nonzero.  We shouldn't need to acquire the
> > > @@ -1323,3 +1347,16 @@ FirstCallSinceLastCheckpoint(void)
> > >
> > >       return FirstCall;
> > >  }
> >
> > So, if we're in the middle of a paced checkpoint with a large
> > checkpoint_timeout - a sensible real world configuration - we'll not
> > process ASRO until that checkpoint is over?  That seems very much not
> > practical. What am I missing?
> >
>
> Yes, the process doing ASRO will wait until that checkpoint is over.
>
> >
> > > +/*
> > > + * send_signal_to_checkpointer allows a process to send a signal to the checkpoint process.
> > > + */
> > > +void
> > > +send_signal_to_checkpointer(int signum)
> > > +{
> > > +     if (CheckpointerShmem->checkpointer_pid == 0)
> > > +             elog(ERROR, "checkpointer is not running");
> > > +
> > > +     if (kill(CheckpointerShmem->checkpointer_pid, signum) != 0)
> > > +             elog(ERROR, "could not signal checkpointer: %m");
> > > +}
> >
> > Sudden switch to a different naming style...
> >
>
> My bad, sorry, will fix that.
>
> 1] http://postgr.es/m/20200724020402.2byiiufsd7pw4hsp@alap3.anarazel.de
> 2] http://postgr.es/m/CAAJ_b97KZzdJsffwRK7w0XU5HnXkcgKgTR69t8cOZztsyXjkQw@mail.gmail.com
> 3] http://postgr.es/m/CA+TgmoYMyw-m3O5XQ8tRy4mdEArGcfXr+9niO5Fmq1wVdKxYmQ@mail.gmail.com


Thank you !

Regards,
Amul

Attachment

pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: [patch] _bt_binsrch* improvements - equal-prefix-skip binary search
Next
From: Fujii Masao
Date:
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away