Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id 20200724020402.2byiiufsd7pw4hsp@alap3.anarazel.de
Whole thread Raw
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY
Re: [Patch] ALTER SYSTEM READ ONLY
List pgsql-hackers
Hi,

> From f0188a48723b1ae7372bcc6a344ed7868fdc40fb Mon Sep 17 00:00:00 2001
> From: Amul Sul <amul.sul@enterprisedb.com>
> Date: Fri, 27 Mar 2020 05:05:38 -0400
> Subject: [PATCH v3 2/6] Add alter system read only/write syntax
> 
> Note that syntax doesn't have any implementation.
> ---
>  src/backend/nodes/copyfuncs.c    | 12 ++++++++++++
>  src/backend/nodes/equalfuncs.c   |  9 +++++++++
>  src/backend/parser/gram.y        | 13 +++++++++++++
>  src/backend/tcop/utility.c       | 20 ++++++++++++++++++++
>  src/bin/psql/tab-complete.c      |  6 ++++--
>  src/include/nodes/nodes.h        |  1 +
>  src/include/nodes/parsenodes.h   | 10 ++++++++++
>  src/tools/pgindent/typedefs.list |  1 +
>  8 files changed, 70 insertions(+), 2 deletions(-)

Shouldn't there be at outfuncs support as well? Perhaps we even need
readfuncs, not immediately sure.


> From 2c5db7db70d4cebebf574fbc47db7fbf7c440be1 Mon Sep 17 00:00:00 2001
> From: Amul Sul <amul.sul@enterprisedb.com>
> Date: Fri, 19 Jun 2020 06:29:36 -0400
> Subject: [PATCH v3 3/6] Implement ALTER SYSTEM READ ONLY using global barrier.
> 
> Implementation:
> 
>  1. When a user tried to change server state to WAL-Prohibited using
>     ALTER SYSTEM READ ONLY command; AlterSystemSetWALProhibitState() will emit
>     PROCSIGNAL_BARRIER_WAL_PROHIBIT_STATE_CHANGE barrier and will wait until the
>     barrier has been absorbed by all the backends.
> 
>  2. When a backend receives the WAL-Prohibited barrier, at that moment if
>     it is already in a transaction and the transaction already assigned XID,
>     then the backend will be killed by throwing FATAL(XXX: need more discussion
>     on this)

I think we should consider introducing XACTFATAL or such, guaranteeing
the transaction gets aborted, without requiring a FATAL. This has been
needed for enough cases that it's worthwhile.


There are several cases where we WAL log without having an xid
assigned. E.g. when HOT pruning during syscache lookups or such. Are
there any cases where the check for being in recovery is followed by a
CHECK_FOR_INTERRUPTS, before the WAL logging is done?


>  3. Otherwise, if that backend running transaction which yet to get XID
>     assigned we don't need to do anything special, simply call
>     ResetLocalXLogInsertAllowed() so that any future WAL insert in will check
>     XLogInsertAllowed() first which set ready only state appropriately.
> 
>  4. A new transaction (from existing or new backend) starts as a read-only
>     transaction.

Why do we need 4)? And doesn't that have the potential to be
unnecessarily problematic if a the server is subsequently brought out of
the readonly state again?


>  5. Auxiliary processes like autovacuum launcher, background writer,
>     checkpointer and  walwriter will don't do anything in WAL-Prohibited
>     server state until someone wakes us up. E.g. a backend might later on
>     request us to put the system back to read-write.

Hm. It's not at all clear to me why bgwriter and walwriter shouldn't do
anything in this state. bgwriter for example is even running entirely
normally in a hot standby node?


>  6. At shutdown in WAL-Prohibited mode, we'll skip shutdown checkpoint
>     and xlog rotation. Starting up again will perform crash recovery(XXX:
>     need some discussion on this as well)
> 
>  7. ALTER SYSTEM READ ONLY/WRITE is restricted on standby server.
> 
>  8. Only super user can toggle WAL-Prohibit state.
> 
>  9. Add system_is_read_only GUC show the system state -- will true when system
>     is wal prohibited or in recovery.



> +/*
> + * AlterSystemSetWALProhibitState
> + *
> + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> + */
> +void
> +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> +{
> +    if (!superuser())
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 errmsg("must be superuser to execute ALTER SYSTEM command")));

ISTM we should rather do this in a GRANTable manner. We've worked
substantially towards that in the last few years.


>  
> +    /*
> +     * WALProhibited indicates if we have stopped allowing WAL writes.
> +     * Protected by info_lck.
> +     */
> +    bool        WALProhibited;
> +
>      /*
>       * SharedHotStandbyActive indicates if we allow hot standby queries to be
>       * run.  Protected by info_lck.
> @@ -7962,6 +7969,25 @@ StartupXLOG(void)
>          RequestCheckpoint(CHECKPOINT_FORCE);
>  }
>  
> +void
> +MakeReadOnlyXLOG(void)
> +{
> +    SpinLockAcquire(&XLogCtl->info_lck);
> +    XLogCtl->WALProhibited = true;
> +    SpinLockRelease(&XLogCtl->info_lck);
> +}
> +
> +/*
> + * Is the system still in WAL prohibited state?
> + */
> +bool
> +IsWALProhibited(void)
> +{
> +    volatile XLogCtlData *xlogctl = XLogCtl;
> +
> +    return xlogctl->WALProhibited;
> +}

What does this kind of locking achieving? It doesn't protect against
concurrent ALTER SYSTEM SET READ ONLY or such?


> +        /*
> +         * If the server is in WAL-Prohibited state then don't do anything until
> +         * someone wakes us up. E.g. a backend might later on request us to put
> +         * the system back to read-write.
> +         */
> +        if (IsWALProhibited())
> +        {
> +            (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
> +                             WAIT_EVENT_CHECKPOINTER_MAIN);
> +            continue;
> +        }
> +
>          /*
>           * Detect a pending checkpoint request by checking whether the flags
>           * word in shared memory is nonzero.  We shouldn't need to acquire the

So if the ASRO happens while a checkpoint, potentially with a
checkpoint_timeout = 60d, it'll not take effect until the checkpoint has
finished.

But uh, as far as I can tell, the code would simply continue an
in-progress checkpoint, despite having absorbed the barrier. And then
we'd PANIC when doing the XLogInsert()?



> diff --git a/src/include/access/walprohibit.h b/src/include/access/walprohibit.h
> new file mode 100644
> index 00000000000..619c33cd780
> --- /dev/null
> +++ b/src/include/access/walprohibit.h

Not sure I like the mix of xlog/wal prefix for pretty closely related
files... I'm not convinced it's worth having a separate file for this,
fwiw.



> From 5600adc647bd729e4074ecf13e97b9f297e9d5c6 Mon Sep 17 00:00:00 2001
> From: Amul Sul <amul.sul@enterprisedb.com>
> Date: Fri, 15 May 2020 06:39:43 -0400
> Subject: [PATCH v3 4/6] Use checkpointer to make system READ-ONLY or
>  READ-WRITE
> 
> Till the previous commit, the backend used to do this, but now the backend
> requests checkpointer to do it. Checkpointer, noticing that the current state
> is has WALPROHIBIT_TRANSITION_IN_PROGRESS flag set, does the barrier request,
> and then acknowledges back to the backend who requested the state change.
> 
> Note that this commit also enables ALTER SYSTEM READ WRITE support and make WAL
> prohibited state persistent across the system restarts.

The split between the previous commit and this commit seems more
confusing than useful to me.

> +/*
> + * WALProhibitedRequest: Request checkpointer to make the WALProhibitState to
> + * read-only.
> + */
> +void
> +WALProhibitRequest(void)
> +{
> +    /* Must not be called from checkpointer */
> +    Assert(!AmCheckpointerProcess());
> +    Assert(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> +
> +    /*
> +     * If in a standalone backend, just do it ourselves.
> +     */
> +    if (!IsPostmasterEnvironment)
> +    {
> +        performWALProhibitStateChange(GetWALProhibitState());
> +        return;
> +    }
> +
> +    if (CheckpointerShmem->checkpointer_pid == 0)
> +        elog(ERROR, "checkpointer is not running");
> +
> +    if (kill(CheckpointerShmem->checkpointer_pid, SIGINT) != 0)
> +        elog(ERROR, "could not signal checkpointer: %m");
> +
> +    /* Wait for the state to change to read-only */
> +    ConditionVariablePrepareToSleep(&CheckpointerShmem->readonly_cv);
> +    for (;;)
> +    {
> +        /*  We'll be done once in-progress flag bit is cleared */
> +        if (!(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> +            break;
> +
> +        elog(DEBUG1, "WALProhibitRequest: Waiting for checkpointer");
> +        ConditionVariableSleep(&CheckpointerShmem->readonly_cv,
> +                               WAIT_EVENT_SYSTEM_WALPROHIBIT_STATE_CHANGE);
> +    }
> +    ConditionVariableCancelSleep();
> +    elog(DEBUG1, "Done WALProhibitRequest");
> +}

Isn't it possible that the system could have been changed back to be
read-write by the time the wakeup is being processed?



> From 0b7426fc4708cc0e4ad333da3b35e473658bba28 Mon Sep 17 00:00:00 2001
> From: Amul Sul <amul.sul@enterprisedb.com>
> Date: Tue, 14 Jul 2020 02:10:55 -0400
> Subject: [PATCH v3 5/6] Error or Assert before START_CRIT_SECTION for WAL
>  write

Isn't that the wrong order? This needs to come before the feature is
enabled, no?



> @@ -758,6 +759,9 @@ brinbuildempty(Relation index)
>          ReadBufferExtended(index, INIT_FORKNUM, P_NEW, RBM_NORMAL, NULL);
>      LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
>  
> +    /* Building indexes will have an XID */
> +    AssertWALPermitted_HaveXID();
> +

Ugh, that's a pretty ugly naming scheme mix.



> @@ -176,6 +177,10 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
>      if (((BrinPageFlags(oldpage) & BRIN_EVACUATE_PAGE) == 0) &&
>          brin_can_do_samepage_update(oldbuf, origsz, newsz))
>      {
> +        /* Can reach here from VACUUM, so need not have an XID */
> +        if (RelationNeedsWAL(idxrel))
> +            CheckWALPermitted();
> +

Hm. Maybe I am confused, but why is that dependent on
RelationNeedsWAL()? Shouldn't readonly actually mean readonly, even if
no WAL is emitted?


>  #include "access/genam.h"
>  #include "access/gist_private.h"
>  #include "access/transam.h"
> +#include "access/walprohibit.h"
>  #include "commands/vacuum.h"
>  #include "lib/integerset.h"
>  #include "miscadmin.h"

The number of places that now need this new header - pretty much the
same set of files that do XLogInsert, already requiring an xlog* header
to be included - drives me further towards the conclusion that it's not
a good idea to have it separate.

>  extern void ProcessInterrupts(void);
>  
> +#ifdef USE_ASSERT_CHECKING
> +typedef enum
> +{
> +    WALPERMIT_UNCHECKED,
> +    WALPERMIT_CHECKED,
> +    WALPERMIT_CHECKED_AND_USED
> +} WALPermitCheckState;
> +
> +/* in access/walprohibit.c */
> +extern WALPermitCheckState walpermit_checked_state;
> +
> +/*
> + * Reset walpermit_checked flag when no longer in the critical section.
> + * Otherwise, marked checked and used.
> + */
> +#define RESET_WALPERMIT_CHECKED_STATE() \
> +do { \
> +    walpermit_checked_state = CritSectionCount ? \
> +    WALPERMIT_CHECKED_AND_USED : WALPERMIT_UNCHECKED; \
> +} while(0)
> +#else
> +#define RESET_WALPERMIT_CHECKED_STATE() ((void) 0)
> +#endif
> +

Why are these in headers? And why is this tied to CritSectionCount?


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Peter Geoghegan
Date:
Subject: Re: Default setting for enable_hashagg_disk