From 2724f096d343a56777b7380fe1789f1a4103ff0e Mon Sep 17 00:00:00 2001 From: Amul Sul Date: Tue, 14 Jul 2020 02:30:44 -0400 Subject: [PATCH v26 3/3] Documentation. --- doc/src/sgml/func.sgml | 26 +++++++++++-- doc/src/sgml/high-availability.sgml | 34 ++++++++++++++++ src/backend/access/transam/README | 60 ++++++++++++++++++++++++++--- src/backend/storage/page/README | 12 +++--- 4 files changed, 118 insertions(+), 14 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d2011634075..fa185b3d750 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24898,9 +24898,9 @@ SELECT collation for ('foo' COLLATE "de_DE"); - Each of these functions returns true if - the signal was successfully sent and false - if sending the signal failed. + Except for pg_prohibit_wal, each of these functions + returns true if the signal was successfully sent + and false if sending the signal failed. @@ -25034,6 +25034,26 @@ SELECT collation for ('foo' COLLATE "de_DE"); is emitted and false is returned. + + + + + pg_prohibit_wal + + pg_prohibit_wal ( boolean ) + void + + + Accepts a boolean argument that sets the WAL read-write state + and forces all processes of the PostgreSQL + server to accept the state change immediately. When + true is passed, the system is changed to read-only + (WAL prohibited state) if it is not already in that state. When + false is passed, the system is changed to read-write + (WAL permitted state) if it is not already in that state. See + the WAL Prohibited State section for more details. + +
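The pg_prohibit_wal() behavior described above can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not PostgreSQL's implementation: ModelBackend, model_prohibit_wal() and model_show_wal_prohibited() are invented names. The barrier-absorption rule (terminate a running transaction only if it has a valid XID) follows the transam README text in this patch, and the assumption that switching back to read-write terminates no one is labeled in the code. The model shows that requesting the current state is a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of pg_prohibit_wal(boolean) semantics over a fixed
 * array of backends.  None of these names exist in PostgreSQL.
 */
typedef struct
{
    bool in_transaction;
    bool has_valid_xid;     /* a valid XID means WAL was or will be written */
    bool terminated;
} ModelBackend;

static bool model_wal_prohibited = false;

/* Value that the wal_prohibited GUC would show. */
static const char *
model_show_wal_prohibited(void)
{
    return model_wal_prohibited ? "on" : "off";
}

/*
 * Change the WAL read-write state.  Requesting the current state is a no-op.
 * On the switch to read-only, each backend absorbs the barrier: a running
 * transaction with a valid XID is terminated, while a transaction without
 * an XID keeps running (it would hit the error in XLogBeginInsert() later).
 * Returns the number of backends terminated.
 */
static size_t
model_prohibit_wal(bool prohibit, ModelBackend *backends, size_t nbackends)
{
    size_t killed = 0;

    assert(backends != NULL || nbackends == 0);

    if (prohibit == model_wal_prohibited)
        return 0;               /* already in the requested state */

    model_wal_prohibited = prohibit;
    if (!prohibit)
        return 0;               /* assumption: going read-write terminates nobody */

    for (size_t i = 0; i < nbackends; i++)
    {
        ModelBackend *b = &backends[i];

        if (b->in_transaction && b->has_valid_xid && !b->terminated)
        {
            b->terminated = true;
            killed++;
        }
    }
    return killed;
}
```

In the model, only the backend that already holds an XID is terminated; an idle backend and one whose transaction has not yet acquired an XID survive the transition.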
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index c072110ba60..b54767beeb0 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -2339,4 +2339,38 @@ HINT: You can then restart the server after making the necessary configuration + + WAL Prohibited State + + + WAL Prohibited State + + + + WAL prohibited is a read-only system state. Any user with the required + permission can call the pg_prohibit_wal function to force the + system into a read-only mode in which inserting write-ahead log records is + prohibited until the same function is executed again to change the state + back to read-write. As in Hot Standby, connections to the server are allowed + to run read-only queries in the WAL prohibited state. While the system is + restricted to read-only queries, the value of the GUC + wal_prohibited is on. + Otherwise, it is off. When the user requests the WAL + prohibited state, any existing session running a transaction that has + already performed, or is planning to perform, WAL write operations is + terminated. This is useful for an HA setup in which the master server + needs to stop accepting WAL writes immediately and kick out any + transaction that still expects to write WAL, for example when the network + on the master is down or its replication connections have failed. + + + + Shutting down a read-only system skips the shutdown checkpoint; on + restart, the server goes into crash recovery mode and stays in that state + until the system is changed to read-write. If a read-only server finds a + standby.signal or + recovery.signal file at startup, the system implicitly leaves + the read-only state. + + diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README index 1edc8180c12..74c965b1f19 100644 --- a/src/backend/access/transam/README +++ b/src/backend/access/transam/README @@ -442,8 +442,8 @@ to be modified. 2.
START_CRIT_SECTION() (Any error during the next three steps must cause a PANIC because the shared buffers will contain unlogged changes, which we have to ensure don't get to disk. Obviously, you should check conditions -such as whether there's enough free space on the page before you start the -critical section.) +such as whether WAL writes are permitted and whether there's enough free space +on the page before you start the critical section.) 3. Apply the required changes to the shared buffer(s). @@ -486,6 +486,54 @@ with the incomplete-split flag set, it will finish the interrupted split by inserting the key to the parent, before proceeding. +Read-only system state +---------------------- + +The system is in a read-only state when it is not currently possible to insert +write-ahead log records, either because the system is still in recovery or +because it has been forced into the WAL prohibited state by executing the +pg_prohibit_wal() function. We have a lower-level defense in XLogBeginInsert() +and elsewhere to stop us from modifying data during recovery when +!XLogInsertAllowed(), but if XLogBeginInsert() is called inside a critical +section we must not depend on it to report an error; otherwise, the error will +cause a PANIC, as mentioned previously. + +Code paths that write WAL are never reached during recovery, but +pg_prohibit_wal() can be executed by the user at any time to stop WAL writing. +Any backend that receives the read-only state transition barrier interrupt +needs to stop writing WAL immediately. To absorb the barrier, a backend kills +its running transaction if that transaction has a valid XID, since a valid XID +indicates that the transaction has performed, or is planning to perform, a WAL +write. A transaction that has not yet acquired a valid XID, or an operation +such as VACUUM or CREATE INDEX CONCURRENTLY that does not necessarily have a +valid XID when it writes WAL, is not stopped during barrier processing, and may +therefore hit the error from XLogBeginInsert() while trying to write WAL in the +read-only state.
To prevent such an error from being raised by XLogBeginInsert() inside +a critical section, WAL write permission has to be checked before +START_CRIT_SECTION(). + +To enforce the practice of checking WAL permission before entering a critical +section that writes WAL, we have added a flag, checked by an assertion in +XLogBeginInsert(), that indicates permission has been checked. If it has not +been, XLogBeginInsert() fails the assertion. The WAL permission check is not +mandatory when XLogBeginInsert() is not inside a critical section, since +throwing an error there is acceptable. To set the permission-check flag, call +CheckWALPermitted(), AssertWALPermitted_HaveXID(), or AssertWALPermitted() +before START_CRIT_SECTION(). The flag is reset automatically on exit from the +critical section. The rules for choosing among the permission-check routines +are: + + Places where a WAL write inside a critical section can happen without a + valid XID (e.g. vacuum) must be protected by CheckWALPermitted(), so that + the error can be reported before entering the critical section. + + Places performing INSERT and UPDATE, which never happen without a valid + XID, can be checked using AssertWALPermitted_HaveXID(), so that non-assert + builds do not pay the checking overhead. + + Places that we know cannot be reached in the read-only state, and that may + or may not have an XID, but where we need to ensure on assert-enabled builds + that permission has been checked, should use AssertWALPermitted(). + + Constructing a WAL record ------------------------- @@ -531,7 +579,8 @@ Details of the API functions: void XLogBeginInsert(void) - Must be called before XLogRegisterBuffer and XLogRegisterData. + Must be called before XLogRegisterBuffer and XLogRegisterData. WAL + permission must be checked before calling it inside a critical section. void XLogResetInsertion(void) @@ -638,8 +687,9 @@ MarkBufferDirtyHint() to mark the block dirty.
If the buffer is clean and checksums are in use then MarkBufferDirtyHint() inserts an XLOG_FPI_FOR_HINT record to ensure that we take a full page image that includes the hint. We do this to avoid a partial page write, when we -write the dirtied page. WAL is not written during recovery, so we simply skip -dirtying blocks because of hints when in recovery. +write the dirtied page. WAL is not written while the system is read-only (i.e. +during recovery or in the WAL prohibited state), so we simply skip dirtying +blocks because of hints while in that state. If you do decide to optimise away a WAL record, then any calls to MarkBufferDirty() must be replaced by MarkBufferDirtyHint(), diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README index e30d7ac59ad..15f0bb4b7b5 100644 --- a/src/backend/storage/page/README +++ b/src/backend/storage/page/README @@ -56,9 +56,9 @@ WAL is a fatal error and prevents further recovery, whereas a checksum failure on a normal data block is a hard error but not a critical one for the server, even if it is a very bad thing for the user. -New WAL records cannot be written during recovery, so hint bits set during -recovery must not dirty the page if the buffer is not already dirty, when -checksums are enabled. Systems in Hot-Standby mode may benefit from hint bits -being set, but with checksums enabled, a page cannot be dirtied after setting a -hint bit (due to the torn page risk). So, it must wait for full-page images -containing the hint bit updates to arrive from the primary. +New WAL records cannot be written during recovery or while in the WAL +prohibited state, so hint bits set in either state must not dirty the page if +the buffer is not already dirty, when checksums are enabled. Systems in +Hot-Standby mode may benefit from hint bits being set, but with checksums +enabled, a page cannot be dirtied after setting a hint bit (due to the torn page +risk).
So, it must wait +for full-page images containing the hint bit updates to arrive from the primary. -- 2.18.0