Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct
Date
Msg-id 4BD69BD4.8070603@enterprisedb.com
In response to Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct  (Fujii Masao <masao.fujii@gmail.com>)
Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Heikki Linnakangas wrote:
> Robert Haas wrote:
>> On Fri, Apr 23, 2010 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, actually, now that I've looked at the patch I think it's starting
>>> from a fundamentally wrong position anyway.  Checkpoint records are a
>>> completely wrong mechanism for transmitting this data to slaves, because
>>> a checkpoint is emitted *after* we do something, not *before* we do it.
>>> In particular it's ludicrous to be looking at shutdown checkpoints to
>>> try to determine whether the subsequent WAL will meet the slave's
>>> requirements.  There's no connection at all between what the GUC state
>>> was at shutdown and what it might be after starting again.
>>>
>>> A design that might work is
>>> (1) store the active value of wal_mode in pg_control (but NOT as part of
>>> the last-checkpoint-record image).
>>> (2) invent a new WAL record type that is transmitted when we change
>>> wal_mode.
>>>
>>> Then, slaves could check whether the master's wal_mode is high enough
>>> by looking at pg_control when they start plus any wal_mode_change
>>> records they come across.
>>>
>>> If we did this then we could get rid of those WAL record types that were
>>> added to signify that information had been omitted from WAL at specific
>>> times.
>> <dons project manager hat>
>>
>> I notice that Heikki's patch doesn't include doing the above.  Should
>> we?  If so, who's going to do it?
>
> I'll give it a shot.

Ok, here's a patch that includes the changes to add the new wal_mode GUC
(http://archives.postgresql.org/message-id/4BD581A6.60602@enterprisedb.com),
and implements Tom's design to keep a copy of wal_mode and the
max_connections, max_prepared_xacts and max_locks_per_xact settings in
pg_control.
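
As a rough illustration of the design (not code from the patch), the
standalone sketch below models the two halves: the primary compares its
current settings with the copy kept in pg_control and refreshes that copy
when they differ, and a standby compares its own settings with that copy
before allowing recovery connections. All names prefixed with Sketch or
sketch_ are hypothetical stand-ins, and the sketch leaves out WAL insertion,
the archive-recovery condition and error reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the wal_mode levels; ordering matters. */
typedef enum SketchWalMode
{
    SKETCH_WAL_MINIMAL = 0,
    SKETCH_WAL_ARCHIVE,
    SKETCH_WAL_HOT_STANDBY
} SketchWalMode;

/* The parameters the design keeps a copy of in pg_control. */
typedef struct SketchParams
{
    int           max_connections;
    int           max_prepared_xacts;
    int           max_locks_per_xact;
    SketchWalMode wal_mode;
} SketchParams;

/* Hypothetical result codes for the standby-side check. */
typedef enum SketchCheck
{
    SKETCH_OK = 0,
    SKETCH_WAL_MODE_TOO_LOW,
    SKETCH_TOO_FEW_SLOTS
} SketchCheck;

/*
 * Primary side: if any tracked setting differs from the stored copy,
 * refresh the copy and return true to signal that a parameter-change
 * record would be written. (Assumption: the real code decides separately
 * whether the record itself is needed.)
 */
bool
sketch_report_parameters(SketchParams *control, const SketchParams *current)
{
    assert(control != NULL && current != NULL);

    if (control->wal_mode == current->wal_mode &&
        control->max_connections == current->max_connections &&
        control->max_prepared_xacts == current->max_prepared_xacts &&
        control->max_locks_per_xact == current->max_locks_per_xact)
        return false;

    *control = *current;
    return true;
}

/*
 * Standby side: recovery connections need WAL generated at the
 * hot-standby level and at least as many slots as the primary had.
 */
SketchCheck
sketch_check_required(const SketchParams *control, const SketchParams *local,
                      bool want_recovery_connections)
{
    assert(control != NULL && local != NULL);

    if (!want_recovery_connections)
        return SKETCH_OK;
    if (control->wal_mode < SKETCH_WAL_HOT_STANDBY)
        return SKETCH_WAL_MODE_TOO_LOW;
    if (local->max_connections < control->max_connections ||
        local->max_prepared_xacts < control->max_prepared_xacts ||
        local->max_locks_per_xact < control->max_locks_per_xact)
        return SKETCH_TOO_FEW_SLOTS;
    return SKETCH_OK;
}
```

In this sketch, a standby whose max_connections is lower than the value
recorded from the primary gets SKETCH_TOO_FEW_SLOTS, and a stored wal_mode
below the hot-standby level gets SKETCH_WAL_MODE_TOO_LOW.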

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index eb5765a..6c6a504 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -689,8 +689,7 @@ archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/ser
    </para>

    <para>
-    When <varname>archive_mode</> is <literal>off</> and <xref
-    linkend="guc-max-wal-senders"> is zero some SQL commands
+    When <varname>wal_mode</> is <literal>minimal</> some SQL commands
     are optimized to avoid WAL logging, as described in <xref
     linkend="populate-pitr">.  If archiving or streaming replication were
     turned on during execution of one of these statements, WAL would not
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c5692ba..63ca749 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1353,6 +1353,43 @@ SET ENABLE_SEQSCAN TO OFF;
      <title>Settings</title>
      <variablelist>

+     <varlistentry id="guc-wal-mode" xreflabel="wal_mode">
+      <term><varname>wal_mode</varname> (<type>enum</type>)</term>
+      <indexterm>
+       <primary><varname>wal_mode</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        <varname>wal_mode</> determines how much information is written
+        to the WAL. The default value is <literal>minimal</>, which writes
+        only minimal information needed to recover from a crash or immediate
+        shutdown. <literal>archive</> adds logging required for WAL archiving,
+        and <literal>hot_standby</> further adds extra information about
+        running transactions required to run read-only queries on a standby
+        server.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        In <literal>minimal</> mode, WAL-logging of some bulk operations, like
+        <command>CREATE INDEX</>, <command>CLUSTER</> and <command>COPY</> on
+        a table that was created or truncated in the same transaction can be
+        safely skipped, which can make those operations much faster, but
+        minimal WAL does not contain enough information to reconstruct the
+        data from a base backup and the WAL logs, so at least
+        <literal>archive</> level must be used to enable WAL archiving
+        (<xref linkend="guc-archive-mode">) and streaming replication. See
+        also <xref linkend="populate-pitr">.
+       </para>
+       <para>
+        In <literal>hot_standby</> mode, the same information is logged as
+        in <literal>archive</> mode, plus information needed to reconstruct
+        the status of running transactions from the WAL. To enable read-only
+        queries on a standby server, <varname>wal_mode</> must be set to
+        <literal>hot_standby</> on the primary.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-fsync" xreflabel="fsync">
       <indexterm>
        <primary><varname>fsync</> configuration parameter</primary>
@@ -1726,7 +1763,9 @@ SET ENABLE_SEQSCAN TO OFF;
         <varname>archive_mode</> and <varname>archive_command</> are
         separate variables so that <varname>archive_command</> can be
         changed without leaving archiving mode.
-        This parameter can only be set at server start.
+        This parameter can only be set at server start. It is ignored
+        unless <varname>wal_mode</> is set to <literal>archive</> or
+        <literal>hot_standby</>.
        </para>
       </listitem>
      </varlistentry>
@@ -1884,16 +1923,14 @@ SET ENABLE_SEQSCAN TO OFF;
       </indexterm>
       <listitem>
        <para>
-        Parameter has two roles. During recovery, specifies whether or not
-        you can connect and run queries to enable <xref linkend="hot-standby">.
-        During normal running, specifies whether additional information is written
-        to WAL to allow recovery connections on a standby server that reads
-        WAL data generated by this server. The default value is
+        During recovery, specifies whether or not you can connect and run
+        queries to enable <xref linkend="hot-standby">. The default value is
         <literal>on</literal>.  It is thought that there is little
         measurable difference in performance from using this feature, so
         feedback is welcome if any production impacts are noticeable.
         It is likely that this parameter will be removed in later releases.
-        This parameter can only be set at server start.
+        This parameter can only be set at server start. It is ignored when
+        not in standby mode.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index d69f2ea..7fa0817 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1589,9 +1589,9 @@ LOG:  database system is ready to accept read only connections
 </programlisting>

     Consistency information is recorded once per checkpoint on the primary, as long
-    as <varname>recovery_connections</> is enabled on the primary.  It is not possible
+    as <varname>wal_mode</> is set to <literal>hot_standby</> on the primary.  It is not possible
     to enable recovery connections on the standby when reading WAL written during the
-    period that <varname>recovery_connections</> was disabled on the primary.
+    period that <varname>wal_mode</> was not set to <literal>hot_standby</> on the primary.
     Reaching a consistent state can also be delayed in the presence
     of both of these conditions:

@@ -1838,7 +1838,7 @@ LOG:  database system is ready to accept read only connections
    </para>

    <para>
-    On the primary, parameters <varname>recovery_connections</> and
+    On the primary, parameters <varname>wal_mode</> and
     <varname>vacuum_defer_cleanup_age</> can be used.
     <varname>max_standby_delay</> has no effect if set on the primary.
    </para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index b00e69f..a493348 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -835,10 +835,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     <command>TRUNCATE</command> command. In such cases no WAL
     needs to be written, because in case of an error, the files
     containing the newly loaded data will be removed anyway.
-    However, this consideration does not apply when
-    <xref linkend="guc-archive-mode"> is on or streaming replication
-    is allowed (i.e., <xref linkend="guc-max-wal-senders"> is more
-    than or equal to one), as all commands must write WAL in that case.
+    However, this consideration only applies when
+    <xref linkend="guc-wal-mode"> is <literal>minimal</> as all commands
+    must write WAL otherwise.
    </para>

   </sect2>
@@ -910,18 +909,16 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
   </sect2>

   <sect2 id="populate-pitr">
-   <title>Turn off <varname>archive_mode</varname> and streaming replication</title>
+   <title>Disable WAL archival and streaming replication</title>

    <para>
     When loading large amounts of data into an installation that uses
     WAL archiving or streaming replication, you might want to disable
-    archiving (turn off the <xref linkend="guc-archive-mode">
-    configuration variable) and replication (zero the
-    <xref linkend="guc-max-wal-senders"> configuration variable)
-    while loading.  It might be
+    archiving and replication by setting the <xref linkend="guc-wal-mode">
+    configuration variable to <literal>minimal</> while loading.  It might be
     faster to take a new base backup after the load has completed
     than to process a large amount of incremental WAL data.
-    But note that changing either of these variables requires
+    But note that changing <varname>wal_mode</> requires
     a server restart.
    </para>

@@ -929,10 +926,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     Aside from avoiding the time for the archiver or WAL sender to
     process the WAL data,
     doing this will actually make certain commands faster, because they
-    are designed not to write WAL at all if <varname>archive_mode</varname>
-    is off and <varname>max_wal_senders</varname> is zero.  (They can
-    guarantee crash safety more cheaply by doing an
-    <function>fsync</> at the end than by writing WAL.)
+    are designed not to write WAL at all if <varname>wal_mode</varname>
+    is <literal>minimal</>.  (They can guarantee crash safety more cheaply
+    by doing an <function>fsync</> at the end than by writing WAL.)
     This applies to the following commands:
     <itemizedlist>
      <listitem>
@@ -1015,9 +1011,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
      <listitem>
       <para>
        If using WAL archiving, consider disabling it during the restore.
-       To do that, turn off <varname>archive_mode</varname> before loading the
-       dump script, and afterwards turn it back on
-       and take a fresh base backup.
+       To do that, set <varname>wal_mode</varname> to <literal>minimal</>
+       before loading the dump script, and afterwards set it back to
+       <literal>archive</> or <literal>hot_standby</> and take a fresh
+       base backup.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index ac075cb..99235da 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -278,16 +278,6 @@ end_heap_rewrite(RewriteState state)
                    (char *) state->rs_buffer, true);
     }

-    /* Write an XLOG UNLOGGED record if WAL-logging was skipped */
-    if (!state->rs_use_wal && !state->rs_new_rel->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 30];
-
-        snprintf(reason, sizeof(reason), "heap rewrite on \"%s\"",
-                 RelationGetRelationName(state->rs_new_rel));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /*
      * If the rel isn't temp, must fsync before commit.  We use heap_sync to
      * ensure that the toast table gets fsync'd too.
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 6cb3cac..89ed8a0 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -215,19 +215,6 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
      */
     wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!wstate.btws_use_wal && !wstate.index->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 20];
-
-        snprintf(reason, sizeof(reason), "b-tree build on \"%s\"",
-                 RelationGetRelationName(wstate.index));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /* reserve the metapage */
     wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
     wstate.btws_pages_written = 0;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7647f4e..e1209bb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -76,6 +76,7 @@ int            MaxStandbyDelay = 30;
 bool        fullPageWrites = true;
 bool        log_checkpoints = false;
 int            sync_method = DEFAULT_SYNC_METHOD;
+int            wal_mode = WAL_MODE_MINIMAL;

 #ifdef WAL_DEBUG
 bool        XLOG_DEBUG = false;
@@ -97,6 +98,13 @@ bool        XLOG_DEBUG = false;
 /*
  * GUC support
  */
+const struct config_enum_entry wal_mode_options[] = {
+    {"minimal", WAL_MODE_MINIMAL, false},
+    {"archive", WAL_MODE_ARCHIVE, false},
+    {"hot_standby", WAL_MODE_HOT_STANDBY, false},
+    {NULL, 0, false}
+};
+
 const struct config_enum_entry sync_method_options[] = {
     {"fsync", SYNC_METHOD_FSYNC, false},
 #ifdef HAVE_FSYNC_WRITETHROUGH
@@ -501,6 +509,18 @@ static bool reachedMinRecoveryPoint = false;
 static bool InRedo = false;

 /*
+ * Information logged when we detect a change in one of the parameters
+ * important for Hot Standby.
+ */
+typedef struct xl_parameter_change
+{
+    int            MaxConnections;
+    int            max_prepared_xacts;
+    int            max_locks_per_xact;
+    int            wal_mode;
+} xl_parameter_change;
+
+/*
  * Flags set by interrupt handlers for later service in the redo loop.
  */
 static volatile sig_atomic_t got_SIGHUP = false;
@@ -522,7 +542,8 @@ static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI,
                     uint32 endLogId, uint32 endLogSeg);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
-static void CheckRequiredParameterValues(CheckPoint checkPoint);
+static void CheckRequiredParameterValues(void);
+static void XLogReportParameters(void);
 static void LocalSetXLogInsertAllowed(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);

@@ -4922,6 +4943,13 @@ BootStrapXLOG(void)
     ControlFile->time = checkPoint.time;
     ControlFile->checkPoint = checkPoint.redo;
     ControlFile->checkPointCopy = checkPoint;
+
+    /* Set important parameter values for use when replaying WAL */
+    ControlFile->MaxConnections = MaxConnections;
+    ControlFile->max_prepared_xacts = max_prepared_xacts;
+    ControlFile->max_locks_per_xact = max_locks_per_xact;
+    ControlFile->wal_mode = wal_mode;
+
     /* some additional ControlFile fields are set in WriteControlFile() */

     WriteControlFile();
@@ -5539,17 +5567,18 @@ GetLatestXLogTime(void)
 }

 /*
- * Note that text field supplied is a parameter name and does not require translation
+ * Note that text field supplied is a parameter name and does not require
+ * translation
  */
-#define RecoveryRequiresIntParameter(param_name, currValue, checkpointValue) \
+#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \
 { \
-    if (currValue < checkpointValue) \
+    if (currValue < minValue) \
         ereport(ERROR, \
             (errmsg("recovery connections cannot continue because " \
                     "%s = %u is a lower setting than on WAL source server (value was %u)", \
                     param_name, \
                     currValue, \
-                    checkpointValue))); \
+                    minValue))); \
 }

 /*
@@ -5557,21 +5586,37 @@ GetLatestXLogTime(void)
  * for various aspects of recovery operation.
  */
 static void
-CheckRequiredParameterValues(CheckPoint checkPoint)
+CheckRequiredParameterValues(void)
 {
-    /* We ignore autovacuum_max_workers when we make this test. */
-    RecoveryRequiresIntParameter("max_connections",
-                                 MaxConnections, checkPoint.MaxConnections);
+    /*
+     * For archive recovery, the WAL must be generated with at least
+     * 'archive' wal_mode.
+     */
+    if (InArchiveRecovery && ControlFile->wal_mode == WAL_MODE_MINIMAL)
+    {
+        ereport(WARNING,
+                (errmsg("WAL was generated with wal_mode='minimal', data may be missing"),
+                 errhint("This happens if you temporarily set wal_mode='minimal' without taking a new base backup.")));
+    }

-    RecoveryRequiresIntParameter("max_prepared_xacts",
-                          max_prepared_xacts, checkPoint.max_prepared_xacts);
-    RecoveryRequiresIntParameter("max_locks_per_xact",
-                          max_locks_per_xact, checkPoint.max_locks_per_xact);
+    /*
+     * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
+     * and we must have at least as many backend slots as the primary.
+     */
+    if (InArchiveRecovery && XLogRequestRecoveryConnections)
+    {
+        if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
+            ereport(ERROR,
+                    (errmsg("recovery connections cannot start because wal_mode was not set to 'hot_standby' on the WAL source server")));

-    if (!checkPoint.XLogStandbyInfoMode)
-        ereport(ERROR,
-                (errmsg("recovery connections cannot start because the recovery_connections "
-                        "parameter is disabled on the WAL source server")));
+        /* We ignore autovacuum_max_workers when we make this test. */
+        RecoveryRequiresIntParameter("max_connections",
+                                     MaxConnections, ControlFile->MaxConnections);
+        RecoveryRequiresIntParameter("max_prepared_xacts",
+                                     max_prepared_xacts, ControlFile->max_prepared_xacts);
+        RecoveryRequiresIntParameter("max_locks_per_xact",
+                                     max_locks_per_xact, ControlFile->max_locks_per_xact);
+    }
 }

 /*
@@ -5904,6 +5949,9 @@ StartupXLOG(void)
                                 BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
         }

+        /* Check that the GUCs used to generate the WAL allow recovery */
+        CheckRequiredParameterValues();
+
         /*
          * Initialize recovery connections, if enabled. We won't let backends
          * in yet, not until we've reached the min recovery point specified in
@@ -5915,8 +5963,6 @@ StartupXLOG(void)
             TransactionId *xids;
             int            nxids;

-            CheckRequiredParameterValues(checkPoint);
-
             ereport(DEBUG1,
                     (errmsg("initializing recovery connections")));

@@ -6401,6 +6447,13 @@ StartupXLOG(void)
     }

     /*
+     * If any of the critical GUCs have changed, log them before we allow
+     * any backends to write WAL
+     */
+    LocalSetXLogInsertAllowed();
+    XLogReportParameters();
+
+    /*
      * All done.  Allow backends to write WAL.    (Although the bool flag is
      * probably atomic in itself, we use the info_lck here to ensure that
      * there are no race conditions concerning visibility of other recent
@@ -6998,12 +7051,6 @@ CreateCheckPoint(int flags)
     MemSet(&checkPoint, 0, sizeof(checkPoint));
     checkPoint.time = (pg_time_t) time(NULL);

-    /* Set important parameter values for use when replaying WAL */
-    checkPoint.MaxConnections = MaxConnections;
-    checkPoint.max_prepared_xacts = max_prepared_xacts;
-    checkPoint.max_locks_per_xact = max_locks_per_xact;
-    checkPoint.XLogStandbyInfoMode = XLogStandbyInfoActive();
-
     /*
      * We must hold WALInsertLock while examining insert state to determine
      * the checkpoint REDO pointer.
@@ -7647,28 +7694,49 @@ RequestXLogSwitch(void)
 }

 /*
- * Write an XLOG UNLOGGED record, indicating that some operation was
- * performed on data that we fsync()'d directly to disk, skipping
- * WAL-logging.
- *
- * Such operations screw up archive recovery, so we complain if we see
- * these records during archive recovery. That shouldn't happen in a
- * correctly configured server, but you can induce it by temporarily
- * disabling archiving and restarting, so it's good to at least get a
- * warning of silent data loss in such cases. These records serve no
- * other purpose and are simply ignored during crash recovery.
+ * Check if any of the GUC parameters that are critical for hot standby
+ * have changed, and update the value in pg_control file if necessary.
  */
-void
-XLogReportUnloggedStatement(char *reason)
+static void
+XLogReportParameters(void)
 {
-    XLogRecData rdata;
+    if (wal_mode != ControlFile->wal_mode ||
+        MaxConnections != ControlFile->MaxConnections ||
+        max_prepared_xacts != ControlFile->max_prepared_xacts ||
+        max_locks_per_xact != ControlFile->max_locks_per_xact)
+    {
+        /*
+         * The change in number of backend slots doesn't need to be
+         * WAL-logged if archiving is not enabled, as you can't start
+         * archive recovery with wal_mode='minimal' anyway. We don't
+         * really care about the values in pg_control either if
+         * wal_mode='minimal', but seems better to keep them up-to-date
+         * to avoid confusion.
+         */
+        if (wal_mode != ControlFile->wal_mode || XLogIsNeeded())
+        {
+            XLogRecData rdata;
+            xl_parameter_change xlrec;

-    rdata.buffer = InvalidBuffer;
-    rdata.data = reason;
-    rdata.len = strlen(reason) + 1;
-    rdata.next = NULL;
+            xlrec.MaxConnections = MaxConnections;
+            xlrec.max_prepared_xacts = max_prepared_xacts;
+            xlrec.max_locks_per_xact = max_locks_per_xact;
+            xlrec.wal_mode = wal_mode;
+
+            rdata.buffer = InvalidBuffer;
+            rdata.data = (char *) &xlrec;
+            rdata.len = sizeof(xlrec);
+            rdata.next = NULL;
+
+            XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE, &rdata);
+        }

-    XLogInsert(RM_XLOG_ID, XLOG_UNLOGGED, &rdata);
+        ControlFile->MaxConnections = MaxConnections;
+        ControlFile->max_prepared_xacts = max_prepared_xacts;
+        ControlFile->max_locks_per_xact = max_locks_per_xact;
+        ControlFile->wal_mode = wal_mode;
+        UpdateControlFile();
+    }
 }

 /*
@@ -7709,10 +7777,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
                               checkPoint.nextMultiOffset);
         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);

-        /* Check to see if any changes to max_connections give problems */
-        if (standbyState != STANDBY_DISABLED)
-            CheckRequiredParameterValues(checkPoint);
-
         /*
          * If we see a shutdown checkpoint, we know that nothing was
          * running on the master at this point. So fake-up an empty
@@ -7834,18 +7898,21 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
             LWLockRelease(ControlFileLock);
         }
     }
-    else if (info == XLOG_UNLOGGED)
+    else if (info == XLOG_PARAMETER_CHANGE)
     {
-        if (InArchiveRecovery)
-        {
-            /*
-             * Note: We don't print the reason string from the record, because
-             * that gets added as a line using xlog_desc()
-             */
-            ereport(WARNING,
-                (errmsg("unlogged operation performed, data may be missing"),
-                 errhint("This can happen if you temporarily disable archive_mode without taking a new base backup.")));
-        }
+        xl_parameter_change xlrec;
+
+        /* Update our copy of the parameters in pg_control */
+        memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+
+        ControlFile->MaxConnections = xlrec.MaxConnections;
+        ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+        ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
+        ControlFile->wal_mode = xlrec.wal_mode;
+        UpdateControlFile();
+
+        /* Check to see if any changes to max_connections give problems */
+        CheckRequiredParameterValues();
     }
 }

@@ -7896,11 +7963,30 @@ xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
         appendStringInfo(buf, "backup end: %X/%X",
                          startpoint.xlogid, startpoint.xrecoff);
     }
-    else if (info == XLOG_UNLOGGED)
+    else if (info == XLOG_PARAMETER_CHANGE)
     {
-        char       *reason = rec;
+        xl_parameter_change xlrec;
+        const char *wal_mode_str;
+        const struct config_enum_entry *entry;
+
+        memcpy(&xlrec, rec, sizeof(xl_parameter_change));
+
+        /* Find a string representation for wal_mode */
+        wal_mode_str = "?";
+        for (entry = wal_mode_options; entry->name; entry++)
+        {
+            if (entry->val == xlrec.wal_mode)
+            {
+                wal_mode_str = entry->name;
+                break;
+            }
+        }

-        appendStringInfo(buf, "unlogged operation: %s", reason);
+        appendStringInfo(buf, "parameter change: max_connections=%d max_prepared_xacts=%d max_locks_per_xact=%d wal_mode=%s",
+                         xlrec.MaxConnections,
+                         xlrec.max_prepared_xacts,
+                         xlrec.max_locks_per_xact,
+                         wal_mode_str);
     }
     else
         appendStringInfo(buf, "UNKNOWN");
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 30a00ab..ccb4599 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -787,23 +787,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
      */
     use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!use_wal && !NewHeap->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 32];
-
-        if (OldIndex != NULL)
-            snprintf(reason, sizeof(reason), "CLUSTER on \"%s\"",
-                     RelationGetRelationName(NewHeap));
-        else
-            snprintf(reason, sizeof(reason), "VACUUM FULL on \"%s\"",
-                     RelationGetRelationName(NewHeap));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /* use_wal off requires smgr_targblock be initially invalid */
     Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 84a83f1..9d46e47 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2223,14 +2223,7 @@ CopyFrom(CopyState cstate)
      * indexes since those use WAL anyway)
      */
     if (hi_options & HEAP_INSERT_SKIP_WAL)
-    {
-        char        reason[NAMEDATALEN + 30];
-
-        snprintf(reason, sizeof(reason), "COPY FROM on \"%s\"",
-                 RelationGetRelationName(cstate->rel));
-        XLogReportUnloggedStatement(reason);
         heap_sync(cstate->rel);
-    }
 }


diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 25b2807..9b5ce65 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3272,14 +3272,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)

         /* If we skipped writing WAL, then we need to sync the heap. */
         if (hi_options & HEAP_INSERT_SKIP_WAL)
-        {
-            char        reason[NAMEDATALEN + 30];
-
-            snprintf(reason, sizeof(reason), "table rewrite on \"%s\"",
-                     RelationGetRelationName(newrel));
-            XLogReportUnloggedStatement(reason);
             heap_sync(newrel);
-        }

         heap_close(newrel, NoLock);
     }
@@ -7021,20 +7014,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace)

     heap_close(pg_class, RowExclusiveLock);

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!XLogIsNeeded() && !rel->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 40];
-
-        snprintf(reason, sizeof(reason), "ALTER TABLE SET TABLESPACE on \"%s\"",
-                 RelationGetRelationName(rel));
-
-        XLogReportUnloggedStatement(reason);
-    }
-
     relation_close(rel, NoLock);

     /* Make sure the reltablespace change is visible */
@@ -7063,10 +7042,6 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
     /*
      * We need to log the copied data in WAL iff WAL archiving/streaming is
      * enabled AND it's not a temp rel.
-     *
-     * Note: If you change the conditions here, update the conditions in
-     * ATExecSetTableSpace() for when an XLOG UNLOGGED record is written to
-     * match.
      */
     use_wal = XLogIsNeeded() && !istemp;

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d5e7e3a..d299310 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2241,14 +2241,7 @@ CloseIntoRel(QueryDesc *queryDesc)

         /* If we skipped using WAL, must heap_sync before commit */
         if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
-        {
-            char        reason[NAMEDATALEN + 30];
-
-            snprintf(reason, sizeof(reason), "SELECT INTO on \"%s\"",
-                     RelationGetRelationName(myState->rel));
-            XLogReportUnloggedStatement(reason);
             heap_sync(myState->rel);
-        }

         /* close rel, but keep lock until commit */
         heap_close(myState->rel, NoLock);
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 47f71bd..eeab0f3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -728,6 +728,12 @@ PostmasterMain(int argc, char *argv[])
         write_stderr("%s: superuser_reserved_connections must be less than max_connections\n", progname);
         ExitPostmaster(1);
     }
+    if (XLogArchiveMode && wal_mode == WAL_MODE_MINIMAL)
+        ereport(WARNING,
+                (errmsg("archive_mode ignored because wal_mode is 'minimal'")));
+    if (max_wal_senders > 0 && wal_mode == WAL_MODE_MINIMAL)
+        ereport(WARNING,
+                (errmsg("WAL streaming connections not allowed because wal_mode is 'minimal'")));

     /*
      * Other one-time internal sanity checks can go here, if they are fast.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3838665..35bc772 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -253,6 +253,24 @@ WalSndHandshake(void)
                     {
                         StringInfoData buf;

+                        /*
+                         * Check that we're logging enough information in the
+                         * WAL for log-shipping.
+                         *
+                         * NOTE: This only checks the current value of
+                         * wal_mode. Even if the current setting is not
+                         * 'minimal', there can be old WAL in the pg_xlog
+                         * directory that was created with 'minimal'.
+                         * So this is not bulletproof; the purpose is
+                         * just to give a user-friendly error message that
+                         * hints at how to configure the system correctly.
+                         */
+                        if (wal_mode == WAL_MODE_MINIMAL)
+                            ereport(FATAL,
+                                    (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                                     errmsg("standby connections not allowed because wal_mode='minimal'")));
+
+
                         /* Send a CopyOutResponse message, and start streaming */
                         pq_beginmessage(&buf, 'H');
                         pq_sendbyte(&buf, 0);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 51753d6..a92a874 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -256,7 +256,7 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
      */
     if (!TransactionIdIsValid(latestRemovedXid))
     {
-        elog(DEBUG1, "Invalid latestremovexXid reported, using latestcompletedxid instead");
+        elog(DEBUG1, "invalid latestRemovedXid reported, using latestCompletedXid instead");

         LWLockAcquire(ProcArrayLock, LW_SHARED);
         latestRemovedXid = ShmemVariableCache->latestCompletedXid;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2434cc0..2fb4090 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -340,6 +340,7 @@ static const struct config_enum_entry constraint_exclusion_options[] = {
 /*
  * Options for enum values stored in other modules
  */
+extern const struct config_enum_entry wal_mode_options[];
 extern const struct config_enum_entry sync_method_options[];

 /*
@@ -2785,6 +2786,15 @@ static struct config_enum ConfigureNamesEnum[] =
     },

     {
+        {"wal_mode", PGC_POSTMASTER, WAL_SETTINGS,
+            gettext_noop("Set the level of information written to the WAL."),
+            NULL
+        },
+        &wal_mode,
+        WAL_MODE_MINIMAL, wal_mode_options, NULL
+    },
+
+    {
         {"wal_sync_method", PGC_SIGHUP, WAL_SETTINGS,
             gettext_noop("Selects the method used for forcing WAL updates to disk."),
             NULL
@@ -7862,7 +7872,7 @@ pg_timezone_abbrev_initialize(void)
 static const char *
 show_archive_command(void)
 {
-    if (XLogArchiveMode)
+    if (XLogArchivingActive())
         return XLogArchiveCommand;
     else
         return "(disabled)";
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 92763eb..c9ee77c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -150,6 +150,7 @@

 # - Settings -

+#wal_mode = minimal            # minimal, archive, or hot_standby
 #fsync = on                # turns forced synchronization on or off
 #synchronous_commit = on        # immediate fsync at commit
 #wal_sync_method = fsync        # the default is the first option
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index cb16cdf..668a912 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -60,6 +60,21 @@ dbState(DBState state)
     return _("unrecognized status code");
 }

+static const char *
+wal_mode_str(int wal_mode)
+{
+    switch (wal_mode)
+    {
+        case WAL_MODE_MINIMAL:
+            return "minimal";
+        case WAL_MODE_ARCHIVE:
+            return "archive";
+        case WAL_MODE_HOT_STANDBY:
+            return "hot_standby";
+    }
+    return _("unrecognized wal_mode");
+}
+

 int
 main(int argc, char *argv[])
@@ -206,6 +221,14 @@ main(int argc, char *argv[])
     printf(_("Backup start location:                %X/%X\n"),
            ControlFile.backupStartPoint.xlogid,
            ControlFile.backupStartPoint.xrecoff);
+    printf(_("Last wal_mode setting:                %s\n"),
+           wal_mode_str(ControlFile.wal_mode));
+    printf(_("Last max_connections setting:         %d\n"),
+           ControlFile.MaxConnections);
+    printf(_("Last max_prepared_xacts setting:      %d\n"),
+           ControlFile.max_prepared_xacts);
+    printf(_("Last max_locks_per_xact setting:      %d\n"),
+           ControlFile.max_locks_per_xact);
     printf(_("Maximum data alignment:               %u\n"),
            ControlFile.maxAlign);
     /* we don't print floatFormat since can't say much useful about it */
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 3d2a7bc..9f2a592 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -628,6 +628,15 @@ RewriteControlFile(void)
     ControlFile.backupStartPoint.xlogid = 0;
     ControlFile.backupStartPoint.xrecoff = 0;

+    /*
+     * Use the defaults for max_* settings. The values don't matter
+     * as long as wal_mode='minimal'.
+     */
+    ControlFile.MaxConnections = 100;
+    ControlFile.max_prepared_xacts = 0;
+    ControlFile.max_locks_per_xact = 64;
+    ControlFile.wal_mode = WAL_MODE_MINIMAL;
+
     /* Now we can force the recorded xlog seg size to the right thing. */
     ControlFile.xlog_seg_size = XLogSegSize;

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 6bfc7d5..1fdd648 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -13,6 +13,7 @@

 #include "access/rmgr.h"
 #include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
 #include "lib/stringinfo.h"
 #include "storage/buf.h"
 #include "utils/pg_crc.h"
@@ -195,24 +196,19 @@ extern int    XLogArchiveTimeout;
 extern bool log_checkpoints;
 extern bool XLogRequestRecoveryConnections;
 extern int    MaxStandbyDelay;
+extern int    wal_mode;

-#define XLogArchivingActive()    (XLogArchiveMode)
+#define XLogArchivingActive()    (XLogArchiveMode && wal_mode >= WAL_MODE_ARCHIVE)
 #define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')

 /*
- * This is in walsender.c, but declared here so that we don't need to include
- * walsender.h in all files that check XLogIsNeeded()
+ * Is WAL-logging necessary for archival or log-shipping, or can we skip
+ * WAL-logging if we fsync() the data before committing instead?
  */
-extern int    max_wal_senders;
-
-/*
- * Is WAL-logging necessary? We need to log an XLOG record iff either
- * WAL archiving is enabled or XLOG streaming is allowed.
- */
-#define XLogIsNeeded() (XLogArchivingActive() || (max_wal_senders > 0))
+#define XLogIsNeeded() (wal_mode >= WAL_MODE_ARCHIVE)

 /* Do we need to WAL-log information required only for Hot Standby? */
-#define XLogStandbyInfoActive() (XLogRequestRecoveryConnections && XLogIsNeeded())
+#define XLogStandbyInfoActive() (wal_mode >= WAL_MODE_HOT_STANDBY)

 #ifdef WAL_DEBUG
 extern bool XLOG_DEBUG;
@@ -293,7 +289,6 @@ extern void InitXLOGAccess(void);
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
-extern void XLogReportUnloggedStatement(char *reason);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
 extern XLogRecPtr GetWriteRecPtr(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 9d65546..54fce3e 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -21,7 +21,7 @@


 /* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION    901
+#define PG_CONTROL_VERSION    901 /* XXX bump before commit */

 /*
  * Body of CheckPoint XLOG records.  This is declared here because we keep
@@ -41,12 +41,6 @@ typedef struct CheckPoint
     Oid            oldestXidDB;    /* database with minimum datfrozenxid */
     pg_time_t    time;            /* time stamp of checkpoint */

-    /* Important parameter settings at time of shutdown checkpoints */
-    int            MaxConnections;
-    int            max_prepared_xacts;
-    int            max_locks_per_xact;
-    bool        XLogStandbyInfoMode;
-
     /*
      * Oldest XID still running. This is only needed to initialize hot standby
      * mode from an online checkpoint, so we only bother calculating this for
@@ -63,7 +57,7 @@ typedef struct CheckPoint
 #define XLOG_NEXTOID                    0x30
 #define XLOG_SWITCH                        0x40
 #define XLOG_BACKUP_END                    0x50
-#define XLOG_UNLOGGED                    0x60
+#define XLOG_PARAMETER_CHANGE            0x60


 /* System status indicator */
@@ -77,6 +71,14 @@ typedef enum DBState
     DB_IN_PRODUCTION
 } DBState;

+/* WAL modes */
+typedef enum WalMode
+{
+    WAL_MODE_MINIMAL = 0,
+    WAL_MODE_ARCHIVE,
+    WAL_MODE_HOT_STANDBY
+} WalMode;
+
 /*
  * Contents of pg_control.
  *
@@ -142,6 +144,15 @@ typedef struct ControlFileData
     XLogRecPtr    backupStartPoint;

     /*
+     * Parameter settings that determine if the WAL can be used for archival
+     * or hot standby.
+     */
+    WalMode        wal_mode;
+    int            MaxConnections;
+    int            max_prepared_xacts;
+    int            max_locks_per_xact;
+
+    /*
      * This data is used to check for hardware-architecture compatibility of
      * the database and the backend executable.  We need not check endianness
      * explicitly, since the pg_control version will surely look wrong to a
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 6ad40a9..db64c88 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -39,6 +39,7 @@ extern bool am_walsender;

 /* user-settable parameters */
 extern int    WalSndDelay;
+extern int    max_wal_senders;

 extern int    WalSenderMain(void);
 extern void WalSndSignals(void);
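
For anyone reviewing the xlog.h hunk, here is a small standalone sketch of how the three macros evaluate for each wal_mode setting. The enum and the macro bodies are copied from the patch. Everything else is an assumption made for illustration: the two static variables stand in for the backend's GUC variables, and WalPredicates and eval_predicates() are hypothetical helpers that do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the WalMode enum the patch adds to pg_control.h. */
typedef enum WalMode
{
    WAL_MODE_MINIMAL = 0,
    WAL_MODE_ARCHIVE,
    WAL_MODE_HOT_STANDBY
} WalMode;

/* Stand-ins for the backend globals (assumption: plain statics here). */
static int  wal_mode = WAL_MODE_MINIMAL;
static bool XLogArchiveMode = false;

/* Macro bodies as they appear in the patched xlog.h. */
#define XLogArchivingActive()   (XLogArchiveMode && wal_mode >= WAL_MODE_ARCHIVE)
#define XLogIsNeeded()          (wal_mode >= WAL_MODE_ARCHIVE)
#define XLogStandbyInfoActive() (wal_mode >= WAL_MODE_HOT_STANDBY)

/* Same mapping as wal_mode_str() in the pg_controldata hunk, minus gettext. */
static const char *
wal_mode_str(int mode)
{
    switch (mode)
    {
        case WAL_MODE_MINIMAL:
            return "minimal";
        case WAL_MODE_ARCHIVE:
            return "archive";
        case WAL_MODE_HOT_STANDBY:
            return "hot_standby";
    }
    return "unrecognized wal_mode";
}

/* Hypothetical container for the three macro results. */
typedef struct WalPredicates
{
    bool archiving;     /* XLogArchivingActive() */
    bool wal_needed;    /* XLogIsNeeded() */
    bool standby_info;  /* XLogStandbyInfoActive() */
} WalPredicates;

/* Hypothetical helper: set the stand-in globals, then evaluate the macros. */
static WalPredicates
eval_predicates(int mode, bool archive_mode)
{
    WalPredicates result;

    assert(mode >= WAL_MODE_MINIMAL && mode <= WAL_MODE_HOT_STANDBY);

    wal_mode = mode;
    XLogArchiveMode = archive_mode;

    result.archiving = XLogArchivingActive();
    result.wal_needed = XLogIsNeeded();
    result.standby_info = XLogStandbyInfoActive();
    return result;
}
```

Calling eval_predicates(WAL_MODE_MINIMAL, true) leaves all three fields false, which is the case the new postmaster warning covers: archive_mode is on but has no effect while wal_mode is 'minimal'.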
