Thread: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

sriggs@postgresql.org (Simon Riggs) writes:
> Log Message:
> -----------
> Make CheckRequiredParameterValues() depend upon correct combination
> of parameters. Fix bug report by Robert Haas that error message and
> hint was incorrect if wrong mode parameters specified on master.
> Internal changes only. Proposals for parameter simplification on
> master/primary still under way.

> Modified Files:
> --------------
>     pgsql/src/backend/access/transam:
>         xlog.c (r1.401 -> r1.402)
>         (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.401&r2=1.402)
>     pgsql/src/include/catalog:
>         pg_control.h (r1.51 -> r1.52)
>         (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_control.h?r1=1.51&r2=1.52)

This is a change in pg_control layout and requires a bump to the
pg_control version number (and hence forced initdb's all round).

I think it was quite premature to commit this when the design is
still under active discussion --- you may be forcing two rounds
of initdb on testers, when maybe only one or none would be enough.
Especially when you appear to be in the minority about what the design
should be.

            regards, tom lane

On Fri, 2010-04-23 at 16:04 -0400, Tom Lane wrote:
> sriggs@postgresql.org (Simon Riggs) writes:
> > Log Message:
> > -----------
> > Make CheckRequiredParameterValues() depend upon correct combination
> > of parameters. Fix bug report by Robert Haas that error message and
> > hint was incorrect if wrong mode parameters specified on master.
> > Internal changes only. Proposals for parameter simplification on
> > master/primary still under way.
>
> > Modified Files:
> > --------------
> >     pgsql/src/backend/access/transam:
> >         xlog.c (r1.401 -> r1.402)
> >         (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.401&r2=1.402)
> >     pgsql/src/include/catalog:
> >         pg_control.h (r1.51 -> r1.52)
> >         (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_control.h?r1=1.51&r2=1.52)
>
> This is a change in pg_control layout and requires a bump to the
> pg_control version number (and hence forced initdb's all round).

OK

> I think it was quite premature to commit this when the design is
> still under active discussion --- you may be forcing two rounds
> of initdb on testers, when maybe only one or none would be enough.
> Especially when you appear to be in the minority about what the design
> should be.

No intention of doing that. This change allows people to see what the
dependency actually is once the bug has been fixed. Change needs to
start from here, not from where we were before.

--
 Simon Riggs           www.2ndQuadrant.com


Simon Riggs <simon@2ndQuadrant.com> writes:
> No intention of doing that. This change allows people to see what the
> dependency actually is once the bug has been fixed. Change needs to
> start from here, not from where we were before.

Well, actually, now that I've looked at the patch I think it's starting
from a fundamentally wrong position anyway.  Checkpoint records are a
completely wrong mechanism for transmitting this data to slaves, because
a checkpoint is emitted *after* we do something, not *before* we do it.
In particular it's ludicrous to be looking at shutdown checkpoints to
try to determine whether the subsequent WAL will meet the slave's
requirements.  There's no connection at all between what the GUC state
was at shutdown and what it might be after starting again.

A design that might work is
(1) store the active value of wal_mode in pg_control (but NOT as part of
the last-checkpoint-record image).
(2) invent a new WAL record type that is transmitted when we change
wal_mode.

Then, slaves could check whether the master's wal_mode is high enough
by looking at pg_control when they start plus any wal_mode_change
records they come across.

If we did this then we could get rid of those WAL record types that were
added to signify that information had been omitted from WAL at specific
times.

            regards, tom lane

On Fri, Apr 23, 2010 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> No intention of doing that. This change allows people to see what the
>> dependency actually is once the bug has been fixed. Change needs to
>> start from here, not from where we were before.
>
> Well, actually, now that I've looked at the patch I think it's starting
> from a fundamentally wrong position anyway.  Checkpoint records are a
> completely wrong mechanism for transmitting this data to slaves, because
> a checkpoint is emitted *after* we do something, not *before* we do it.
> In particular it's ludicrous to be looking at shutdown checkpoints to
> try to determine whether the subsequent WAL will meet the slave's
> requirements.  There's no connection at all between what the GUC state
> was at shutdown and what it might be after starting again.
>
> A design that might work is
> (1) store the active value of wal_mode in pg_control (but NOT as part of
> the last-checkpoint-record image).
> (2) invent a new WAL record type that is transmitted when we change
> wal_mode.

Well, right now wal_mode would only be able to be changed at server
restart.  Eventually we might relax that, but I think there are some
restrictions on how we can do it - like maybe needing to wait until
all the transactions running at the time the change was decided on
have committed, or, well, I'm not sure.

...Robert

On Fri, 2010-04-23 at 16:44 -0400, Tom Lane wrote:

> There's no connection at all between what the GUC state
> was at shutdown and what it might be after starting again.
>
> A design that might work is
> (1) store the active value of wal_mode in pg_control (but NOT as part of
> the last-checkpoint-record image).
> (2) invent a new WAL record type that is transmitted when we change
> wal_mode.
>
> Then, slaves could check whether the master's wal_mode is high enough
> by looking at pg_control when they start plus any wal_mode_change
> records they come across.

Seems OK on standby side. On the primary there are some other points,
mentioned on other thread as to when we can change wal_mode.

> If we did this then we could get rid of those WAL record types that were
> added to signify that information had been omitted from WAL at specific
> times.

Please.

--
 Simon Riggs           www.2ndQuadrant.com


Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 23, 2010 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A design that might work is
>> (1) store the active value of wal_mode in pg_control (but NOT as part of
>> the last-checkpoint-record image).
>> (2) invent a new WAL record type that is transmitted when we change
>> wal_mode.

> Well, right now wal_mode would only be able to be changed at server
> restart.

Right, but slave servers won't find out about the change until the first
checkpoint after the start.  Which is Too Late.

            regards, tom lane

On Fri, Apr 23, 2010 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> No intention of doing that. This change allows people to see what the
>> dependency actually is once the bug has been fixed. Change needs to
>> start from here, not from where we were before.
>
> Well, actually, now that I've looked at the patch I think it's starting
> from a fundamentally wrong position anyway.  Checkpoint records are a
> completely wrong mechanism for transmitting this data to slaves, because
> a checkpoint is emitted *after* we do something, not *before* we do it.
> In particular it's ludicrous to be looking at shutdown checkpoints to
> try to determine whether the subsequent WAL will meet the slave's
> requirements.  There's no connection at all between what the GUC state
> was at shutdown and what it might be after starting again.
>
> A design that might work is
> (1) store the active value of wal_mode in pg_control (but NOT as part of
> the last-checkpoint-record image).
> (2) invent a new WAL record type that is transmitted when we change
> wal_mode.
>
> Then, slaves could check whether the master's wal_mode is high enough
> by looking at pg_control when they start plus any wal_mode_change
> records they come across.
>
> If we did this then we could get rid of those WAL record types that were
> added to signify that information had been omitted from WAL at specific
> times.

<dons project manager hat>

I notice that Heikki's patch doesn't include doing the above.  Should
we?  If so, who's going to do it?

...Robert


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> On Fri, Apr 23, 2010 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, actually, now that I've looked at the patch I think it's starting
>> from a fundamentally wrong position anyway.  Checkpoint records are a
>> completely wrong mechanism for transmitting this data to slaves, because
>> a checkpoint is emitted *after* we do something, not *before* we do it.
>> In particular it's ludicrous to be looking at shutdown checkpoints to
>> try to determine whether the subsequent WAL will meet the slave's
>> requirements.  There's no connection at all between what the GUC state
>> was at shutdown and what it might be after starting again.
>>
>> A design that might work is
>> (1) store the active value of wal_mode in pg_control (but NOT as part of
>> the last-checkpoint-record image).
>> (2) invent a new WAL record type that is transmitted when we change
>> wal_mode.
>>
>> Then, slaves could check whether the master's wal_mode is high enough
>> by looking at pg_control when they start plus any wal_mode_change
>> records they come across.
>>
>> If we did this then we could get rid of those WAL record types that were
>> added to signify that information had been omitted from WAL at specific
>> times.
> 
> <dons project manager hat>
> 
> I notice that Heikki's patch doesn't include doing the above.  Should
> we?  If so, who's going to do it?

I'll give it a shot.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Robert Haas wrote:
>> On Fri, Apr 23, 2010 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, actually, now that I've looked at the patch I think it's starting
>>> from a fundamentally wrong position anyway.  Checkpoint records are a
>>> completely wrong mechanism for transmitting this data to slaves, because
>>> a checkpoint is emitted *after* we do something, not *before* we do it.
>>> In particular it's ludicrous to be looking at shutdown checkpoints to
>>> try to determine whether the subsequent WAL will meet the slave's
>>> requirements.  There's no connection at all between what the GUC state
>>> was at shutdown and what it might be after starting again.
>>>
>>> A design that might work is
>>> (1) store the active value of wal_mode in pg_control (but NOT as part of
>>> the last-checkpoint-record image).
>>> (2) invent a new WAL record type that is transmitted when we change
>>> wal_mode.
>>>
>>> Then, slaves could check whether the master's wal_mode is high enough
>>> by looking at pg_control when they start plus any wal_mode_change
>>> records they come across.
>>>
>>> If we did this then we could get rid of those WAL record types that were
>>> added to signify that information had been omitted from WAL at specific
>>> times.
>> <dons project manager hat>
>>
>> I notice that Heikki's patch doesn't include doing the above.  Should
>> we?  If so, who's going to do it?
>
> I'll give it a shot.

Ok, here's a patch that includes the changes to add new wal_mode GUC
(http://archives.postgresql.org/message-id/4BD581A6.60602@enterprisedb.com),
and implements Tom's design to keep a copy of wal_mode and the
max_connections, max_prepared_xacts and max_locks_per_xact settings in
pg_control.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index eb5765a..6c6a504 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -689,8 +689,7 @@ archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/ser
    </para>

    <para>
-    When <varname>archive_mode</> is <literal>off</> and <xref
-    linkend="guc-max-wal-senders"> is zero some SQL commands
+    When <varname>wal_mode</> is <literal>minimal</> some SQL commands
     are optimized to avoid WAL logging, as described in <xref
     linkend="populate-pitr">.  If archiving or streaming replication were
     turned on during execution of one of these statements, WAL would not
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c5692ba..63ca749 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1353,6 +1353,43 @@ SET ENABLE_SEQSCAN TO OFF;
      <title>Settings</title>
      <variablelist>

+     <varlistentry id="guc-wal-mode" xreflabel="wal_mode">
+      <term><varname>wal_mode</varname> (<type>enum</type>)</term>
+      <indexterm>
+       <primary><varname>wal_mode</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        <varname>wal_mode</> determines how much information is written
+        to the WAL. The default value is <literal>minimal</>, which writes
+        only minimal information needed to recover from a crash or immediate
+        shutdown. <literal>archive</> adds logging required for WAL archiving,
+        and <literal>hot_standby</> further adds extra information about
+        running transactions required to run read-only queries on a standby
+        server.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        In <literal>minimal</> mode, WAL-logging of some bulk operations, like
+        <command>CREATE INDEX</>, <command>CLUSTER</> and <command>COPY</> on
+        a table that was created or truncated in the same transaction can be
+        safely skipped, which can make those operations much faster, but
+        minimal WAL does not contain enough information to reconstruct the
+        data from a base backup and the WAL logs, so at least
+        <literal>archive</> level must be used to enable WAL archiving
+        (<xref linkend="guc-archive-mode">) and streaming replication. See
+        also <xref linkend="populate-pitr">.
+       </para>
+       <para>
+        In <literal>hot_standby</> mode, the same information is logged as
+        in <literal>archive</> mode, plus information needed to reconstruct
+        the status of running transactions from the WAL. To enable read-only
+        queries on a standby server, <varname>wal_mode</> must be set to
+        <literal>hot_standby</> on the primary.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-fsync" xreflabel="fsync">
       <indexterm>
        <primary><varname>fsync</> configuration parameter</primary>
@@ -1726,7 +1763,9 @@ SET ENABLE_SEQSCAN TO OFF;
         <varname>archive_mode</> and <varname>archive_command</> are
         separate variables so that <varname>archive_command</> can be
         changed without leaving archiving mode.
-        This parameter can only be set at server start.
+        This parameter can only be set at server start. It is ignored
+        unless <varname>wal_mode</> is set to <literal>archive</> or
+        <literal>hot_standby</>.
        </para>
       </listitem>
      </varlistentry>
@@ -1884,16 +1923,14 @@ SET ENABLE_SEQSCAN TO OFF;
       </indexterm>
       <listitem>
        <para>
-        Parameter has two roles. During recovery, specifies whether or not
-        you can connect and run queries to enable <xref linkend="hot-standby">.
-        During normal running, specifies whether additional information is written
-        to WAL to allow recovery connections on a standby server that reads
-        WAL data generated by this server. The default value is
+        During recovery, specifies whether or not you can connect and run
+        queries to enable <xref linkend="hot-standby">. The default value is
         <literal>on</literal>.  It is thought that there is little
         measurable difference in performance from using this feature, so
         feedback is welcome if any production impacts are noticeable.
         It is likely that this parameter will be removed in later releases.
-        This parameter can only be set at server start.
+        This parameter can only be set at server start. It is ignored when
+        not in standby mode.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index d69f2ea..7fa0817 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1589,9 +1589,9 @@ LOG:  database system is ready to accept read only connections
 </programlisting>

     Consistency information is recorded once per checkpoint on the primary, as long
-    as <varname>recovery_connections</> is enabled on the primary.  It is not possible
+    as <varname>wal_mode</> is set to <literal>hot_standby</> on the primary.  It is not possible
     to enable recovery connections on the standby when reading WAL written during the
-    period that <varname>recovery_connections</> was disabled on the primary.
+    period that <varname>wal_mode</> was not set to <literal>hot_standby</> on the primary.
     Reaching a consistent state can also be delayed in the presence
     of both of these conditions:

@@ -1838,7 +1838,7 @@ LOG:  database system is ready to accept read only connections
    </para>

    <para>
-    On the primary, parameters <varname>recovery_connections</> and
+    On the primary, parameters <varname>wal_mode</> and
     <varname>vacuum_defer_cleanup_age</> can be used.
     <varname>max_standby_delay</> has no effect if set on the primary.
    </para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index b00e69f..a493348 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -835,10 +835,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     <command>TRUNCATE</command> command. In such cases no WAL
     needs to be written, because in case of an error, the files
     containing the newly loaded data will be removed anyway.
-    However, this consideration does not apply when
-    <xref linkend="guc-archive-mode"> is on or streaming replication
-    is allowed (i.e., <xref linkend="guc-max-wal-senders"> is more
-    than or equal to one), as all commands must write WAL in that case.
+    However, this consideration only applies when
+    <xref linkend="guc-wal-mode"> is <literal>minimal</> as all commands
+    must write WAL otherwise.
    </para>

   </sect2>
@@ -910,18 +909,16 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
   </sect2>

   <sect2 id="populate-pitr">
-   <title>Turn off <varname>archive_mode</varname> and streaming replication</title>
+   <title>Disable WAL archival and streaming replication</title>

    <para>
     When loading large amounts of data into an installation that uses
     WAL archiving or streaming replication, you might want to disable
-    archiving (turn off the <xref linkend="guc-archive-mode">
-    configuration variable) and replication (zero the
-    <xref linkend="guc-max-wal-senders"> configuration variable)
-    while loading.  It might be
+    archiving and replication by setting the <xref linkend="guc-wal-mode">
+    configuration variable to <literal>minimal</> while loading.  It might be
     faster to take a new base backup after the load has completed
     than to process a large amount of incremental WAL data.
-    But note that changing either of these variables requires
+    But note that changing <varname>wal_mode</> requires
     a server restart.
    </para>

@@ -929,10 +926,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     Aside from avoiding the time for the archiver or WAL sender to
     process the WAL data,
     doing this will actually make certain commands faster, because they
-    are designed not to write WAL at all if <varname>archive_mode</varname>
-    is off and <varname>max_wal_senders</varname> is zero.  (They can
-    guarantee crash safety more cheaply by doing an
-    <function>fsync</> at the end than by writing WAL.)
+    are designed not to write WAL at all if <varname>wal_mode</varname>
+    is <literal>minimal</>.  (They can guarantee crash safety more cheaply
+    by doing an <function>fsync</> at the end than by writing WAL.)
     This applies to the following commands:
     <itemizedlist>
      <listitem>
@@ -1015,9 +1011,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
      <listitem>
       <para>
        If using WAL archiving, consider disabling it during the restore.
-       To do that, turn off <varname>archive_mode</varname> before loading the
-       dump script, and afterwards turn it back on
-       and take a fresh base backup.
+       To do that, set <varname>wal_mode</varname> to <literal>minimal</>
+       before loading the dump script, and afterwards set it back to
+       <literal>archive</> or <literal>hot_standby</> and take a fresh
+       base backup.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index ac075cb..99235da 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -278,16 +278,6 @@ end_heap_rewrite(RewriteState state)
                    (char *) state->rs_buffer, true);
     }

-    /* Write an XLOG UNLOGGED record if WAL-logging was skipped */
-    if (!state->rs_use_wal && !state->rs_new_rel->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 30];
-
-        snprintf(reason, sizeof(reason), "heap rewrite on \"%s\"",
-                 RelationGetRelationName(state->rs_new_rel));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /*
      * If the rel isn't temp, must fsync before commit.  We use heap_sync to
      * ensure that the toast table gets fsync'd too.
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 6cb3cac..89ed8a0 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -215,19 +215,6 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
      */
     wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!wstate.btws_use_wal && !wstate.index->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 20];
-
-        snprintf(reason, sizeof(reason), "b-tree build on \"%s\"",
-                 RelationGetRelationName(wstate.index));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /* reserve the metapage */
     wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
     wstate.btws_pages_written = 0;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7647f4e..e1209bb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -76,6 +76,7 @@ int            MaxStandbyDelay = 30;
 bool        fullPageWrites = true;
 bool        log_checkpoints = false;
 int            sync_method = DEFAULT_SYNC_METHOD;
+int            wal_mode = WAL_MODE_MINIMAL;

 #ifdef WAL_DEBUG
 bool        XLOG_DEBUG = false;
@@ -97,6 +98,13 @@ bool        XLOG_DEBUG = false;
 /*
  * GUC support
  */
+const struct config_enum_entry wal_mode_options[] = {
+    {"minimal", WAL_MODE_MINIMAL, false},
+    {"archive", WAL_MODE_ARCHIVE, false},
+    {"hot_standby", WAL_MODE_HOT_STANDBY, false},
+    {NULL, 0, false}
+};
+
 const struct config_enum_entry sync_method_options[] = {
     {"fsync", SYNC_METHOD_FSYNC, false},
 #ifdef HAVE_FSYNC_WRITETHROUGH
@@ -501,6 +509,18 @@ static bool reachedMinRecoveryPoint = false;
 static bool InRedo = false;

 /*
+ * Information logged when we detect a change in one of the parameters
+ * important for Hot Standby.
+ */
+typedef struct xl_parameter_change
+{
+    int            MaxConnections;
+    int            max_prepared_xacts;
+    int            max_locks_per_xact;
+    int            wal_mode;
+} xl_parameter_change;
+
+/*
  * Flags set by interrupt handlers for later service in the redo loop.
  */
 static volatile sig_atomic_t got_SIGHUP = false;
@@ -522,7 +542,8 @@ static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI,
                     uint32 endLogId, uint32 endLogSeg);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
-static void CheckRequiredParameterValues(CheckPoint checkPoint);
+static void CheckRequiredParameterValues(void);
+static void XLogReportParameters(void);
 static void LocalSetXLogInsertAllowed(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);

@@ -4922,6 +4943,13 @@ BootStrapXLOG(void)
     ControlFile->time = checkPoint.time;
     ControlFile->checkPoint = checkPoint.redo;
     ControlFile->checkPointCopy = checkPoint;
+
+    /* Set important parameter values for use when replaying WAL */
+    ControlFile->MaxConnections = MaxConnections;
+    ControlFile->max_prepared_xacts = max_prepared_xacts;
+    ControlFile->max_locks_per_xact = max_locks_per_xact;
+    ControlFile->wal_mode = wal_mode;
+
     /* some additional ControlFile fields are set in WriteControlFile() */

     WriteControlFile();
@@ -5539,17 +5567,18 @@ GetLatestXLogTime(void)
 }

 /*
- * Note that text field supplied is a parameter name and does not require translation
+ * Note that text field supplied is a parameter name and does not require
+ * translation
  */
-#define RecoveryRequiresIntParameter(param_name, currValue, checkpointValue) \
+#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \
 { \
-    if (currValue < checkpointValue) \
+    if (currValue < minValue) \
         ereport(ERROR, \
             (errmsg("recovery connections cannot continue because " \
                     "%s = %u is a lower setting than on WAL source server (value was %u)", \
                     param_name, \
                     currValue, \
-                    checkpointValue))); \
+                    minValue))); \
 }

 /*
@@ -5557,21 +5586,37 @@ GetLatestXLogTime(void)
  * for various aspects of recovery operation.
  */
 static void
-CheckRequiredParameterValues(CheckPoint checkPoint)
+CheckRequiredParameterValues(void)
 {
-    /* We ignore autovacuum_max_workers when we make this test. */
-    RecoveryRequiresIntParameter("max_connections",
-                                 MaxConnections, checkPoint.MaxConnections);
+    /*
+     * For archive recovery, the WAL must be generated with at least
+     * 'archive' wal_mode.
+     */
+    if (InArchiveRecovery && ControlFile->wal_mode == WAL_MODE_MINIMAL)
+    {
+        ereport(WARNING,
+                (errmsg("WAL was generated with wal_mode='minimal', data may be missing"),
+                 errhint("This happens if you temporarily set wal_mode='minimal' without taking a new base
backup.")));
+    }

-    RecoveryRequiresIntParameter("max_prepared_xacts",
-                          max_prepared_xacts, checkPoint.max_prepared_xacts);
-    RecoveryRequiresIntParameter("max_locks_per_xact",
-                          max_locks_per_xact, checkPoint.max_locks_per_xact);
+    /*
+     * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
+     * and we must have at least as many backend slots as the primary.
+     */
+    if (InArchiveRecovery && XLogRequestRecoveryConnections)
+    {
+        if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
+            ereport(ERROR,
+                    (errmsg("recovery connections cannot start because wal_mode was not set to 'hot_standby' on the
WALsource server"))); 

-    if (!checkPoint.XLogStandbyInfoMode)
-        ereport(ERROR,
-                (errmsg("recovery connections cannot start because the recovery_connections "
-                        "parameter is disabled on the WAL source server")));
+        /* We ignore autovacuum_max_workers when we make this test. */
+        RecoveryRequiresIntParameter("max_connections",
+                                     MaxConnections, ControlFile->MaxConnections);
+        RecoveryRequiresIntParameter("max_prepared_xacts",
+                                     max_prepared_xacts, ControlFile->max_prepared_xacts);
+        RecoveryRequiresIntParameter("max_locks_per_xact",
+                                     max_locks_per_xact, ControlFile->max_locks_per_xact);
+    }
 }

 /*
@@ -5904,6 +5949,9 @@ StartupXLOG(void)
                                 BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
         }

+        /* Check that the GUCs used to generate the WAL allow recovery */
+        CheckRequiredParameterValues();
+
         /*
          * Initialize recovery connections, if enabled. We won't let backends
          * in yet, not until we've reached the min recovery point specified in
@@ -5915,8 +5963,6 @@ StartupXLOG(void)
             TransactionId *xids;
             int            nxids;

-            CheckRequiredParameterValues(checkPoint);
-
             ereport(DEBUG1,
                     (errmsg("initializing recovery connections")));

@@ -6401,6 +6447,13 @@ StartupXLOG(void)
     }

     /*
+     * If any of the critical GUCs have changed, log them before we allow
+     * any backends to write WAL
+     */
+    LocalSetXLogInsertAllowed();
+    XLogReportParameters();
+
+    /*
      * All done.  Allow backends to write WAL.    (Although the bool flag is
      * probably atomic in itself, we use the info_lck here to ensure that
      * there are no race conditions concerning visibility of other recent
@@ -6998,12 +7051,6 @@ CreateCheckPoint(int flags)
     MemSet(&checkPoint, 0, sizeof(checkPoint));
     checkPoint.time = (pg_time_t) time(NULL);

-    /* Set important parameter values for use when replaying WAL */
-    checkPoint.MaxConnections = MaxConnections;
-    checkPoint.max_prepared_xacts = max_prepared_xacts;
-    checkPoint.max_locks_per_xact = max_locks_per_xact;
-    checkPoint.XLogStandbyInfoMode = XLogStandbyInfoActive();
-
     /*
      * We must hold WALInsertLock while examining insert state to determine
      * the checkpoint REDO pointer.
@@ -7647,28 +7694,49 @@ RequestXLogSwitch(void)
 }

 /*
- * Write an XLOG UNLOGGED record, indicating that some operation was
- * performed on data that we fsync()'d directly to disk, skipping
- * WAL-logging.
- *
- * Such operations screw up archive recovery, so we complain if we see
- * these records during archive recovery. That shouldn't happen in a
- * correctly configured server, but you can induce it by temporarily
- * disabling archiving and restarting, so it's good to at least get a
- * warning of silent data loss in such cases. These records serve no
- * other purpose and are simply ignored during crash recovery.
+ * Check if any of the GUC parameters that are critical for hot standby
+ * have changed, and update the value in pg_control file if necessary.
  */
-void
-XLogReportUnloggedStatement(char *reason)
+static void
+XLogReportParameters(void)
 {
-    XLogRecData rdata;
+    if (wal_mode != ControlFile->wal_mode ||
+        MaxConnections != ControlFile->MaxConnections ||
+        max_prepared_xacts != ControlFile->max_prepared_xacts ||
+        max_locks_per_xact != max_locks_per_xact)
+    {
+        /*
+         * The change in number of backend slots doesn't need to be
+         * WAL-logged if archiving is not enabled, as you can't start
+         * archive recovery with wal_mode='minimal' anyway. We don't
+         * really care about the values in pg_control either if
+         * wal_mode='minimal', but seems better to keep them up-to-date
+         * to avoid confusion.
+         */
+        if (wal_mode != ControlFile->wal_mode || XLogIsNeeded())
+        {
+            XLogRecData rdata;
+            xl_parameter_change xlrec;

-    rdata.buffer = InvalidBuffer;
-    rdata.data = reason;
-    rdata.len = strlen(reason) + 1;
-    rdata.next = NULL;
+            xlrec.MaxConnections = MaxConnections;
+            xlrec.max_prepared_xacts = max_prepared_xacts;
+            xlrec.max_locks_per_xact = max_locks_per_xact;
+            xlrec.wal_mode = wal_mode;
+
+            rdata.buffer = InvalidBuffer;
+            rdata.data = (char *) &xlrec;
+            rdata.len = sizeof(xlrec);
+            rdata.next = NULL;
+
+            XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE, &rdata);
+        }

-    XLogInsert(RM_XLOG_ID, XLOG_UNLOGGED, &rdata);
+        ControlFile->MaxConnections = MaxConnections;
+        ControlFile->max_prepared_xacts = max_prepared_xacts;
+        ControlFile->max_locks_per_xact = max_locks_per_xact;
+        ControlFile->wal_mode = wal_mode;
+        UpdateControlFile();
+    }
 }

 /*
@@ -7709,10 +7777,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
                               checkPoint.nextMultiOffset);
         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);

-        /* Check to see if any changes to max_connections give problems */
-        if (standbyState != STANDBY_DISABLED)
-            CheckRequiredParameterValues(checkPoint);
-
         /*
          * If we see a shutdown checkpoint, we know that nothing was
          * running on the master at this point. So fake-up an empty
@@ -7834,18 +7898,21 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
             LWLockRelease(ControlFileLock);
         }
     }
-    else if (info == XLOG_UNLOGGED)
+    else if (info == XLOG_PARAMETER_CHANGE)
     {
-        if (InArchiveRecovery)
-        {
-            /*
-             * Note: We don't print the reason string from the record, because
-             * that gets added as a line using xlog_desc()
-             */
-            ereport(WARNING,
-                (errmsg("unlogged operation performed, data may be missing"),
-                 errhint("This can happen if you temporarily disable archive_mode without taking a new base
backup.")));
-        }
+        xl_parameter_change xlrec;
+
+        /* Update our copy of the parameters in pg_control */
+        memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+
+        ControlFile->MaxConnections = xlrec.MaxConnections;
+        ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+        ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
+        ControlFile->wal_mode = xlrec.wal_mode;
+        UpdateControlFile();
+
+        /* Check to see if any changes to max_connections give problems */
+        CheckRequiredParameterValues();
     }
 }

@@ -7896,11 +7963,30 @@ xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
         appendStringInfo(buf, "backup end: %X/%X",
                          startpoint.xlogid, startpoint.xrecoff);
     }
-    else if (info == XLOG_UNLOGGED)
+    else if (info == XLOG_PARAMETER_CHANGE)
     {
-        char       *reason = rec;
+        xl_parameter_change xlrec;
+        const char *wal_mode_str;
+        const struct config_enum_entry *entry;
+
+        memcpy(&xlrec, rec, sizeof(xl_parameter_change));
+
+        /* Find a string representation for wal_mode */
+        wal_mode_str = "?";
+        for (entry = wal_mode_options; entry->name; entry++)
+        {
+            if (entry->val == xlrec.wal_mode)
+            {
+                wal_mode_str = entry->name;
+                break;
+            }
+        }

-        appendStringInfo(buf, "unlogged operation: %s", reason);
+        appendStringInfo(buf, "parameter change: max_connections=%d max_prepared_xacts=%d max_locks_per_xact=%d
wal_mode=%s",
+                         xlrec.MaxConnections,
+                         xlrec.max_prepared_xacts,
+                         xlrec.max_locks_per_xact,
+                         wal_mode_str);
     }
     else
         appendStringInfo(buf, "UNKNOWN");
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 30a00ab..ccb4599 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -787,23 +787,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
      */
     use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!use_wal && !NewHeap->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 32];
-
-        if (OldIndex != NULL)
-            snprintf(reason, sizeof(reason), "CLUSTER on \"%s\"",
-                     RelationGetRelationName(NewHeap));
-        else
-            snprintf(reason, sizeof(reason), "VACUUM FULL on \"%s\"",
-                     RelationGetRelationName(NewHeap));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /* use_wal off requires smgr_targblock be initially invalid */
     Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 84a83f1..9d46e47 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2223,14 +2223,7 @@ CopyFrom(CopyState cstate)
      * indexes since those use WAL anyway)
      */
     if (hi_options & HEAP_INSERT_SKIP_WAL)
-    {
-        char        reason[NAMEDATALEN + 30];
-
-        snprintf(reason, sizeof(reason), "COPY FROM on \"%s\"",
-                 RelationGetRelationName(cstate->rel));
-        XLogReportUnloggedStatement(reason);
         heap_sync(cstate->rel);
-    }
 }


diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 25b2807..9b5ce65 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3272,14 +3272,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)

         /* If we skipped writing WAL, then we need to sync the heap. */
         if (hi_options & HEAP_INSERT_SKIP_WAL)
-        {
-            char        reason[NAMEDATALEN + 30];
-
-            snprintf(reason, sizeof(reason), "table rewrite on \"%s\"",
-                     RelationGetRelationName(newrel));
-            XLogReportUnloggedStatement(reason);
             heap_sync(newrel);
-        }

         heap_close(newrel, NoLock);
     }
@@ -7021,20 +7014,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace)

     heap_close(pg_class, RowExclusiveLock);

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!XLogIsNeeded() && !rel->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 40];
-
-        snprintf(reason, sizeof(reason), "ALTER TABLE SET TABLESPACE on \"%s\"",
-                 RelationGetRelationName(rel));
-
-        XLogReportUnloggedStatement(reason);
-    }
-
     relation_close(rel, NoLock);

     /* Make sure the reltablespace change is visible */
@@ -7063,10 +7042,6 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
     /*
      * We need to log the copied data in WAL iff WAL archiving/streaming is
      * enabled AND it's not a temp rel.
-     *
-     * Note: If you change the conditions here, update the conditions in
-     * ATExecSetTableSpace() for when an XLOG UNLOGGED record is written to
-     * match.
      */
     use_wal = XLogIsNeeded() && !istemp;

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d5e7e3a..d299310 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2241,14 +2241,7 @@ CloseIntoRel(QueryDesc *queryDesc)

         /* If we skipped using WAL, must heap_sync before commit */
         if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
-        {
-            char        reason[NAMEDATALEN + 30];
-
-            snprintf(reason, sizeof(reason), "SELECT INTO on \"%s\"",
-                     RelationGetRelationName(myState->rel));
-            XLogReportUnloggedStatement(reason);
             heap_sync(myState->rel);
-        }

         /* close rel, but keep lock until commit */
         heap_close(myState->rel, NoLock);
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 47f71bd..eeab0f3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -728,6 +728,12 @@ PostmasterMain(int argc, char *argv[])
         write_stderr("%s: superuser_reserved_connections must be less than max_connections\n", progname);
         ExitPostmaster(1);
     }
+    if (XLogArchiveMode && wal_mode == WAL_MODE_MINIMAL)
+        ereport(WARNING,
+                (errmsg("archive_mode ignored because wal_mode is 'minimal'")));
+    if (max_wal_senders > 0 && wal_mode == WAL_MODE_MINIMAL)
+        ereport(WARNING,
+                (errmsg("WAL streaming connections not allowed because wal_mode is 'minimal'")));

     /*
      * Other one-time internal sanity checks can go here, if they are fast.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3838665..35bc772 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -253,6 +253,24 @@ WalSndHandshake(void)
                     {
                         StringInfoData buf;

+                        /*
+                         * Check that we're logging enough information in the
+                         * WAL for log-shipping.
+                         *
+                         * NOTE: This only checks the current value of
+                         * wal_mode. Even if the current setting is not
+                         * 'minimal', there can be old WAL in the pg_xlog
+                         * directory that was created with 'minimal'.
+                         * So this is not bulletproof, the purpose is
+                         * just to give a user-friendly error message that
+                         * hints how to configure the system correctly.
+                         */
+                        if (wal_mode == WAL_MODE_MINIMAL)
+                            ereport(FATAL,
+                                    (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                                     errmsg("standby connections not allowed because wal_mode='minimal'")));
+
+
                         /* Send a CopyOutResponse message, and start streaming */
                         pq_beginmessage(&buf, 'H');
                         pq_sendbyte(&buf, 0);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 51753d6..a92a874 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -256,7 +256,7 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
      */
     if (!TransactionIdIsValid(latestRemovedXid))
     {
-        elog(DEBUG1, "Invalid latestremovexXid reported, using latestcompletedxid instead");
+        elog(DEBUG1, "invalid latestremovexXid reported, using latestcompletedxid instead");

         LWLockAcquire(ProcArrayLock, LW_SHARED);
         latestRemovedXid = ShmemVariableCache->latestCompletedXid;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2434cc0..2fb4090 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -340,6 +340,7 @@ static const struct config_enum_entry constraint_exclusion_options[] = {
 /*
  * Options for enum values stored in other modules
  */
+extern const struct config_enum_entry wal_mode_options[];
 extern const struct config_enum_entry sync_method_options[];

 /*
@@ -2785,6 +2786,15 @@ static struct config_enum ConfigureNamesEnum[] =
     },

     {
+        {"wal_mode", PGC_POSTMASTER, WAL_SETTINGS,
+            gettext_noop("Set the level of information written to the WAL."),
+            NULL
+        },
+        &wal_mode,
+        WAL_MODE_MINIMAL, wal_mode_options, NULL
+    },
+
+    {
         {"wal_sync_method", PGC_SIGHUP, WAL_SETTINGS,
             gettext_noop("Selects the method used for forcing WAL updates to disk."),
             NULL
@@ -7862,7 +7872,7 @@ pg_timezone_abbrev_initialize(void)
 static const char *
 show_archive_command(void)
 {
-    if (XLogArchiveMode)
+    if (XLogArchivingActive())
         return XLogArchiveCommand;
     else
         return "(disabled)";
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 92763eb..c9ee77c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -150,6 +150,7 @@

 # - Settings -

+#wal_mode = minimal            # minimal, archive, or hot_standby
 #fsync = on                # turns forced synchronization on or off
 #synchronous_commit = on        # immediate fsync at commit
 #wal_sync_method = fsync        # the default is the first option
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index cb16cdf..668a912 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -60,6 +60,21 @@ dbState(DBState state)
     return _("unrecognized status code");
 }

+static const char *
+wal_mode_str(int wal_mode)
+{
+    switch (wal_mode)
+    {
+        case WAL_MODE_MINIMAL:
+            return "minimal";
+        case WAL_MODE_ARCHIVE:
+            return "archive";
+        case WAL_MODE_HOT_STANDBY:
+            return "hot_standby";
+    }
+    return _("unrecognized wal_mode");
+}
+

 int
 main(int argc, char *argv[])
@@ -206,6 +221,14 @@ main(int argc, char *argv[])
     printf(_("Backup start location:                %X/%X\n"),
            ControlFile.backupStartPoint.xlogid,
            ControlFile.backupStartPoint.xrecoff);
+    printf(_("Last wal_mode setting:                %s\n"),
+           wal_mode_str(ControlFile.wal_mode));
+    printf(_("Last max_connections setting:         %d\n"),
+           ControlFile.MaxConnections);
+    printf(_("Last max_prepared_xacts setting:      %d\n"),
+           ControlFile.max_prepared_xacts);
+    printf(_("Last max_locks_per_xact setting:      %d\n"),
+           ControlFile.max_locks_per_xact);
     printf(_("Maximum data alignment:               %u\n"),
            ControlFile.maxAlign);
     /* we don't print floatFormat since can't say much useful about it */
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 3d2a7bc..9f2a592 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -628,6 +628,15 @@ RewriteControlFile(void)
     ControlFile.backupStartPoint.xlogid = 0;
     ControlFile.backupStartPoint.xrecoff = 0;

+    /*
+     * Use the defaults for max_* settings. The values don't matter
+     * as long as wal_mode='minimal'.
+     */
+    ControlFile.MaxConnections = 100;
+    ControlFile.max_prepared_xacts = 0;
+    ControlFile.max_locks_per_xact = 64;
+    ControlFile.wal_mode = WAL_MODE_MINIMAL;
+
     /* Now we can force the recorded xlog seg size to the right thing. */
     ControlFile.xlog_seg_size = XLogSegSize;

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 6bfc7d5..1fdd648 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -13,6 +13,7 @@

 #include "access/rmgr.h"
 #include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
 #include "lib/stringinfo.h"
 #include "storage/buf.h"
 #include "utils/pg_crc.h"
@@ -195,24 +196,19 @@ extern int    XLogArchiveTimeout;
 extern bool log_checkpoints;
 extern bool XLogRequestRecoveryConnections;
 extern int    MaxStandbyDelay;
+extern int    wal_mode;

-#define XLogArchivingActive()    (XLogArchiveMode)
+#define XLogArchivingActive()    (XLogArchiveMode && wal_mode >= WAL_MODE_ARCHIVE)
 #define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')

 /*
- * This is in walsender.c, but declared here so that we don't need to include
- * walsender.h in all files that check XLogIsNeeded()
+ * Is WAL-logging necessary for archival or log-shipping, or can we skip
+ * WAL-logging if we fsync() the data before committing instead?
  */
-extern int    max_wal_senders;
-
-/*
- * Is WAL-logging necessary? We need to log an XLOG record iff either
- * WAL archiving is enabled or XLOG streaming is allowed.
- */
-#define XLogIsNeeded() (XLogArchivingActive() || (max_wal_senders > 0))
+#define XLogIsNeeded() (wal_mode >= WAL_MODE_ARCHIVE)

 /* Do we need to WAL-log information required only for Hot Standby? */
-#define XLogStandbyInfoActive() (XLogRequestRecoveryConnections && XLogIsNeeded())
+#define XLogStandbyInfoActive() (wal_mode >= WAL_MODE_HOT_STANDBY)

 #ifdef WAL_DEBUG
 extern bool XLOG_DEBUG;
@@ -293,7 +289,6 @@ extern void InitXLOGAccess(void);
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
-extern void XLogReportUnloggedStatement(char *reason);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
 extern XLogRecPtr GetWriteRecPtr(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 9d65546..54fce3e 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -21,7 +21,7 @@


 /* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION    901
+#define PG_CONTROL_VERSION    901 /* XXX bump before commit */

 /*
  * Body of CheckPoint XLOG records.  This is declared here because we keep
@@ -41,12 +41,6 @@ typedef struct CheckPoint
     Oid            oldestXidDB;    /* database with minimum datfrozenxid */
     pg_time_t    time;            /* time stamp of checkpoint */

-    /* Important parameter settings at time of shutdown checkpoints */
-    int            MaxConnections;
-    int            max_prepared_xacts;
-    int            max_locks_per_xact;
-    bool        XLogStandbyInfoMode;
-
     /*
      * Oldest XID still running. This is only needed to initialize hot standby
      * mode from an online checkpoint, so we only bother calculating this for
@@ -63,7 +57,7 @@ typedef struct CheckPoint
 #define XLOG_NEXTOID                    0x30
 #define XLOG_SWITCH                        0x40
 #define XLOG_BACKUP_END                    0x50
-#define XLOG_UNLOGGED                    0x60
+#define XLOG_PARAMETER_CHANGE            0x60


 /* System status indicator */
@@ -77,6 +71,14 @@ typedef enum DBState
     DB_IN_PRODUCTION
 } DBState;

+/* WAL modes */
+typedef enum WalMode
+{
+    WAL_MODE_MINIMAL = 0,
+    WAL_MODE_ARCHIVE,
+    WAL_MODE_HOT_STANDBY
+} WalMode;
+
 /*
  * Contents of pg_control.
  *
@@ -142,6 +144,15 @@ typedef struct ControlFileData
     XLogRecPtr    backupStartPoint;

     /*
+     * Parameter settings that determine if the WAL can be used for archival
+     * or hot standby.
+     */
+    WalMode        wal_mode;
+    int            MaxConnections;
+    int            max_prepared_xacts;
+    int            max_locks_per_xact;
+
+    /*
      * This data is used to check for hardware-architecture compatibility of
      * the database and the backend executable.  We need not check endianness
      * explicitly, since the pg_control version will surely look wrong to a
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 6ad40a9..db64c88 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -39,6 +39,7 @@ extern bool am_walsender;

 /* user-settable parameters */
 extern int    WalSndDelay;
+extern int    max_wal_senders;

 extern int    WalSenderMain(void);
 extern void WalSndSignals(void);

On Tue, Apr 27, 2010 at 5:09 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Ok, here's a patch that includes the changes to add new wal_mode GUC
> (http://archives.postgresql.org/message-id/4BD581A6.60602@enterprisedb.com),
> and implements Tom's design to keep a copy of wal_mode and the
> max_connections, max_prepared_xacts and max_locks_per_xact settings in
> pg_control.

I have some comments:

config.sgml
> <literal>on</literal>.  It is thought that there is little
> measurable difference in performance from using this feature, so
> feedback is welcome if any production impacts are noticeable.
> It is likely that this parameter will be removed in later releases.

Is this description still required for recovery_connections?


> if (!XLogArchivingActive())
>   ereport(ERROR,
>     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>      errmsg("WAL archiving is not active"),
>      errhint("archive_mode must be enabled at server start.")));

You need to change the error messages which refer to archive_mode,
like the above.


+ /*
+  * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
+  * and we must have at least as many backend slots as the primary.
+  */
+ if (InArchiveRecovery && XLogRequestRecoveryConnections)
+ {
+   if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
+       ereport(ERROR,
+              (errmsg("recovery connections cannot start because
wal_mode was not set to 'hot_standby' on the WAL source server")));

This seems to always prevent the server from doing an archive recovery
since wal_mode is expected to be WAL_MODE_ARCHIVE in that case.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> config.sgml
>> <literal>on</literal>.  It is thought that there is little
>> measurable difference in performance from using this feature, so
>> feedback is welcome if any production impacts are noticeable.
>> It is likely that this parameter will be removed in later releases.
> 
> Is this description still required for recovery_connections?

Hmm, I guess it was referring to setting recovery_connections in the
master, I don't see us removing that option from the standby in the
future. recovery_connections in the master is being replaced with the
wal_mode setting, so I guess that's not required anymore.

>> if (!XLogArchivingActive())
>>   ereport(ERROR,
>>     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>      errmsg("WAL archiving is not active"),
>>      errhint("archive_mode must be enabled at server start.")));
> 
> You need to change the error messages which refer to archive_mode,
> like the above.

Hmm, I think we should change not only the error message, but the logic
too. There's two related checks there:

>     if (!XLogArchivingActive())
>         ereport(ERROR,
>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                  errmsg("WAL archiving is not active"),
>                  errhint("archive_mode must be enabled at server start.")));
> 
>     if (!XLogArchiveCommandSet())
>         ereport(ERROR,
>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                  errmsg("WAL archiving is not active"),
>                  errhint("archive_command must be defined before "
>                          "online backups can be made safely.")));

You can use streaming replication too to transport the WAL generated
during the backup, so I think we should just check that wal_mode>='archive'.

> + /*
> +  * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
> +  * and we must have at least as many backend slots as the primary.
> +  */
> + if (InArchiveRecovery && XLogRequestRecoveryConnections)
> + {
> +   if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
> +       ereport(ERROR,
> +              (errmsg("recovery connections cannot start because
> wal_mode was not set to 'hot_standby' on the WAL source server")));
> 
> This seems to always prevent the server from doing an archive recovery
> since wal_mode is expected to be WAL_MODE_ARCHIVE in that case.

No, it doesn't prevent archive recovery. It only prevents hot standby if
wal_mode was not 'hot_standby' in the master. I think you missed the "&&
XLogRequestRecoveryConnections" condition above.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Tue, Apr 27, 2010 at 6:49 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> config.sgml
>>> <literal>on</literal>.  It is thought that there is little
>>> measurable difference in performance from using this feature, so
>>> feedback is welcome if any production impacts are noticeable.
>>> It is likely that this parameter will be removed in later releases.
>>
>> Is this description still required for recovery_connections?
>
> Hmm, I guess it was referring to setting recovery_connections in the
> master, I don't see us removing that option from the standby in the
> future. recovery_connections in the master is being replaced with the
> wal_mode setting, so I guess that's not required anymore.

Agreed.

>>> if (!XLogArchivingActive())
>>>   ereport(ERROR,
>>>     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>      errmsg("WAL archiving is not active"),
>>>      errhint("archive_mode must be enabled at server start.")));
>>
>> You need to change the error messages which refer to archive_mode,
>> like the above.
>
> Hmm, I think we should change not only the error message, but the logic
> too. There's two related checks there:
>
>>       if (!XLogArchivingActive())
>>               ereport(ERROR,
>>                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>                                errmsg("WAL archiving is not active"),
>>                                errhint("archive_mode must be enabled at server start.")));
>>
>>       if (!XLogArchiveCommandSet())
>>               ereport(ERROR,
>>                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>                                errmsg("WAL archiving is not active"),
>>                                errhint("archive_command must be defined before "
>>                                                "online backups can be made safely.")));
>
> You can use streaming replication too to transport the WAL generated
> during the backup, so I think we should just check that wal_mode>='archive'.

It's OK in pg_start_backup(), but seems NG in pg_stop_backup() since
it waits until some WAL files have been archived by the archiver. No?

>> + /*
>> +  * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
>> +  * and we must have at least as many backend slots as the primary.
>> +  */
>> + if (InArchiveRecovery && XLogRequestRecoveryConnections)
>> + {
>> +   if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
>> +       ereport(ERROR,
>> +              (errmsg("recovery connections cannot start because
>> wal_mode was not set to 'hot_standby' on the WAL source server")));
>>
>> This seems to always prevent the server from doing an archive recovery
>> since wal_mode is expected to be WAL_MODE_ARCHIVE in that case.
>
> No, it doesn't prevent archive recovery. It only prevents hot standby if
> wal_mode was not 'hot_standby' in the master. I think you missed the "&&
> XLogRequestRecoveryConnections" condition above.

Even if we do only archive recovery, XLogRequestRecoveryConnections
might be TRUE. Or we need to ensure that the recovery_connection is
FALSE in the postgresql.conf before starting archive recovery?

And I tried archive recovery, and encountered the following error.
 LOG:  starting archive recovery LOG:  restored log file "000000010000000000000001" from archive FATAL:  recovery
connectionscannot start because wal_mode was not 
set to 'hot_standby' on the WAL source server LOG:  startup process (PID 32512) exited with exit code 1 LOG:  aborting
startupdue to startup process failure 

XLOG-related parameters in postgresql.conf archive_mode = on archive_command = 'cp %p ../data.arh/%f' wal_mode =
archiverecovery_connections = on 

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Tue, Apr 27, 2010 at 6:49 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>>> if (!XLogArchivingActive())
>>>>   ereport(ERROR,
>>>>     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>      errmsg("WAL archiving is not active"),
>>>>      errhint("archive_mode must be enabled at server start.")));
>>> You need to change the error messages which refer to archive_mode,
>>> like the above.
>> Hmm, I think we should change not only the error message, but the logic
>> too. There's two related checks there:
>>
>>>       if (!XLogArchivingActive())
>>>               ereport(ERROR,
>>>                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>                                errmsg("WAL archiving is not active"),
>>>                                errhint("archive_mode must be enabled at server start.")));
>>>
>>>       if (!XLogArchiveCommandSet())
>>>               ereport(ERROR,
>>>                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>                                errmsg("WAL archiving is not active"),
>>>                                errhint("archive_command must be defined before "
>>>                                                "online backups can be made safely.")));
>> You can use streaming replication too to transport the WAL generated
>> during the backup, so I think we should just check that wal_mode>='archive'.
> 
> It's OK in pg_start_backup(), but seems NG in pg_stop_backup() since
> it waits until some WAL files have been archived by the archiver. No?

Good point, that logic would need to be changed too. Should it simply
return immediately if archive_mode=off?

>>> + /*
>>> +  * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
>>> +  * and we must have at least as many backend slots as the primary.
>>> +  */
>>> + if (InArchiveRecovery && XLogRequestRecoveryConnections)
>>> + {
>>> +   if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
>>> +       ereport(ERROR,
>>> +              (errmsg("recovery connections cannot start because
>>> wal_mode was not set to 'hot_standby' on the WAL source server")));
>>>
>>> This seems to always prevent the server from doing an archive recovery
>>> since wal_mode is expected to be WAL_MODE_ARCHIVE in that case.
>> No, it doesn't prevent archive recovery. It only prevents hot standby if
>> wal_mode was not 'hot_standby' in the master. I think you missed the "&&
>> XLogRequestRecoveryConnections" condition above.
> 
> Even if we do only archive recovery, XLogRequestRecoveryConnections
> might be TRUE. Or we need to ensure that the recovery_connection is
> FALSE in the postgresql.conf before starting archive recovery?

Umm, yes, if you have recovery_connnections=on, it means you want hot
standby. And for that you need wal_mode='hot_standby'.

By "it doesn't prevent archive recovery" I meant "you can do traditional
archive recovery without hot standby". It doesn't matter how the WAL is
transported, via the archive or via streaming replication.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Tue, Apr 27, 2010 at 7:50 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> It's OK in pg_start_backup(), but seems NG in pg_stop_backup() since
>> it waits until some WAL files have been archived by the archiver. No?
>
> Good point, that logic would need to be changed too. Should it simply
> return immediately if archive_mode=off?

What if we wrongly set archive_mode to on and wal_mode to minimal?
I think that checking XLogArchivingActive() in pg_stop_backup() is
adequate.

>>>> + /*
>>>> +  * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
>>>> +  * and we must have at least as many backend slots as the primary.
>>>> +  */
>>>> + if (InArchiveRecovery && XLogRequestRecoveryConnections)
>>>> + {
>>>> +   if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
>>>> +       ereport(ERROR,
>>>> +              (errmsg("recovery connections cannot start because
>>>> wal_mode was not set to 'hot_standby' on the WAL source server")));
>>>>
>>>> This seems to always prevent the server from doing an archive recovery
>>>> since wal_mode is expected to be WAL_MODE_ARCHIVE in that case.
>>> No, it doesn't prevent archive recovery. It only prevents hot standby if
>>> wal_mode was not 'hot_standby' in the master. I think you missed the "&&
>>> XLogRequestRecoveryConnections" condition above.
>>
>> Even if we do only archive recovery, XLogRequestRecoveryConnections
>> might be TRUE. Or we need to ensure that the recovery_connection is
>> FALSE in the postgresql.conf before starting archive recovery?
>
> Umm, yes, if you have recovery_connnections=on, it means you want hot
> standby. And for that you need wal_mode='hot_standby'.

Since the default value of recovery_connections is TRUE, I think that
the trouble which I encountered would often happen. We should disable
recovery_connections by default? Furthermore should move it from
postgresql.conf to recovery.conf?

On the other hand, I feel that recovery_connections=on in an archive
recovery is valid configuration *until* any read only connections are
requested. How about moving the above check to postmaster or backend?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Tue, Apr 27, 2010 at 7:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Apr 27, 2010 at 7:50 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>> It's OK in pg_start_backup(), but seems NG in pg_stop_backup() since
>>> it waits until some WAL files have been archived by the archiver. No?
>>
>> Good point, that logic would need to be changed too. Should it simply
>> return immediately if archive_mode=off?
>
> What if we wrongly set archive_mode to on and wal_mode to minimal?
> I think that checking XLogArchivingActive() in pg_stop_backup() is
> adequate.

That case should be rejected at primary startup.

>>>>> + /*
>>>>> +  * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
>>>>> +  * and we must have at least as many backend slots as the primary.
>>>>> +  */
>>>>> + if (InArchiveRecovery && XLogRequestRecoveryConnections)
>>>>> + {
>>>>> +   if (ControlFile->wal_mode < WAL_MODE_HOT_STANDBY)
>>>>> +       ereport(ERROR,
>>>>> +              (errmsg("recovery connections cannot start because
>>>>> wal_mode was not set to 'hot_standby' on the WAL source server")));
>>>>>
>>>>> This seems to always prevent the server from doing an archive recovery
>>>>> since wal_mode is expected to be WAL_MODE_ARCHIVE in that case.
>>>> No, it doesn't prevent archive recovery. It only prevents hot standby if
>>>> wal_mode was not 'hot_standby' in the master. I think you missed the "&&
>>>> XLogRequestRecoveryConnections" condition above.
>>>
>>> Even if we do only archive recovery, XLogRequestRecoveryConnections
>>> might be TRUE. Or we need to ensure that the recovery_connection is
>>> FALSE in the postgresql.conf before starting archive recovery?
>>
>> Umm, yes, if you have recovery_connnections=on, it means you want hot
>> standby. And for that you need wal_mode='hot_standby'.
>
> Since the default value of recovery_connections is TRUE, I think that
> the trouble which I encountered would often happen. We should disable
> recovery_connections by default? Furthermore should move it from
> postgresql.conf to recovery.conf?
>
> On the other hand, I feel that recovery_connections=on in an archive
> recovery is valid configuration *until* any read only connections are
> requested. How about moving the above check to postmaster or backend?

Or just not starting recovery connections, but still doing archive
recovery?  I think in this case a WARNING might be adequate.

...Robert


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Ok, here's a patch that includes the changes to add new wal_mode GUC
> (http://archives.postgresql.org/message-id/4BD581A6.60602@enterprisedb.com),

I haven't read this in any detail, but why does it add inclusion of
pg_control.h to xlog.h?  I don't see any reason for that in the actual
changes in xlog.h.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Ok, here's a patch that includes the changes to add new wal_mode GUC
>> (http://archives.postgresql.org/message-id/4BD581A6.60602@enterprisedb.com),
> 
> I haven't read this in any detail, but why does it add inclusion of
> pg_control.h to xlog.h?  I don't see any reason for that in the actual
> changes in xlog.h.

I put the enum for wal_mode to pg_control.h, so that it's available to
pg_controlinfo.c without #including xlog.h there. The
XLogArchivingActive() macro in xlog.h needs the enum values:

#define XLogArchivingActive()  (XLogArchiveMode && wal_mode >=
WAL_MODE_ARCHIVE

I'm all ears for better suggestions, I didn't like that much either.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> I haven't read this in any detail, but why does it add inclusion of
>> pg_control.h to xlog.h?  I don't see any reason for that in the actual
>> changes in xlog.h.

> I put the enum for wal_mode to pg_control.h, so that it's available to
> pg_controlinfo.c without #including xlog.h there. The
> XLogArchivingActive() macro in xlog.h needs the enum values:

Oh, I see.

> I'm all ears for better suggestions, I didn't like that much either.

How about putting the enum {} declaration in xlog.h, and making the
field in pg_control.h just be declared "int"?  I'm not sure declaring
it as enum is a great idea anyway, since that makes the on-disk
representation dependent on a compiler's whim as to how wide the
enum will be.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> How about putting the enum {} declaration in xlog.h, and making the
> field in pg_control.h just be declared "int"?

I tried that at first, but the problem was with pg_controldata.c. In
bin/. I wanted it to print wal_mode in human-readable format, so it
needed the values of the enum from somewhere. I tried to "#include
<access/xlog.h>" in pg_controlinfo.c, but got a bunch of errors.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas escribió:
> Tom Lane wrote:
> > How about putting the enum {} declaration in xlog.h, and making the
> > field in pg_control.h just be declared "int"?
> 
> I tried that at first, but the problem was with pg_controldata.c. In
> bin/. I wanted it to print wal_mode in human-readable format, so it
> needed the values of the enum from somewhere. I tried to "#include
> <access/xlog.h>" in pg_controlinfo.c, but got a bunch of errors.

Hmm, AFAICS the problem with controldata is that it uses postgres_fe.h
instead of postgres.h.  It's a bit of a stretch to use the latter, but
maybe that's a better solution?  After all, it *is* poking into the
backend internals.

I know I had to hack around pg_controldata some time ago (I don't recall
what for) and found that it could be cleaned up like this.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, AFAICS the problem with controldata is that it uses postgres_fe.h
> instead of postgres.h.  It's a bit of a stretch to use the latter, but
> maybe that's a better solution?  After all, it *is* poking into the
> backend internals.

I seem to recall that Solaris had problems with that due to dtrace
support or something?  However, we are doing it in pg_resetxlog,
so I suppose it's ok for pg_controldata as well.
        regards, tom lane


Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Hmm, AFAICS the problem with controldata is that it uses postgres_fe.h
> > instead of postgres.h.  It's a bit of a stretch to use the latter, but
> > maybe that's a better solution?  After all, it *is* poking into the
> > backend internals.
> 
> I seem to recall that Solaris had problems with that due to dtrace
> support or something?

Hmm, I wonder if you're referring to the fact that Zdenek wanted to
restructure the headers for something?  I don't know if this was because
of compiler issues or the binary migration tool he was working on.

> However, we are doing it in pg_resetxlog, so I suppose it's ok for
> pg_controldata as well.

I hadn't noticed that, but yes.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Hmm, AFAICS the problem with controldata is that it uses postgres_fe.h
>> instead of postgres.h.  It's a bit of a stretch to use the latter, but
>> maybe that's a better solution?  After all, it *is* poking into the
>> backend internals.
>
> I seem to recall that Solaris had problems with that due to dtrace
> support or something?  However, we are doing it in pg_resetxlog,
> so I suppose it's ok for pg_controldata as well.

Ok, did that. Here's an updated patch:

* made incorrect combinations of wal_mode and
archive_mode/max_wal_senders throw an ERROR instead of WARNING

* renamed wal_mode to wal_level

* reworded the documentation changes a bit

This doesn't contain any changes to pg_start_backup() yet, that's a
separate issue and still under discussion.

At commit, should I bump catversion, or PG_CONTROL_VERSION, or both? The
patch replaces the unlogged-operation WAL record with a record
containing current parameter values, and it changes pg_control. I'm
guessing both.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index eb5765a..27506d4 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -689,8 +689,7 @@ archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/ser
    </para>

    <para>
-    When <varname>archive_mode</> is <literal>off</> and <xref
-    linkend="guc-max-wal-senders"> is zero some SQL commands
+    When <varname>wal_level</> is <literal>minimal</> some SQL commands
     are optimized to avoid WAL logging, as described in <xref
     linkend="populate-pitr">.  If archiving or streaming replication were
     turned on during execution of one of these statements, WAL would not
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c5692ba..9f2a358 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1353,6 +1353,45 @@ SET ENABLE_SEQSCAN TO OFF;
      <title>Settings</title>
      <variablelist>

+     <varlistentry id="guc-wal-level" xreflabel="wal_level">
+      <term><varname>wal_level</varname> (<type>enum</type>)</term>
+      <indexterm>
+       <primary><varname>wal_level</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        <varname>wal_level</> determines how much information is written
+        to the WAL. The default value is <literal>minimal</>, which writes
+        only minimal information needed to recover from a crash or immediate
+        shutdown. <literal>archive</> adds logging required for WAL archiving,
+        and <literal>hot_standby</> further adds information required to run
+        read-only queries on a standby server.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        In <literal>minimal</> mode, WAL-logging of some bulk operations, like
+        <command>CREATE INDEX</>, <command>CLUSTER</> and <command>COPY</> on
+        a table that was created or truncated in the same transaction can be
+        safely skipped, which can make those operations much faster (see
+        <xref linkend="populate-pitr">). But minimal WAL does not contain
+        enough information to reconstruct the data from a base backup and the
+        WAL logs, so at least <literal>archive</> level must be used to enable
+        WAL archiving (<xref linkend="guc-archive-mode">) and streaming
+        replication.
+       </para>
+       <para>
+        In <literal>hot_standby</> mode, the same information is logged as
+        in <literal>archive</> mode, plus information needed to reconstruct
+        the status of running transactions from the WAL. To enable read-only
+        queries on a standby server, <varname>wal_level</> must be set to
+        <literal>hot_standby</> on the primary. It is thought that there is
+        little measurable difference in performance from using
+        <literal>hot_standby</> mode over <literal>archive</>, so feedback
+        is welcome if any production impacts are noticeable.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-fsync" xreflabel="fsync">
       <indexterm>
        <primary><varname>fsync</> configuration parameter</primary>
@@ -1726,7 +1765,9 @@ SET ENABLE_SEQSCAN TO OFF;
         <varname>archive_mode</> and <varname>archive_command</> are
         separate variables so that <varname>archive_command</> can be
         changed without leaving archiving mode.
-        This parameter can only be set at server start.
+        This parameter can only be set at server start. <varname>wal_level</>
+        must be set to <literal>archive</> or <literal>hot_standby</> to
+        enable <varname>archive_mode</>.
        </para>
       </listitem>
      </varlistentry>
@@ -1818,7 +1859,9 @@ SET ENABLE_SEQSCAN TO OFF;
         Specifies the maximum number of concurrent connections from standby
         servers (i.e., the maximum number of simultaneously running WAL sender
         processes). The default is zero. This parameter can only be set at
-        server start.
+        server start. <varname>wal_level</> must be set to <literal>archive</>
+        or <literal>hot_standby</> to allow connections from standby
+        connections.
        </para>
        </listitem>
       </varlistentry>
@@ -1884,16 +1927,11 @@ SET ENABLE_SEQSCAN TO OFF;
       </indexterm>
       <listitem>
        <para>
-        Parameter has two roles. During recovery, specifies whether or not
-        you can connect and run queries to enable <xref linkend="hot-standby">.
-        During normal running, specifies whether additional information is written
-        to WAL to allow recovery connections on a standby server that reads
-        WAL data generated by this server. The default value is
-        <literal>on</literal>.  It is thought that there is little
-        measurable difference in performance from using this feature, so
-        feedback is welcome if any production impacts are noticeable.
-        It is likely that this parameter will be removed in later releases.
-        This parameter can only be set at server start.
+        During recovery, specifies whether or not you can connect and run
+        queries to enable <xref linkend="hot-standby">. The default value is
+        <literal>on</literal>.
+        This parameter can only be set at server start. It is ignored when
+        not in standby mode.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index d69f2ea..992ccf7 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1589,9 +1589,9 @@ LOG:  database system is ready to accept read only connections
 </programlisting>

     Consistency information is recorded once per checkpoint on the primary, as long
-    as <varname>recovery_connections</> is enabled on the primary.  It is not possible
+    as <varname>wal_level</> is set to <literal>hot_standby</> on the primary.  It is not possible
     to enable recovery connections on the standby when reading WAL written during the
-    period that <varname>recovery_connections</> was disabled on the primary.
+    period that <varname>wal_level</> was not set to <literal>hot_standby</> on the primary.
     Reaching a consistent state can also be delayed in the presence
     of both of these conditions:

@@ -1838,7 +1838,7 @@ LOG:  database system is ready to accept read only connections
    </para>

    <para>
-    On the primary, parameters <varname>recovery_connections</> and
+    On the primary, parameters <varname>wal_level</> and
     <varname>vacuum_defer_cleanup_age</> can be used.
     <varname>max_standby_delay</> has no effect if set on the primary.
    </para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index b00e69f..62b583b 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -835,10 +835,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     <command>TRUNCATE</command> command. In such cases no WAL
     needs to be written, because in case of an error, the files
     containing the newly loaded data will be removed anyway.
-    However, this consideration does not apply when
-    <xref linkend="guc-archive-mode"> is on or streaming replication
-    is allowed (i.e., <xref linkend="guc-max-wal-senders"> is more
-    than or equal to one), as all commands must write WAL in that case.
+    However, this consideration only applies when
+    <xref linkend="guc-wal-level"> is <literal>minimal</> as all commands
+    must write WAL otherwise.
    </para>

   </sect2>
@@ -910,29 +909,27 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
   </sect2>

   <sect2 id="populate-pitr">
-   <title>Turn off <varname>archive_mode</varname> and streaming replication</title>
+   <title>Disable WAL archival and streaming replication</title>

    <para>
     When loading large amounts of data into an installation that uses
-    WAL archiving or streaming replication, you might want to disable
-    archiving (turn off the <xref linkend="guc-archive-mode">
-    configuration variable) and replication (zero the
-    <xref linkend="guc-max-wal-senders"> configuration variable)
-    while loading.  It might be
-    faster to take a new base backup after the load has completed
-    than to process a large amount of incremental WAL data.
-    But note that changing either of these variables requires
-    a server restart.
+    WAL archiving or streaming replication, it might be faster to take a
+    new base backup after the load has completed than to process a large
+    amount of incremental WAL data. You might want to disable archiving
+    and streaming replication while loading, by setting
+    <xref linkend="guc-wal-level"> to <literal>minimal</>,
+    <xref linkend="guc-archive-mode"> <literal>off</>, and
+    <xref linkend="guc-max-wal-senders"> to zero).
+    But note that changing these settings requires a server restart.
    </para>

    <para>
     Aside from avoiding the time for the archiver or WAL sender to
     process the WAL data,
     doing this will actually make certain commands faster, because they
-    are designed not to write WAL at all if <varname>archive_mode</varname>
-    is off and <varname>max_wal_senders</varname> is zero.  (They can
-    guarantee crash safety more cheaply by doing an
-    <function>fsync</> at the end than by writing WAL.)
+    are designed not to write WAL at all if <varname>wal_level</varname>
+    is <literal>minimal</>.  (They can guarantee crash safety more cheaply
+    by doing an <function>fsync</> at the end than by writing WAL.)
     This applies to the following commands:
     <itemizedlist>
      <listitem>
@@ -1014,10 +1011,12 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
      </listitem>
      <listitem>
       <para>
-       If using WAL archiving, consider disabling it during the restore.
-       To do that, turn off <varname>archive_mode</varname> before loading the
-       dump script, and afterwards turn it back on
-       and take a fresh base backup.
+       If using WAL archiving or streaming replication, consider disabling
+       them during the restore. To do that, set <varname>arcive_mode</> off,
+       <varname>wal_level</varname> to <literal>minimal</>, and
+       <varname>max_wal_senders</> zero before loading the dump script,
+       and afterwards set them back to the right values and take a fresh
+       base backup.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index ac075cb..99235da 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -278,16 +278,6 @@ end_heap_rewrite(RewriteState state)
                    (char *) state->rs_buffer, true);
     }

-    /* Write an XLOG UNLOGGED record if WAL-logging was skipped */
-    if (!state->rs_use_wal && !state->rs_new_rel->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 30];
-
-        snprintf(reason, sizeof(reason), "heap rewrite on \"%s\"",
-                 RelationGetRelationName(state->rs_new_rel));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /*
      * If the rel isn't temp, must fsync before commit.  We use heap_sync to
      * ensure that the toast table gets fsync'd too.
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 6cb3cac..89ed8a0 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -215,19 +215,6 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
      */
     wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!wstate.btws_use_wal && !wstate.index->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 20];
-
-        snprintf(reason, sizeof(reason), "b-tree build on \"%s\"",
-                 RelationGetRelationName(wstate.index));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /* reserve the metapage */
     wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
     wstate.btws_pages_written = 0;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7647f4e..bca7068 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -76,6 +76,7 @@ int            MaxStandbyDelay = 30;
 bool        fullPageWrites = true;
 bool        log_checkpoints = false;
 int            sync_method = DEFAULT_SYNC_METHOD;
+int            wal_level = WAL_LEVEL_MINIMAL;

 #ifdef WAL_DEBUG
 bool        XLOG_DEBUG = false;
@@ -97,6 +98,13 @@ bool        XLOG_DEBUG = false;
 /*
  * GUC support
  */
+const struct config_enum_entry wal_level_options[] = {
+    {"minimal", WAL_LEVEL_MINIMAL, false},
+    {"archive", WAL_LEVEL_ARCHIVE, false},
+    {"hot_standby", WAL_LEVEL_HOT_STANDBY, false},
+    {NULL, 0, false}
+};
+
 const struct config_enum_entry sync_method_options[] = {
     {"fsync", SYNC_METHOD_FSYNC, false},
 #ifdef HAVE_FSYNC_WRITETHROUGH
@@ -501,6 +509,18 @@ static bool reachedMinRecoveryPoint = false;
 static bool InRedo = false;

 /*
+ * Information logged when we detect a change in one of the parameters
+ * important for Hot Standby.
+ */
+typedef struct xl_parameter_change
+{
+    int            MaxConnections;
+    int            max_prepared_xacts;
+    int            max_locks_per_xact;
+    int            wal_level;
+} xl_parameter_change;
+
+/*
  * Flags set by interrupt handlers for later service in the redo loop.
  */
 static volatile sig_atomic_t got_SIGHUP = false;
@@ -522,7 +542,8 @@ static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI,
                     uint32 endLogId, uint32 endLogSeg);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
-static void CheckRequiredParameterValues(CheckPoint checkPoint);
+static void CheckRequiredParameterValues(void);
+static void XLogReportParameters(void);
 static void LocalSetXLogInsertAllowed(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);

@@ -4922,6 +4943,13 @@ BootStrapXLOG(void)
     ControlFile->time = checkPoint.time;
     ControlFile->checkPoint = checkPoint.redo;
     ControlFile->checkPointCopy = checkPoint;
+
+    /* Set important parameter values for use when replaying WAL */
+    ControlFile->MaxConnections = MaxConnections;
+    ControlFile->max_prepared_xacts = max_prepared_xacts;
+    ControlFile->max_locks_per_xact = max_locks_per_xact;
+    ControlFile->wal_level = wal_level;
+
     /* some additional ControlFile fields are set in WriteControlFile() */

     WriteControlFile();
@@ -5539,17 +5567,18 @@ GetLatestXLogTime(void)
 }

 /*
- * Note that text field supplied is a parameter name and does not require translation
+ * Note that text field supplied is a parameter name and does not require
+ * translation
  */
-#define RecoveryRequiresIntParameter(param_name, currValue, checkpointValue) \
+#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \
 { \
-    if (currValue < checkpointValue) \
+    if (currValue < minValue) \
         ereport(ERROR, \
             (errmsg("recovery connections cannot continue because " \
                     "%s = %u is a lower setting than on WAL source server (value was %u)", \
                     param_name, \
                     currValue, \
-                    checkpointValue))); \
+                    minValue))); \
 }

 /*
@@ -5557,21 +5586,37 @@ GetLatestXLogTime(void)
  * for various aspects of recovery operation.
  */
 static void
-CheckRequiredParameterValues(CheckPoint checkPoint)
+CheckRequiredParameterValues(void)
 {
-    /* We ignore autovacuum_max_workers when we make this test. */
-    RecoveryRequiresIntParameter("max_connections",
-                                 MaxConnections, checkPoint.MaxConnections);
+    /*
+     * For archive recovery, the WAL must be generated with at least
+     * 'archive' wal_level.
+     */
+    if (InArchiveRecovery && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
+    {
+        ereport(WARNING,
+                (errmsg("WAL was generated with wal_level='minimal', data may be missing"),
+                 errhint("This happens if you temporarily set wal_level='minimal' without taking a new base
backup.")));
+    }

-    RecoveryRequiresIntParameter("max_prepared_xacts",
-                          max_prepared_xacts, checkPoint.max_prepared_xacts);
-    RecoveryRequiresIntParameter("max_locks_per_xact",
-                          max_locks_per_xact, checkPoint.max_locks_per_xact);
+    /*
+     * For Hot Standby, the WAL must be generated with 'hot_standby' mode,
+     * and we must have at least as many backend slots as the primary.
+     */
+    if (InArchiveRecovery && XLogRequestRecoveryConnections)
+    {
+        if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
+            ereport(ERROR,
+                    (errmsg("recovery connections cannot start because wal_level was not set to 'hot_standby' on the
WALsource server"))); 

-    if (!checkPoint.XLogStandbyInfoMode)
-        ereport(ERROR,
-                (errmsg("recovery connections cannot start because the recovery_connections "
-                        "parameter is disabled on the WAL source server")));
+        /* We ignore autovacuum_max_workers when we make this test. */
+        RecoveryRequiresIntParameter("max_connections",
+                                     MaxConnections, ControlFile->MaxConnections);
+        RecoveryRequiresIntParameter("max_prepared_xacts",
+                                     max_prepared_xacts, ControlFile->max_prepared_xacts);
+        RecoveryRequiresIntParameter("max_locks_per_xact",
+                                     max_locks_per_xact, ControlFile->max_locks_per_xact);
+    }
 }

 /*
@@ -5904,6 +5949,9 @@ StartupXLOG(void)
                                 BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
         }

+        /* Check that the GUCs used to generate the WAL allow recovery */
+        CheckRequiredParameterValues();
+
         /*
          * Initialize recovery connections, if enabled. We won't let backends
          * in yet, not until we've reached the min recovery point specified in
@@ -5915,8 +5963,6 @@ StartupXLOG(void)
             TransactionId *xids;
             int            nxids;

-            CheckRequiredParameterValues(checkPoint);
-
             ereport(DEBUG1,
                     (errmsg("initializing recovery connections")));

@@ -6401,6 +6447,13 @@ StartupXLOG(void)
     }

     /*
+     * If any of the critical GUCs have changed, log them before we allow
+     * any backends to write WAL
+     */
+    LocalSetXLogInsertAllowed();
+    XLogReportParameters();
+
+    /*
      * All done.  Allow backends to write WAL.    (Although the bool flag is
      * probably atomic in itself, we use the info_lck here to ensure that
      * there are no race conditions concerning visibility of other recent
@@ -6998,12 +7051,6 @@ CreateCheckPoint(int flags)
     MemSet(&checkPoint, 0, sizeof(checkPoint));
     checkPoint.time = (pg_time_t) time(NULL);

-    /* Set important parameter values for use when replaying WAL */
-    checkPoint.MaxConnections = MaxConnections;
-    checkPoint.max_prepared_xacts = max_prepared_xacts;
-    checkPoint.max_locks_per_xact = max_locks_per_xact;
-    checkPoint.XLogStandbyInfoMode = XLogStandbyInfoActive();
-
     /*
      * We must hold WALInsertLock while examining insert state to determine
      * the checkpoint REDO pointer.
@@ -7647,28 +7694,49 @@ RequestXLogSwitch(void)
 }

 /*
- * Write an XLOG UNLOGGED record, indicating that some operation was
- * performed on data that we fsync()'d directly to disk, skipping
- * WAL-logging.
- *
- * Such operations screw up archive recovery, so we complain if we see
- * these records during archive recovery. That shouldn't happen in a
- * correctly configured server, but you can induce it by temporarily
- * disabling archiving and restarting, so it's good to at least get a
- * warning of silent data loss in such cases. These records serve no
- * other purpose and are simply ignored during crash recovery.
+ * Check if any of the GUC parameters that are critical for hot standby
+ * have changed, and update the value in pg_control file if necessary.
  */
-void
-XLogReportUnloggedStatement(char *reason)
+static void
+XLogReportParameters(void)
 {
-    XLogRecData rdata;
+    if (wal_level != ControlFile->wal_level ||
+        MaxConnections != ControlFile->MaxConnections ||
+        max_prepared_xacts != ControlFile->max_prepared_xacts ||
+        max_locks_per_xact != max_locks_per_xact)
+    {
+        /*
+         * The change in number of backend slots doesn't need to be
+         * WAL-logged if archiving is not enabled, as you can't start
+         * archive recovery with wal_level='minimal' anyway. We don't
+         * really care about the values in pg_control either if
+         * wal_level='minimal', but seems better to keep them up-to-date
+         * to avoid confusion.
+         */
+        if (wal_level != ControlFile->wal_level || XLogIsNeeded())
+        {
+            XLogRecData rdata;
+            xl_parameter_change xlrec;

-    rdata.buffer = InvalidBuffer;
-    rdata.data = reason;
-    rdata.len = strlen(reason) + 1;
-    rdata.next = NULL;
+            xlrec.MaxConnections = MaxConnections;
+            xlrec.max_prepared_xacts = max_prepared_xacts;
+            xlrec.max_locks_per_xact = max_locks_per_xact;
+            xlrec.wal_level = wal_level;
+
+            rdata.buffer = InvalidBuffer;
+            rdata.data = (char *) &xlrec;
+            rdata.len = sizeof(xlrec);
+            rdata.next = NULL;
+
+            XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE, &rdata);
+        }

-    XLogInsert(RM_XLOG_ID, XLOG_UNLOGGED, &rdata);
+        ControlFile->MaxConnections = MaxConnections;
+        ControlFile->max_prepared_xacts = max_prepared_xacts;
+        ControlFile->max_locks_per_xact = max_locks_per_xact;
+        ControlFile->wal_level = wal_level;
+        UpdateControlFile();
+    }
 }

 /*
@@ -7709,10 +7777,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
                               checkPoint.nextMultiOffset);
         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);

-        /* Check to see if any changes to max_connections give problems */
-        if (standbyState != STANDBY_DISABLED)
-            CheckRequiredParameterValues(checkPoint);
-
         /*
          * If we see a shutdown checkpoint, we know that nothing was
          * running on the master at this point. So fake-up an empty
@@ -7834,18 +7898,21 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
             LWLockRelease(ControlFileLock);
         }
     }
-    else if (info == XLOG_UNLOGGED)
+    else if (info == XLOG_PARAMETER_CHANGE)
     {
-        if (InArchiveRecovery)
-        {
-            /*
-             * Note: We don't print the reason string from the record, because
-             * that gets added as a line using xlog_desc()
-             */
-            ereport(WARNING,
-                (errmsg("unlogged operation performed, data may be missing"),
-                 errhint("This can happen if you temporarily disable archive_mode without taking a new base
backup.")));
-        }
+        xl_parameter_change xlrec;
+
+        /* Update our copy of the parameters in pg_control */
+        memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+
+        ControlFile->MaxConnections = xlrec.MaxConnections;
+        ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+        ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
+        ControlFile->wal_level = xlrec.wal_level;
+        UpdateControlFile();
+
+        /* Check to see if any changes to max_connections give problems */
+        CheckRequiredParameterValues();
     }
 }

@@ -7896,11 +7963,30 @@ xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
         appendStringInfo(buf, "backup end: %X/%X",
                          startpoint.xlogid, startpoint.xrecoff);
     }
-    else if (info == XLOG_UNLOGGED)
+    else if (info == XLOG_PARAMETER_CHANGE)
     {
-        char       *reason = rec;
+        xl_parameter_change xlrec;
+        const char *wal_level_str;
+        const struct config_enum_entry *entry;
+
+        memcpy(&xlrec, rec, sizeof(xl_parameter_change));
+
+        /* Find a string representation for wal_level */
+        wal_level_str = "?";
+        for (entry = wal_level_options; entry->name; entry++)
+        {
+            if (entry->val == xlrec.wal_level)
+            {
+                wal_level_str = entry->name;
+                break;
+            }
+        }

-        appendStringInfo(buf, "unlogged operation: %s", reason);
+        appendStringInfo(buf, "parameter change: max_connections=%d max_prepared_xacts=%d max_locks_per_xact=%d
wal_level=%s",
+                         xlrec.MaxConnections,
+                         xlrec.max_prepared_xacts,
+                         xlrec.max_locks_per_xact,
+                         wal_level_str);
     }
     else
         appendStringInfo(buf, "UNKNOWN");
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 30a00ab..ccb4599 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -787,23 +787,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
      */
     use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!use_wal && !NewHeap->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 32];
-
-        if (OldIndex != NULL)
-            snprintf(reason, sizeof(reason), "CLUSTER on \"%s\"",
-                     RelationGetRelationName(NewHeap));
-        else
-            snprintf(reason, sizeof(reason), "VACUUM FULL on \"%s\"",
-                     RelationGetRelationName(NewHeap));
-        XLogReportUnloggedStatement(reason);
-    }
-
     /* use_wal off requires smgr_targblock be initially invalid */
     Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 84a83f1..9d46e47 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2223,14 +2223,7 @@ CopyFrom(CopyState cstate)
      * indexes since those use WAL anyway)
      */
     if (hi_options & HEAP_INSERT_SKIP_WAL)
-    {
-        char        reason[NAMEDATALEN + 30];
-
-        snprintf(reason, sizeof(reason), "COPY FROM on \"%s\"",
-                 RelationGetRelationName(cstate->rel));
-        XLogReportUnloggedStatement(reason);
         heap_sync(cstate->rel);
-    }
 }


diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 25b2807..9b5ce65 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3272,14 +3272,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)

         /* If we skipped writing WAL, then we need to sync the heap. */
         if (hi_options & HEAP_INSERT_SKIP_WAL)
-        {
-            char        reason[NAMEDATALEN + 30];
-
-            snprintf(reason, sizeof(reason), "table rewrite on \"%s\"",
-                     RelationGetRelationName(newrel));
-            XLogReportUnloggedStatement(reason);
             heap_sync(newrel);
-        }

         heap_close(newrel, NoLock);
     }
@@ -7021,20 +7014,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace)

     heap_close(pg_class, RowExclusiveLock);

-    /*
-     * Write an XLOG UNLOGGED record if WAL-logging was skipped because WAL
-     * archiving is not enabled.
-     */
-    if (!XLogIsNeeded() && !rel->rd_istemp)
-    {
-        char        reason[NAMEDATALEN + 40];
-
-        snprintf(reason, sizeof(reason), "ALTER TABLE SET TABLESPACE on \"%s\"",
-                 RelationGetRelationName(rel));
-
-        XLogReportUnloggedStatement(reason);
-    }
-
     relation_close(rel, NoLock);

     /* Make sure the reltablespace change is visible */
@@ -7063,10 +7042,6 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
     /*
      * We need to log the copied data in WAL iff WAL archiving/streaming is
      * enabled AND it's not a temp rel.
-     *
-     * Note: If you change the conditions here, update the conditions in
-     * ATExecSetTableSpace() for when an XLOG UNLOGGED record is written to
-     * match.
      */
     use_wal = XLogIsNeeded() && !istemp;

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d5e7e3a..d299310 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2241,14 +2241,7 @@ CloseIntoRel(QueryDesc *queryDesc)

         /* If we skipped using WAL, must heap_sync before commit */
         if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
-        {
-            char        reason[NAMEDATALEN + 30];
-
-            snprintf(reason, sizeof(reason), "SELECT INTO on \"%s\"",
-                     RelationGetRelationName(myState->rel));
-            XLogReportUnloggedStatement(reason);
             heap_sync(myState->rel);
-        }

         /* close rel, but keep lock until commit */
         heap_close(myState->rel, NoLock);
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 47f71bd..aa8f2da 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -728,6 +728,12 @@ PostmasterMain(int argc, char *argv[])
         write_stderr("%s: superuser_reserved_connections must be less than max_connections\n", progname);
         ExitPostmaster(1);
     }
+    if (XLogArchiveMode && wal_level == WAL_LEVEL_MINIMAL)
+        ereport(ERROR,
+                (errmsg("WAL archival (archive_mode='on') requires wal_level 'archive' or 'hot_standby'")));
+    if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
+        ereport(ERROR,
+                (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level 'archive' or 'hot_standby")));

     /*
      * Other one-time internal sanity checks can go here, if they are fast.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3838665..e960175 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -253,6 +253,24 @@ WalSndHandshake(void)
                     {
                         StringInfoData buf;

+                        /*
+                         * Check that we're logging enough information in the
+                         * WAL for log-shipping.
+                         *
+                         * NOTE: This only checks the current value of
+                         * wal_level. Even if the current setting is not
+                         * 'minimal', there can be old WAL in the pg_xlog
+                         * directory that was created with 'minimal'.
+                         * So this is not bulletproof, the purpose is
+                         * just to give a user-friendly error message that
+                         * hints how to configure the system correctly.
+                         */
+                        if (wal_level == WAL_LEVEL_MINIMAL)
+                            ereport(FATAL,
+                                    (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                                     errmsg("standby connections not allowed because wal_level='minimal'")));
+
+
                         /* Send a CopyOutResponse message, and start streaming */
                         pq_beginmessage(&buf, 'H');
                         pq_sendbyte(&buf, 0);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 51753d6..a92a874 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -256,7 +256,7 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
      */
     if (!TransactionIdIsValid(latestRemovedXid))
     {
-        elog(DEBUG1, "Invalid latestremovexXid reported, using latestcompletedxid instead");
+        elog(DEBUG1, "invalid latestremovexXid reported, using latestcompletedxid instead");

         LWLockAcquire(ProcArrayLock, LW_SHARED);
         latestRemovedXid = ShmemVariableCache->latestCompletedXid;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2434cc0..809125e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -340,6 +340,7 @@ static const struct config_enum_entry constraint_exclusion_options[] = {
 /*
  * Options for enum values stored in other modules
  */
+extern const struct config_enum_entry wal_level_options[];
 extern const struct config_enum_entry sync_method_options[];

 /*
@@ -2785,6 +2786,15 @@ static struct config_enum ConfigureNamesEnum[] =
     },

     {
+        {"wal_level", PGC_POSTMASTER, WAL_SETTINGS,
+            gettext_noop("Set the level of information written to the WAL."),
+            NULL
+        },
+        &wal_level,
+        WAL_LEVEL_MINIMAL, wal_level_options, NULL
+    },
+
+    {
         {"wal_sync_method", PGC_SIGHUP, WAL_SETTINGS,
             gettext_noop("Selects the method used for forcing WAL updates to disk."),
             NULL
@@ -7862,7 +7872,7 @@ pg_timezone_abbrev_initialize(void)
 static const char *
 show_archive_command(void)
 {
-    if (XLogArchiveMode)
+    if (XLogArchivingActive())
         return XLogArchiveCommand;
     else
         return "(disabled)";
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 92763eb..5749711 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -150,6 +150,7 @@

 # - Settings -

+#wal_level = minimal            # minimal, archive, or hot_standby
 #fsync = on                # turns forced synchronization on or off
 #synchronous_commit = on        # immediate fsync at commit
 #wal_sync_method = fsync        # the default is the first option
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index cb16cdf..07ac955 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -8,13 +8,14 @@
  *
  * $PostgreSQL$
  */
-#include "postgres_fe.h"
+#include "postgres.h"

 #include <unistd.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <fcntl.h>

+#include "access/xlog.h"
 #include "catalog/pg_control.h"


@@ -60,6 +61,21 @@ dbState(DBState state)
     return _("unrecognized status code");
 }

+static const char *
+wal_level_str(WalLevel wal_level)
+{
+    switch (wal_level)
+    {
+        case WAL_LEVEL_MINIMAL:
+            return "minimal";
+        case WAL_LEVEL_ARCHIVE:
+            return "archive";
+        case WAL_LEVEL_HOT_STANDBY:
+            return "hot_standby";
+    }
+    return _("unrecognized wal_level");
+}
+

 int
 main(int argc, char *argv[])
@@ -206,6 +222,14 @@ main(int argc, char *argv[])
     printf(_("Backup start location:                %X/%X\n"),
            ControlFile.backupStartPoint.xlogid,
            ControlFile.backupStartPoint.xrecoff);
+    printf(_("Last wal_level setting:               %s\n"),
+           wal_level_str(ControlFile.wal_level));
+    printf(_("Last max_connections setting:         %d\n"),
+           ControlFile.MaxConnections);
+    printf(_("Last max_prepared_xacts setting:      %d\n"),
+           ControlFile.max_prepared_xacts);
+    printf(_("Last max_locks_per_xact setting:      %d\n"),
+           ControlFile.max_locks_per_xact);
     printf(_("Maximum data alignment:               %u\n"),
            ControlFile.maxAlign);
     /* we don't print floatFormat since can't say much useful about it */
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 3d2a7bc..02faa8e 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -628,6 +628,15 @@ RewriteControlFile(void)
     ControlFile.backupStartPoint.xlogid = 0;
     ControlFile.backupStartPoint.xrecoff = 0;

+    /*
+     * Use the defaults for max_* settings. The values don't matter
+     * as long as wal_level='minimal'.
+     */
+    ControlFile.MaxConnections = 100;
+    ControlFile.max_prepared_xacts = 0;
+    ControlFile.max_locks_per_xact = 64;
+    ControlFile.wal_level = WAL_LEVEL_MINIMAL;
+
     /* Now we can force the recorded xlog seg size to the right thing. */
     ControlFile.xlog_seg_size = XLogSegSize;

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 6bfc7d5..432cd16 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -195,24 +195,26 @@ extern int    XLogArchiveTimeout;
 extern bool log_checkpoints;
 extern bool XLogRequestRecoveryConnections;
 extern int    MaxStandbyDelay;
+/* WAL levels */
+typedef enum WalLevel
+{
+    WAL_LEVEL_MINIMAL = 0,
+    WAL_LEVEL_ARCHIVE,
+    WAL_LEVEL_HOT_STANDBY
+} WalLevel;
+extern int    wal_level;

-#define XLogArchivingActive()    (XLogArchiveMode)
+#define XLogArchivingActive()    (XLogArchiveMode && wal_level >= WAL_LEVEL_ARCHIVE)
 #define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')

 /*
- * This is in walsender.c, but declared here so that we don't need to include
- * walsender.h in all files that check XLogIsNeeded()
- */
-extern int    max_wal_senders;
-
-/*
- * Is WAL-logging necessary? We need to log an XLOG record iff either
- * WAL archiving is enabled or XLOG streaming is allowed.
+ * Is WAL-logging necessary for archival or log-shipping, or can we skip
+ * WAL-logging if we fsync() the data before committing instead?
  */
-#define XLogIsNeeded() (XLogArchivingActive() || (max_wal_senders > 0))
+#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)

 /* Do we need to WAL-log information required only for Hot Standby? */
-#define XLogStandbyInfoActive() (XLogRequestRecoveryConnections && XLogIsNeeded())
+#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)

 #ifdef WAL_DEBUG
 extern bool XLOG_DEBUG;
@@ -293,7 +295,6 @@ extern void InitXLOGAccess(void);
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
-extern void XLogReportUnloggedStatement(char *reason);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
 extern XLogRecPtr GetWriteRecPtr(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 9d65546..87787c9 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -21,7 +21,7 @@


 /* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION    901
+#define PG_CONTROL_VERSION    901 /* XXX bump before commit */

 /*
  * Body of CheckPoint XLOG records.  This is declared here because we keep
@@ -41,12 +41,6 @@ typedef struct CheckPoint
     Oid            oldestXidDB;    /* database with minimum datfrozenxid */
     pg_time_t    time;            /* time stamp of checkpoint */

-    /* Important parameter settings at time of shutdown checkpoints */
-    int            MaxConnections;
-    int            max_prepared_xacts;
-    int            max_locks_per_xact;
-    bool        XLogStandbyInfoMode;
-
     /*
      * Oldest XID still running. This is only needed to initialize hot standby
      * mode from an online checkpoint, so we only bother calculating this for
@@ -63,7 +57,7 @@ typedef struct CheckPoint
 #define XLOG_NEXTOID                    0x30
 #define XLOG_SWITCH                        0x40
 #define XLOG_BACKUP_END                    0x50
-#define XLOG_UNLOGGED                    0x60
+#define XLOG_PARAMETER_CHANGE            0x60


 /* System status indicator */
@@ -142,6 +136,15 @@ typedef struct ControlFileData
     XLogRecPtr    backupStartPoint;

     /*
+     * Parameter settings that determine if the WAL can be used for archival
+     * or hot standby.
+     */
+    int            wal_level;
+    int            MaxConnections;
+    int            max_prepared_xacts;
+    int            max_locks_per_xact;
+
+    /*
      * This data is used to check for hardware-architecture compatibility of
      * the database and the backend executable.  We need not check endianness
      * explicitly, since the pg_control version will surely look wrong to a
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 6ad40a9..db64c88 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -39,6 +39,7 @@ extern bool am_walsender;

 /* user-settable parameters */
 extern int    WalSndDelay;
+extern int    max_wal_senders;

 extern int    WalSenderMain(void);
 extern void WalSndSignals(void);

On Wed, 2010-04-28 at 10:43 +0300, Heikki Linnakangas wrote:

> * renamed wal_mode to wal_level

I'm wondering whether this should be a list rather than an enum? If we
add something in the future that adds more info to WAL but doesn't fit
the one-dimensional model this implements then we could be in trouble.
Should this be

e.g. wal_xxxx = feature2, feature3
e.g. wal_xxxx = feature3
e.g. wal_xxxx = feature1

recognising that some features require other features, so as an example
feature2 requires and implies feature1.

The word "level" implies a one-dimensionality that "mode" did not and I
feel a little uncertain about that term.

Other words: attributes, features, contents, info, options.
Hmm, wal_options sounds OK.

Anyway, just throwing out some ideas to make sure we're doing the right
thing with this.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2010-04-28 at 10:43 +0300, Heikki Linnakangas wrote:
> 
>> * renamed wal_mode to wal_level
> 
> I'm wondering whether this should be a list rather than an enum? If we
> add something in the future that adds more info to WAL but doesn't fit
> the one-dimensional model this implements then we could be in trouble.
> Should this be
> 
> e.g. wal_xxxx = feature2, feature3
> e.g. wal_xxxx = feature3
> e.g. wal_xxxx = feature1
> 
> recognising that some features require other features, so as an example
> feature2 requires and implies feature1.
> 
> The word "level" implies a one-dimensionality that "mode" did not and I
> feel a little uncertain about that term.

That's the same concern Tom had at
http://archives.postgresql.org/message-id/17411.1272291825@sss.pgh.pa.us.
If we want to make it a list, I think it would be something like:

wal_extra_info = ''        # "archive", or "archive, hot_standby"

That seems OK to me. Still, I'd like to go with "level", it's short and
descriptive of the current options, and it's not that big a deal to
rename it if we add more options in the future.

It's hard to picture what the future options might be like. Someone had
an idea years ago (you even?) to add more information like table names
or primary keys to the WAL records, to make it easier to scrape
information from the WAL for 3rd party replication solutions. Like
feeding a slony replica from the WAL. Or maybe we'd want to fold
full_page_writes to the new GUC.

If it's just one or two extra feature orthogonal to the
minimal-archive-hot_standby axis, it might still be best to keep the
wal_level name, and make it possible to tack extra options to it like a
list. Like "wal_level='archive, featureX'". "Level" would still be a
good name, that's the main dimension.

If the dependencies between the features get complicated, though, I
doubt we can correctly guess the best representation for them now anyway.

That's just my 2 cents, if others feel like "level" is painting us to
the corner, "wal_mode" with the same values is the second best option IMO.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Wed, 2010-04-28 at 11:27 +0300, Heikki Linnakangas wrote:
> It's hard to picture what the future options might be like. Someone
> had
> an idea years ago (you even?) to add more information like table names
> or primary keys to the WAL records, to make it easier to scrape
> information from the WAL for 3rd party replication solutions. Like
> feeding a slony replica from the WAL. Or maybe we'd want to fold
> full_page_writes to the new GUC.

Yeh, lots of ideas for adding value to WAL. WAL can be considered a
transport solution with many potential uses and properties.

I haven't ever suggested adding something that would only be available
to 3rd party solutions, perhaps that information was garbled. Yes, I did
suggest that Slony might use a WAL transport in future, as an option.

-- Simon Riggs           www.2ndQuadrant.com



On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Ok, did that. Here's an updated patch:

> +        During recovery, specifies whether or not you can connect and run
> +        queries to enable <xref linkend="hot-standby">. The default value is
> +        <literal>on</literal>.
> +        This parameter can only be set at server start. It is ignored when
> +        not in standby mode.

In the patch, I was not able to find any change which ignores
recovery_connections when standby_mode is OFF.

And that change spoils the combination of pg_standby and HS?
We drop the support for it?


> * renamed wal_mode to wal_level

We should replace "mode" with "level" in the document of wal_level
parameter as follows?

@@ -1368,9 +1368,9 @@ SET ENABLE_SEQSCAN TO OFF;        read-only queries on a standby server.        This parameter
canonly be set at server start.       </para>       <para>
 
-        In <literal>minimal</> mode, WAL-logging of some bulk operations, like
+        In <literal>minimal</> level, WAL-logging of some bulk operations, like        <command>CREATE INDEX</>,
<command>CLUSTER</>and <command>COPY</> on        a table that was created or truncated in the same transaction can be
     safely skipped, which can make those operations much faster (see        <xref linkend="populate-pitr">). But
minimalWAL does not contain
 
@@ -1379,15 +1379,15 @@ SET ENABLE_SEQSCAN TO OFF;        WAL archiving (<xref linkend="guc-archive-mode">) and
streaming       replication.       </para>       <para>
 
-        In <literal>hot_standby</> mode, the same information is logged as
-        in <literal>archive</> mode, plus information needed to reconstruct
+        In <literal>hot_standby</> level, the same information is logged as
+        in <literal>archive</> level, plus information needed to reconstruct        the status of running transactions
fromthe WAL. To enable read-only        queries on a standby server, <varname>wal_level</> must be set to
<literal>hot_standby</>on the primary. It is thought that there is        little measurable difference in performance
fromusing
 
-        <literal>hot_standby</> mode over <literal>archive</>, so feedback
+        <literal>hot_standby</> level over <literal>archive</>, so feedback        is welcome if any production
impactsare noticeable.       </para>      </listitem>     </varlistentry>
 

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, 2010-04-28 at 18:54 +0900, Fujii Masao wrote:
> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Ok, did that. Here's an updated patch:
> 
> > +        During recovery, specifies whether or not you can connect and run
> > +        queries to enable <xref linkend="hot-standby">. The default value is
> > +        <literal>on</literal>.
> > +        This parameter can only be set at server start. It is ignored when
> > +        not in standby mode.
> 
> In the patch, I was not able to find any change which ignores
> recovery_connections when standby_mode is OFF.
> 
> And that change spoils the combination of pg_standby and HS?
> We drop the support for it?

Yeh, I hope nothing is being disabled in that area. Many people hope and
expect HS to work with 9.0 without any changes to their setup.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Ok, did that. Here's an updated patch:
> 
>> +        During recovery, specifies whether or not you can connect and run
>> +        queries to enable <xref linkend="hot-standby">. The default value is
>> +        <literal>on</literal>.
>> +        This parameter can only be set at server start. It is ignored when
>> +        not in standby mode.
> 
> In the patch, I was not able to find any change which ignores
> recovery_connections when standby_mode is OFF.
> 
> And that change spoils the combination of pg_standby and HS?
> We drop the support for it?

That sentence is poorly worded, recovery_connections works in archive
recovery too (and with pg_standby). I'll rephrase that to "It only has
effect during archive recovery or standby mode."

>> * renamed wal_mode to wal_level
> 
> We should replace "mode" with "level" in the document of wal_level
> parameter as follows?
> 
> ...

Yep.

Thanks!

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> At commit, should I bump catversion, or PG_CONTROL_VERSION, or both? The
> patch replaces the unlogged-operation WAL record with a record
> containing current parameter values, and it changes pg_control. I'm
> guessing both.

You *must* bump PG_CONTROL_VERSION because the content of pg_control
changed.  The correct way to reflect a change in WAL contents is to
bump the WAL page magic number.  I don't see any change here in either
system catalog contents or user table contents, so I see no need to
touch catversion.
        regards, tom lane


Simon Riggs <simon@2ndQuadrant.com> writes:
> On Wed, 2010-04-28 at 10:43 +0300, Heikki Linnakangas wrote:
>> * renamed wal_mode to wal_level

> I'm wondering whether this should be a list rather than an enum? If we
> add something in the future that adds more info to WAL but doesn't fit
> the one-dimensional model this implements then we could be in trouble.
> Should this be
> e.g. wal_xxxx = feature2, feature3
> e.g. wal_xxxx = feature3
> e.g. wal_xxxx = feature1

I'm a bit suspicious of going in this direction, mainly because
DateStyle has been such a PITA over the years.  It's not always obvious
to users whether adding or removing an item in a list causes something
to turn on or off.

In any case, the project's expectations for forward compatibility of
postgresql.conf settings have always been very low.  I don't think we
should try to design wal_mode to solve future problems, just the ones
we are faced with right now.  If it gets changed to look completely
different in some future version, that's not a problem.
        regards, tom lane


On Wed, 2010-04-28 at 10:19 -0400, Tom Lane wrote:

> I'm a bit suspicious of going in this direction, mainly because
> DateStyle has been such a PITA over the years.

LOL. Man that's a pain. That's an unfair brush though!

-- Simon Riggs           www.2ndQuadrant.com



Tom Lane wrote:
> The correct way to reflect a change in WAL contents is to
> bump the WAL page magic number.

Browsing the history of XLOG_PAGE_MAGIC, it used to be incremented by
one whenever the format changes, at least as long as it has been been in
xlog_internal.h. It started at 0xD05B in 2005, and was incremented by
one by each commit till 0xD062. But then the hot standby patch bumped it
to 0xD166, and on March 19th Simon changed it to 0x9002, and on March
28th to 0x9003.

Is there a plan somewhere on all that, or was it just random whacking?
It's a bit weird though that the commit on March 19th moved it
backwards. We don't guarantee backwards-compatibility, so any program
looking at the WAL needs to know not only minimum version it supports
but a list of all supported versions, so it's not catastrophic. Still
weird, though. It hasn't been like that for long yet, and not in any
released version, so we could still change it back to the original
scheme (0xD063).

0x9002 sounds like it means the 9.0 release, which might be a good
versioning scheme as well. If want to continue with that scheme, it
should be documented in the comments.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> The correct way to reflect a change in WAL contents is to
>> bump the WAL page magic number.

> Browsing the history of XLOG_PAGE_MAGIC, it used to be incremented by
> one whenever the format changes, at least as long as it has been been in
> xlog_internal.h. It started at 0xD05B in 2005, and was incremented by
> one by each commit till 0xD062. But then the hot standby patch bumped it
> to 0xD166, and on March 19th Simon changed it to 0x9002, and on March
> 28th to 0x9003.

> Is there a plan somewhere on all that, or was it just random whacking?

Random whacking.  Simon seems to have decided to try to make the number
mean something, but it never meant anything before.  It's not a version
number, it's a magic identifier.  We might need to avoid certain values
to prevent collisions with other file formats, so I'm suspicious of
trying to make it match up with some expectation.
        regards, tom lane


Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Browsing the history of XLOG_PAGE_MAGIC, it used to be incremented by
>> one whenever the format changes, at least as long as it has been been in
>> xlog_internal.h. It started at 0xD05B in 2005, and was incremented by
>> one by each commit till 0xD062. But then the hot standby patch bumped it
>> to 0xD166, and on March 19th Simon changed it to 0x9002, and on March
>> 28th to 0x9003.
> 
>> Is there a plan somewhere on all that, or was it just random whacking?
> 
> Random whacking.  Simon seems to have decided to try to make the number
> mean something, but it never meant anything before.  It's not a version
> number, it's a magic identifier.  We might need to avoid certain values
> to prevent collisions with other file formats, so I'm suspicious of
> trying to make it match up with some expectation.

Ok, for better or worse, I whacked it back to the old series.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Ok, I've finally committed the patch, using wal_level as the name of the
GUC.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, Apr 29, 2010 at 1:14 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Ok, I've finally committed the patch, using wal_level as the name of the
> GUC.

> !     if (InArchiveRecovery && XLogRequestRecoveryConnections)
> !     {
> !         if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
> !             ereport(ERROR,
> !                     (errmsg("recovery connections cannot start because wal_level was not set to 'hot_standby' on
theWAL source server")));
 

I still have the complaint against the above check. Since the default value
of recovery_connections is TRUE, the users who need only archiving not
replication (i.e., wal_level is set to 'archive') are likely to often
see the failure
of the archive recovery by the above check.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Apr 29, 2010 at 1:14 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Ok, I've finally committed the patch, using wal_level as the name of the
>> GUC.
> 
>> !     if (InArchiveRecovery && XLogRequestRecoveryConnections)
>> !     {
>> !         if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
>> !             ereport(ERROR,
>> !                     (errmsg("recovery connections cannot start because wal_level was not set to 'hot_standby' on
theWAL source server")));
 
> 
> I still have the complaint against the above check. Since the default value
> of recovery_connections is TRUE, the users who need only archiving not
> replication (i.e., wal_level is set to 'archive') are likely to often
> see the failure
> of the archive recovery by the above check.

Should we change the default to recovery_connections=off ? It is a quite
big change in default behavior over previous releases that hot standby
is enabled by default, so maybe that would be the right thing to do anyway.

Or maybe add a hint to the above, "disable hot standby by setting
recovery_connections=off, or set wal_level='hot_standby' in the primary"

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, 2010-04-29 at 10:55 +0300, Heikki Linnakangas wrote:

> Should we change the default to recovery_connections=off ? It is a quite
> big change in default behavior over previous releases that hot standby
> is enabled by default, so maybe that would be the right thing to do anyway.
> 
> Or maybe add a hint to the above, "disable hot standby by setting
> recovery_connections=off, or set wal_level='hot_standby' in the primary"

I've been waiting for this suggestion, a clear knock-on effect from your
earlier changes. If we just have a static default its bad either way.

The most sensible way is to make the default act intelligently depending
upon what it finds in the control file. If WAL contains hot_standby
info, then recovery_connections should be on by default, though can be
turned off explicitly if required. If WAL does not contain hot_standby
info we then we throw a WARNING and continue as if recovery_connections
= off, or if recovery_connections is specified explicitly then we should
throw an ERROR so that the user knows it won't be possible to use this.
I think that means we'd need to change recovery_connections to have 2 or
3 values, but non-boolean:

recovery_connections = auto (default) | off
or
recovery_connections = auto (default) | on | off

and I would suggest removing it from postgresql.conf.sample

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2010-04-29 at 10:55 +0300, Heikki Linnakangas wrote:
> 
>> Should we change the default to recovery_connections=off ? It is a quite
>> big change in default behavior over previous releases that hot standby
>> is enabled by default, so maybe that would be the right thing to do anyway.
>>
>> Or maybe add a hint to the above, "disable hot standby by setting
>> recovery_connections=off, or set wal_level='hot_standby' in the primary"
> 
> I've been waiting for this suggestion, a clear knock-on effect from your
> earlier changes. If we just have a static default its bad either way.
> 
> The most sensible way is to make the default act intelligently depending
> upon what it finds in the control file. If WAL contains hot_standby
> info, then recovery_connections should be on by default, though can be
> turned off explicitly if required. If WAL does not contain hot_standby
> info we then we throw a WARNING and continue as if recovery_connections
> = off, or if recovery_connections is specified explicitly then we should
> throw an ERROR so that the user knows it won't be possible to use this.
> I think that means we'd need to change recovery_connections to have 2 or
> 3 values, but non-boolean:
> recovery_connections = auto (default) | off
> or
> recovery_connections = auto (default) | on | off
>

Seems overly complicated. It would be simpler to just set it 'off' by
default.

If it's 'off' by default, and you want to use hot standby, you will
notice quickly that the server doesn't accept connections, and you
switch it on. If it's 'on' (or 'auto'), you might not realize that your
standby server is accepting connections, when you only wanted to set it
up as a warm standby server for high availability.

The 'auto' mode just makes it more surprising. Connections might be
allowed at first, but when someone switches wal_level in the primary for
some reason, your reporting standby is up, but no longer allows
connections. And vice versa, if you set up a warm standby server for
high availability where you don't want to allow connections, and someone
flips the wal_level switch in the master, the standby suddenly starts
accepting connections.

I can't imagine a situation where "allow connections if the WAL allows
it" would be a sensible setting. If there is one, we could have an
'auto' mode, but not as the default.

> and I would suggest removing it from postgresql.conf.sample

-1. Why try to hide it?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, 2010-04-29 at 12:55 +0300, Heikki Linnakangas wrote:

> Seems overly complicated. 

If you turn wal_level = hot_standby in the master, then the slaves work,
unless you say otherwise. What is complicated about that?

> It would be simpler to just set it 'off' by
> default.

As a parameter, yes, it is simpler. 
The behaviour is not simplified by doing that.

I care about the behaviour.

> If it's 'off' by default, and you want to use hot standby, you will
> notice quickly that the server doesn't accept connections, and you
> switch it on. If it's 'on' (or 'auto'), you might not realize that your
> standby server is accepting connections, when you only wanted to set it
> up as a warm standby server for high availability.

That requires we hit the problem, alter the parameter and restart. We
don't need to do that at all. It could "Just Work".

> The 'auto' mode just makes it more surprising. Connections might be
> allowed at first, but when someone switches wal_level in the primary for
> some reason, your reporting standby is up, but no longer allows
> connections.

If you don't do this, what would happen on standby? Does it silently
stop applying changes after the point you turned it off, or does it
throw an ERROR.

> And vice versa, if you set up a warm standby server for
> high availability where you don't want to allow connections, and someone
> flips the wal_level switch in the master, the standby suddenly starts
> accepting connections.

Some people would regard that as a good thing. The whole point of
wal_level discussion was to avoid needing multiple switches to turn
something on. You are proposing exactly that here. Why is this
discussion different? Why should we not be able to set in just one
place.

If that behaviour concerns the DBA then they can turn it off explicitly.

> I can't imagine a situation where "allow connections if the WAL allows
> it" would be a sensible setting. If there is one, we could have an
> 'auto' mode, but not as the default.

I can. Simple, obvious behaviour. Turn it on in the master, works on the
standbys. We want people to use the feature, not fumble with it.

Just think "auto" means "linked to master". If you don't like it you can
turn it off.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> The whole point of
> wal_level discussion was to avoid needing multiple switches to turn
> something on.

No, the point of wal_level was to make the behavior easier to
understand, by uncoupling the level of WAL-logging from various other
switches, controling it directly and explicitly with a new GUC instead.
It added a new switch, but made the system as a whole easier to
understand and configure.

>> > I can't imagine a situation where "allow connections if the WAL allows
>> > it" would be a sensible setting. If there is one, we could have an
>> > 'auto' mode, but not as the default.
> 
> I can. Simple, obvious behaviour. Turn it on in the master, works on the
> standbys.

Yes, but when would you want that?

Here's the use cases I can think of:

purpose of the standby - do you want hot standby or not?
reporting - yes
offloading queries from master - yes
warm standby for high availability - no
offloading taking filesystem-level backups from master - no
offloading pg_dump from master - yes

All of those either want hot standby, or don't. What use case is there
for "enabled, if the required information is in the WAL"? If there is
one, it sure doesn't seem like the most common one. When you just want
hot standby to be on or off, it's weird to control that from the master.
Action at a distance.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, Apr 29, 2010 at 5:55 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> The most sensible way is to make the default act intelligently depending
>> upon what it finds in the control file. If WAL contains hot_standby
>> info, then recovery_connections should be on by default, though can be
>> turned off explicitly if required. If WAL does not contain hot_standby
>> info we then we throw a WARNING and continue as if recovery_connections
>> = off, or if recovery_connections is specified explicitly then we should
>> throw an ERROR so that the user knows it won't be possible to use this.
>> I think that means we'd need to change recovery_connections to have 2 or
>> 3 values, but non-boolean:
>> recovery_connections = auto (default) | off
>> or
>> recovery_connections = auto (default) | on | off
>>
>
> Seems overly complicated. It would be simpler to just set it 'off' by
> default.

I kind of agree with Simon on this one, except I would probably choose
to have just on and off and make on work like his auto.

In other words, recovery_connections=on means, give me recovery
connections if possible, otherwise don't worry about it.

I'd rather not have to change the default to recovery_connections=off
- that's one more parameter someone has to set properly.

...Robert


On Thu, 2010-04-29 at 13:46 +0300, Heikki Linnakangas wrote:

> purpose of the standby - do you want hot standby or not?

> reporting - yes
> offloading queries from master - yes
> offloading pg_dump from master - yes

These would work using the proposed default.

> warm standby for high availability - no
> offloading taking filesystem-level backups from master - no

These can be explicitly turned off.
You're presuming there is benefit in turning recovery_connections = off,
though it is perfectly valid to do those use cases with it on. There are
many ways to control connections, not just that switch. It will
certainly be easier to monitor the HA system by running queries against
it than not. Do you have any evidence there is benefit in the *typical*
case for turning the setting off? 

> All of those either want hot standby, or don't. What use case is there
> for "enabled, if the required information is in the WAL"? If there is
> one, it sure doesn't seem like the most common one. 

I think "I want it to just work" is fairly common.

> When you just want
> hot standby to be on or off, it's weird to control that from the master.
> Action at a distance.

That's the proposed default, not the only control. The standby can
override what the master says if it definitely doesn't want it, but is
smart enough to avoid silly configuration errors. Turning off wal_level
on the master *does* affect the standby, so pretending we have a
completely separate configuration option makes no sense for me.

Robert's point when he raised the wal_level discussion was about
reducing the number of parameter settings required to make this work. An
intelligent default will reduce level of configuration and allow us to
make Hot Standby work on the standby, out of the box, and that is worth
having.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> I kind of agree with Simon on this one, except I would probably choose
> to have just on and off and make on work like his auto.
> 
> In other words, recovery_connections=on means, give me recovery
> connections if possible, otherwise don't worry about it.

If you're setting up a reporting server, and hot standby can't start,
the server is not functioning properly. I would like to get an error in
that case.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, Apr 29, 2010 at 7:19 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> I kind of agree with Simon on this one, except I would probably choose
>> to have just on and off and make on work like his auto.
>>
>> In other words, recovery_connections=on means, give me recovery
>> connections if possible, otherwise don't worry about it.
>
> If you're setting up a reporting server, and hot standby can't start,
> the server is not functioning properly. I would like to get an error in
> that case.

Presumably you will actually try connecting to it, no?

And what happens when someone changes the setting on the master from
hot_standby back to archive?  I'd rather have the reporting server
continue recovery without being able to accept connections rather than
die in its tracks.

I think this is a problem that should be solved by monitoring, rather
than by automatically taking the server down.

...Robert


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2010-04-29 at 13:46 +0300, Heikki Linnakangas wrote:
>> warm standby for high availability - no
>> offloading taking filesystem-level backups from master - no
> 
> These can be explicitly turned off.
> You're presuming there is benefit in turning recovery_connections = off,
> though it is perfectly valid to do those use cases with it on. There are
> many ways to control connections, not just that switch. It will
> certainly be easier to monitor the HA system by running queries against
> it than not. Do you have any evidence there is benefit in the *typical*
> case for turning the setting off? 

It depends on your exact configuration, but one typical one is that you
have a work-balancing router or pgbouncer sitting in front of the
servers, directing traffic to the server that's up and running. If the
standby starts accepting connections prematurely, the clients will be
incorrectly routed to the standby server and update operations will fail
(and SELECTs will return slightly delayed data).

>> All of those either want hot standby, or don't. What use case is there
>> for "enabled, if the required information is in the WAL"? If there is
>> one, it sure doesn't seem like the most common one. 
> 
> I think "I want it to just work" is fairly common.

You need quite a bit of set up anyway, flipping one more GUC hardly
makes a difference. There is less risk of oversight and accidental
misconfiguration if the admin makes a conscious decision to turn it on.

I'd like to scaremonger with the following fictional story:

An administrator sets up two PostgreSQL servers in a high availability
warm standby set up. Clients are set up to try to connect to both
servers, so that when failover happens, they will automatically
reconnect to the remaining server. wal_level is set to 'archive' in the
master, and all is well.

After running successfully for six months in production, a reporting
server is introduced to offload heavy queries from the mission-critical
OLTP server. wal_level is set to 'hot_standby' in the master to allow
read-only queries to be run against the reporting server, and the
reporting server is set up using the same WAL archive used for the warm
standby server. All seems to be running well, the admin logs in to the
application and clicks through a few screens to test it. A few hours
later a user rings and complains that he's getting a "cannot execute
INSERT in a read-only transaction" error. What happened, and why does it
work just fine when the admin tries the same?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, 2010-04-29 at 14:55 +0300, Heikki Linnakangas wrote:

> I'd like to scaremonger

Seems so.

recovery_connections was on by default and unanimous agreement until
recently and I don't want to change that now, just because a change
somewhere else appears to be forcing that but need not be so. It was
sensible to add a switch to turn HS off, but it should not be the
default, especially not one that requires a restart to enable a high
availability feature.

That is important in a feature that takes a while to "kick-in" and so
the user may patiently wait for it to come up and it never does.

I have no wish to repeat the situation that PostgreSQL requires a
restart to enable a feature, while other forks retain the ability to
enable the parameter without restart, as occurs with archive_mode, IIRC.

Perhaps we should not re-think wal_level if we've just moved the
parameter problem somewhere else?

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> recovery_connections was on by default and unanimous agreement until
> recently and I don't want to change that now, just because a change
> somewhere else appears to be forcing that but need not be so.

We never really had that discussion, until now. It has always been 'on'
by default, and that has been useful to get more testing during the
development cycle, but it's not clear that's a good default for
real-life usage.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, Apr 29, 2010 at 7:55 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> What happened,

I don't find that story very compelling because there are an infinite
number of ways to have high-availability not work if you start by
supposing that the person who sets them up doesn't test them properly
and verify that everything actually works as expected.  You could do
all sorts of things wrong in that case.

How about this one?  The administrator sets up a master and a slave.
She's heard about this new Hot Standby feature and so decides to
enable it on the slave just to play around with it.  Subsequently, she
takes a better job at another company and they hire a new
administrator, who thinks the Hot Standby WAL may be causing a
performance problem on the master, so he switches wal_mode to archive.Six months later the primary fails.

I think you can construct a scenario where just about any default
setting causes a problem, but I like the idea of having this enabled
by default, and I think it works fine if you just treat the case where
recovery_connections=on but wal_mode=archive as a LOG or WARNING
rather than an ERROR.

...Robert


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> On Thu, Apr 29, 2010 at 7:19 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Robert Haas wrote:
>>> I kind of agree with Simon on this one, except I would probably choose
>>> to have just on and off and make on work like his auto.
>>>
>>> In other words, recovery_connections=on means, give me recovery
>>> connections if possible, otherwise don't worry about it.
>> If you're setting up a reporting server, and hot standby can't start,
>> the server is not functioning properly. I would like to get an error in
>> that case.
> 
> Presumably you will actually try connecting to it, no?

Sure. I guess it would be acceptable if 'on' meant 'on, if possible', as
long as 'off' is the default. Otherwise it's too surprising.

> And what happens when someone changes the setting on the master from
> hot_standby back to archive?  I'd rather have the reporting server
> continue recovery without being able to accept connections rather than
> die in its tracks.

As the code stands, if wal_level is switched from 'hot_standby' to
'archive' in the primary, and it's restarted, the standby will die.
There is currently no way to stop to accepting read-only connections
once they're allowed. It could restart with connections disallowed, but
it needs a restart.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> How about this one?  The administrator sets up a master and a slave.
> She's heard about this new Hot Standby feature and so decides to
> enable it on the slave just to play around with it.  Subsequently, she
> takes a better job at another company and they hire a new
> administrator, who thinks the Hot Standby WAL may be causing a
> performance problem on the master, so he switches wal_mode to archive.
>  Six months later the primary fails.

Umm, I don't see the problem. For high availability purposes, the
standby works just as well with wal_mode='archive'. Or are you saying
that the standby was configured with recovery_connections='on', and
failed to start for those six months because hot standby could not be
enabled, and no-one noticed?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, Apr 29, 2010 at 9:00 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> On Thu, Apr 29, 2010 at 7:19 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Robert Haas wrote:
>>>> I kind of agree with Simon on this one, except I would probably choose
>>>> to have just on and off and make on work like his auto.
>>>>
>>>> In other words, recovery_connections=on means, give me recovery
>>>> connections if possible, otherwise don't worry about it.
>>> If you're setting up a reporting server, and hot standby can't start,
>>> the server is not functioning properly. I would like to get an error in
>>> that case.
>>
>> Presumably you will actually try connecting to it, no?
>
> Sure. I guess it would be acceptable if 'on' meant 'on, if possible', as
> long as 'off' is the default. Otherwise it's too surprising.

I disagree.  I think on should mean 'on, if possible' and 'on' should
be the default.  I think that's how it was before this round of
changes - but maybe I'm mistaken?  It seems like it makes sense, at
any rate.

...Robert


On Thu, Apr 29, 2010 at 9:07 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> How about this one?  The administrator sets up a master and a slave.
>> She's heard about this new Hot Standby feature and so decides to
>> enable it on the slave just to play around with it.  Subsequently, she
>> takes a better job at another company and they hire a new
>> administrator, who thinks the Hot Standby WAL may be causing a
>> performance problem on the master, so he switches wal_mode to archive.
>>  Six months later the primary fails.
>
> Umm, I don't see the problem. For high availability purposes, the
> standby works just as well with wal_mode='archive'. Or are you saying
> that the standby was configured with recovery_connections='on', and
> failed to start for those six months because hot standby could not be
> enabled, and no-one noticed?

Yep.

...Robert


Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 29, 2010 at 7:19 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> If you're setting up a reporting server, and hot standby can't start,
>> the server is not functioning properly. I would like to get an error in
>> that case.
>
> Presumably you will actually try connecting to it, no?
[...]
> I think this is a problem that should be solved by monitoring, rather
> than by automatically taking the server down.

+1 on both counts. Is that a +2?

Regards,
-- 
dim


Simon Riggs <simon@2ndQuadrant.com> writes:
> recovery_connections was on by default and unanimous agreement until
> recently and I don't want to change that now,

Uh, it was on by default only because a lot of us hadn't noticed that.
I agree with Heikki's position here: it should not be on by default.
        regards, tom lane


On Thu, 2010-04-29 at 09:48 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > recovery_connections was on by default and unanimous agreement until
> > recently and I don't want to change that now,
> 
> Uh, it was on by default only because a lot of us hadn't noticed that.
> I agree with Heikki's position here: it should not be on by default.

We're talking about the case where somebody has set up a standby
database. It's not like they happen on this accidentally.

* HS on by default, in the standby, via recovery_connections. 
* HS off by default, in the master, via wal_level. 

Overall, that *is* off by default. (Note: I said nothing about that).

We don't need it off *twice*, nor do we even need two switches.

Last week we had one switch and it was on by default, now we're looking
at two switches and off by default. I haven't yet heard a good reason
for the change being proposed here by Heikki. The use cases are rare, if
they truly exist at all. Monitoring the standby is much easier with HS
on, for example. 

What is the reason for the *extra* "off" switch? Why two? Why "off"
twice?

-- Simon Riggs           www.2ndQuadrant.com



* Simon Riggs <simon@2ndQuadrant.com> [100429 11:24]:
> 
> What is the reason for the *extra* "off" switch? Why two? Why "off"
> twice?

Because I run a PITR slave off a WAL archive... And I'm going to
continue to try and use this setup... I expect to try and experiment
with wal-streaming too when everything is up to 9, and be a little upset
when the "old faithfull" backup I've got and am betting my job on
suddenly starts acting differentlyl, or accepting connections and
causing my db clients to start thowing errors because I'm trying to get
a streaming replication w/ hot standby up on a *different* slave...

It's all about the path of least *suprise*.  My "least suprise" would
have been that a similar config I have now with my PITR slaves would
pretty much still work, and not suddenly start accepting connections,
and even worse, at some later date when I've already verified that it
was doing what I expected with the configuration.

There has got to be *something* that tells me it is trying to allow
connectins during recovery, but failing...  And i think that just loging
a WARNING and continuing isn't enough, because I'll admit to having
upgraded PG and just "testing" it, not looking thoroughly through the
logs for any new messages that might peak my interest...

That said, since I follow -hackers, that "something" could be considered
this email thread, and I know I will be extra careful about my
deployment of PITR slaves w/ 9 wrt making sure I'm explicit in all the
settings I can find...

And I'll make sure I look more carefully at logs when deploying 9 as well
;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2010-04-29 at 09:48 -0400, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> recovery_connections was on by default and unanimous agreement until
>>> recently and I don't want to change that now,
>> Uh, it was on by default only because a lot of us hadn't noticed that.
>> I agree with Heikki's position here: it should not be on by default.
> 
> We're talking about the case where somebody has set up a standby
> database. It's not like they happen on this accidentally.
> 
> * HS on by default, in the standby, via recovery_connections. 
> * HS off by default, in the master, via wal_level. 
> 
> Overall, that *is* off by default. (Note: I said nothing about that).
> 
> We don't need it off *twice*, nor do we even need two switches.

wal_level is not supposed to control if Hot Standby is on or off. It
controls what information is written to WAL. If you have
wal_level='archive' and recovery_connections='on' in the standby, that's
a configuration error, just like setting wal_level='minimal' and
archive_mode='on'. You wanted to enable Hot Standby, but the information
required isn't in the WAL.

It's error-prone to control what happens in the standby from the master.
That's the "action at a distance" effect I mentioned earlier. The master
should be configured in the master, and each standby should configured
in the standby.

The right mental model is that wal_mode controls what is written to the
WAL. In fact, I might recommend always setting it to 'hot_standby'
instead of 'archive', even if you have no standbys and you're only doing
WAL archival - it's a lot easier to do PITR with hot standby. To control
whether hot standby is enabled in the standby, use recovery_connections.
And I'd prefer it to be off by default because off is the safer option.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, 2010-04-29 at 18:40 +0300, Heikki Linnakangas wrote:

> It's error-prone to control what happens in the standby from the master.
> That's the "action at a distance" effect I mentioned earlier. The master
> should be configured in the master, and each standby should configured
> in the standby.

Repeating the same thing when its been refuted doesn't help. What you
say has not been proposed.

If there is a case for HS-off-by-default, make it. If you want to change
code, arguing directly against your own position, mentioned many times,
we need a reason. How else can we know which argument of yours to
believe?

-- Simon Riggs           www.2ndQuadrant.com



On Thu, 2010-04-29 at 11:37 -0400, Aidan Van Dyk wrote:
> * Simon Riggs <simon@2ndQuadrant.com> [100429 11:24]:
> > 
> > What is the reason for the *extra* "off" switch? Why two? Why "off"
> > twice?
> 
> Because I run a PITR slave off a WAL archive... And I'm going to
> continue to try and use this setup... I expect to try and experiment
> with wal-streaming too when everything is up to 9, and be a little upset
> when the "old faithfull" backup I've got and am betting my job on
> suddenly starts acting differentlyl, or accepting connections and
> causing my db clients to start thowing errors because I'm trying to get
> a streaming replication w/ hot standby up on a *different* slave...
> 
> It's all about the path of least *suprise*.  My "least suprise" would
> have been that a similar config I have now with my PITR slaves would
> pretty much still work, and not suddenly start accepting connections,
> and even worse, at some later date when I've already verified that it
> was doing what I expected with the configuration.

If people upgrade to 9.0 and then are surprised that the features listed
are actually available, I too would be surprised. 

If we all believe that these radical surprises are a problem, then we
should also turn off join removal and loads of other features in 9.0
that will also cause surprises.

HS is the most requested feature across multiple polls and *not* being
able to connect to it is likely to come as a huge surprise to many
people. HS is just as secure as the main database.

There is no big use case for "run with HS turned off". Everybody wants
it on, as long as it works and don't cause problems. So far, there is no
evidence to make anyone think that it would and I wish to avoid that
implication. 

Heikki previously argued strongly against having any switch at all, so
it could be turned on all the time. Nothing's changed.

-- Simon Riggs           www.2ndQuadrant.com



* Simon Riggs <simon@2ndQuadrant.com> [100429 12:06]:
> Repeating the same thing when its been refuted doesn't help. What you
> say has not been proposed.
> 
> If there is a case for HS-off-by-default, make it. If you want to change
> code, arguing directly against your own position, mentioned many times,
> we need a reason. How else can we know which argument of yours to
> believe?

I'm not against HS being on-by-default.    But if it is, and the WAL
it's consuming doesn't have the HS-records by default, then I want PG to
consider that a problem, make sure I absolutely know it's a problem...

I agree with Heikki that the action-at-a-distance of HS
trying-to-work-but-maybe-not-this-time-depending-on-the-master is an
undesirable state...

Like everything else in PG, I'ld like it to "work completely", or tell
me there is a problem.  

That said, I'ld probalby be happy with PG 9 having a "default" config
of:
wal_mode = hot_standbyrecovery_connections = on

Make it set to generate enough WAL and actually do recovery connections.

But also make the recover_connections boolean really mean what it s
called.  It's not called try_recovery_connections

a.


-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2010-04-29 at 18:40 +0300, Heikki Linnakangas wrote:
> 
>> It's error-prone to control what happens in the standby from the master.
>> That's the "action at a distance" effect I mentioned earlier. The master
>> should be configured in the master, and each standby should configured
>> in the standby.
> 
> Repeating the same thing when its been refuted doesn't help. What you
> say has not been proposed.

I was responding to your mail where you said that there is two settings
for turning hot standby off, and asking why we need two. What I'm saying
is that you shouldn't think of wal_level as a setting to turn hot
standby on or off. It would be error-prone to control that from the
master. So there is only one setting to turn hot standby on/off,
recovery_connections in the standby.

> If there is a case for HS-off-by-default, make it. If you want to change
> code, arguing directly against your own position, mentioned many times,
> we need a reason. How else can we know which argument of yours to
> believe?

Now you lost me.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


* Simon Riggs <simon@2ndQuadrant.com> [100429 12:14]:HS is the most requested feature across multiple polls and *not*
being
> able to connect to it is likely to come as a huge surprise to many
> people. HS is just as secure as the main database.
> 
> There is no big use case for "run with HS turned off". Everybody wants
> it on, as long as it works and don't cause problems. So far, there is no
> evidence to make anyone think that it would and I wish to avoid that
> implication. 
> 
> Heikki previously argued strongly against having any switch at all, so
> it could be turned on all the time. Nothing's changed.

I'm not arguing against having it on by default.

What I'm against (and that's strong, it should probably be "prefer not
to have") is having it configured "on", but having it "not work".

If it's been configured to run in a state it can't, I would prefer it
didn't run, not that it ran, but in a slightly different state...

But I know that's just a preference... And one from an old-school unix
admin too...

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

On Thu, Apr 29, 2010 at 12:17 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> * Simon Riggs <simon@2ndQuadrant.com> [100429 12:06]:
>
>> Repeating the same thing when its been refuted doesn't help. What you
>> say has not been proposed.
>>
>> If there is a case for HS-off-by-default, make it. If you want to change
>> code, arguing directly against your own position, mentioned many times,
>> we need a reason. How else can we know which argument of yours to
>> believe?
>
> I'm not against HS being on-by-default.    But if it is, and the WAL
> it's consuming doesn't have the HS-records by default, then I want PG to
> consider that a problem, make sure I absolutely know it's a problem...

Nobody is proposing otherwise.  What Simon and I are proposing is that
if the master is configured to support HS, it comes up on the slave by
default without requiring additional configuration.  Now maybe that's
too much spooky action at a distance, but I suspect it IS the behavior
most people will want.  If Tom and Heikki get their way and change the
default behavior, it'll just mean (nearly) everyone flips one extra
configuration switch.

> I agree with Heikki that the action-at-a-distance of HS
> trying-to-work-but-maybe-not-this-time-depending-on-the-master is an
> undesirable state...
>
> Like everything else in PG, I'ld like it to "work completely", or tell
> me there is a problem.
>
> That said, I'ld probalby be happy with PG 9 having a "default" config
> of:
>
>        wal_mode = hot_standby
>        recovery_connections = on
>
> Make it set to generate enough WAL and actually do recovery connections.

That would be a bad idea - there's a significant performance penalty
from setting wal_level to anything other than minimal (just as there
is for turning on archive_mode in 8.4).

> But also make the recover_connections boolean really mean what it s
> called.  It's not called try_recovery_connections

Well, sure.  But setting work_mem to 1GB doesn't force the planner to
use a gigabyte of memory for every sort, either.  It just gives
permission.

...Robert


Simon Riggs <simon@2ndQuadrant.com> wrote:
> There is no big use case for "run with HS turned off".
There was mention of running the live database and a warm standby,
with a connection pooler pointed to both so that it could
automatically reconnect on failover.
> Everybody wants it on, as long as it works and don't cause
> problems.
I don't see any immediate use cases for our shop, unless it is the
only way to get WAL copied to our backup server through SR.  Franky,
I'd much rather have a WAL receiver which assembled the WAL file
segments (at the "archive" level of logging) as they arrived and
then fed them to our warm standby clusters.  (Of course, I've never
claimed to have a "typical" environment.)
I would be uncomfortable with a default of "auto" for a replica to
allow connections based on WAL file contents.  If someone explicitly
sets it to "auto", then it's up to them to understand the
implications.  IMO, If someone sets it to "on", they should get a
highly visible failure if it can't run in that mode.
-Kevin


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2010-04-29 at 11:37 -0400, Aidan Van Dyk wrote:
>> It's all about the path of least *suprise*.  My "least suprise" would
>> have been that a similar config I have now with my PITR slaves would
>> pretty much still work, and not suddenly start accepting connections,
>> and even worse, at some later date when I've already verified that it
>> was doing what I expected with the configuration.
> 
> If people upgrade to 9.0 and then are surprised that the features listed
> are actually available, I too would be surprised. 

Being available is not the same as being on by default.

> There is no big use case for "run with HS turned off". Everybody wants
> it on, as long as it works and don't cause problems. So far, there is no
> evidence to make anyone think that it would and I wish to avoid that
> implication. 

The use case I explained earlier exists, the one with a master and a
warm standby server with pgbouncer in front. And Aidan and me are human
beings, included in "everybody".

You know that there is cases where it causes problems in the standby,
even ignoring the possibility of bugs. For example, if you increase
max_connections in the master, the standby will abort. It's safer to run
with recovery_connections off if you don't need the feature.

> Heikki previously argued strongly against having any switch at all, so
> it could be turned on all the time. Nothing's changed.

I argued that we should always WAL-log the information required to
enable hot standby, on the grounds that the overhead is small. I.e. that
wal_level would be a two-state switch, either 'minimal' or
'hot_standby'. But I'm happy with how that is now.

But that's not what we're discussing now, don't confuse things. We're
talking about recovery_connections, not wal_level.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Robert Haas <robertmhaas@gmail.com> writes:
> Nobody is proposing otherwise.  What Simon and I are proposing is that
> if the master is configured to support HS, it comes up on the slave by
> default without requiring additional configuration.  Now maybe that's
> too much spooky action at a distance, but I suspect it IS the behavior
> most people will want.  If Tom and Heikki get their way and change the
> default behavior, it'll just mean (nearly) everyone flips one extra
> configuration switch.

We already bought into the "one extra switch" penalty when we agreed to
invent a separate wal_level parameter.  The entire point of that was to
have more, but simpler-to-understand, parameters with fewer hidden
interactions.  Arguing that there are now too many knobs to twiddle
amounts to trying to revisit that discussion, which we don't have time
for now.

>> That said, I'ld probalby be happy with PG 9 having a "default" config
>> of:
>> � � � �wal_mode = hot_standby
>> � � � �recovery_connections = on

> That would be a bad idea - there's a significant performance penalty
> from setting wal_level to anything other than minimal (just as there
> is for turning on archive_mode in 8.4).

There is not only a performance penalty, but a reliability penalty.
Enabling these switches turns on a whole lot of code that, with all
due respect to those who have worked on it, is absolutely positively
guaranteed to be full of bugs.  Not all of which are going to be flushed
out during beta.  If we ship 9.0 with these things on by default, it
will result in an immediate reliability downgrade for installations that
are simply doing what they did before and not even interested in HS/SR.

Maybe by 9.1 or 9.2 it would be sensible to have some of this code
turned on by default.  But it is absolutely not in keeping with this
project's mindset or historical practice to enable it by default now.
        regards, tom lane


On Thu, 2010-04-29 at 11:48 -0500, Kevin Grittner wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>  
> > There is no big use case for "run with HS turned off".
>  
> There was mention of running the live database and a warm standby,
> with a connection pooler pointed to both so that it could
> automatically reconnect on failover.

pgpool already copes by design and has been working against HS for
months.

If the connection pooler doesn't have some way of knowing the server is
a standby, then it will try to connect and fail, which presumably it
would retry.

This seems like a hypothetical connection pooler rather than a real one,
to me.

HS can continue to be used with people's existing Warm Standby setup,
just now you can read the database as well. Which is what everybody has
requested.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs <simon@2ndQuadrant.com> wrote:
> Which is what everybody has requested.
You continue to use the term "everybody" rather loosely.
I don't begrudge people features they want, even when *I* don't need
them; but please don't take my lack of argument against adding this
feature as a silent request for it.  SR has a place in our shop,
especially if it can create traditional WAL file segments on the
receiving end. (I understand that will not be a built-in feature for
9.0, but perhaps there will be a pgfoundry project or some such.) HS
is no use to us, and I would rather not pay a performance penalty or
take the risks of exercising complex new code paths because others
need it.
-Kevin


On Thu, Apr 29, 2010 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That said, I'ld probalby be happy with PG 9 having a "default" config
>>> of:
>>>        wal_mode = hot_standby
>>>        recovery_connections = on
>
>> That would be a bad idea - there's a significant performance penalty
>> from setting wal_level to anything other than minimal (just as there
>> is for turning on archive_mode in 8.4).
>
> There is not only a performance penalty, but a reliability penalty.
> Enabling these switches turns on a whole lot of code that, with all
> due respect to those who have worked on it, is absolutely positively
> guaranteed to be full of bugs.  Not all of which are going to be flushed
> out during beta.  If we ship 9.0 with these things on by default, it
> will result in an immediate reliability downgrade for installations that
> are simply doing what they did before and not even interested in HS/SR.

This is a pretty good argument, and Heikki's argument just upthread
that a mismatched max_connections setting could bollix things is an
even better one.  So I'm now changing my mind and thinking this should
be off by default, also.

...Robert


On Thu, 2010-04-29 at 13:03 -0400, Tom Lane wrote:

> There is not only a performance penalty

I've asked for evidence that recovery is any slower as a result of HS.
If we had some, I'm guessing we'd be tuning rather than discussing this.

> Enabling these switches turns on a whole lot of code that, with all
> due respect to those who have worked on it, is absolutely positively
> guaranteed to be full of bugs. 

I understand that view and don't take offence at all, thank you for your
concern.

I wish to take seriously the possibility you may be correct, so I want
to squash unfounded rumours that will scare people into keeping HS
turned off, because that will keep the bugs in, rather than flush them
out so we can fix them.

Can we keep the default=on during beta and collect evidence before
making a final decision at release?

-- Simon Riggs           www.2ndQuadrant.com



Aidan Van Dyk wrote:
> It's all about the path of least *suprise*.  My "least suprise" would
> have been that a similar config I have now with my PITR slaves would
> pretty much still work, and not suddenly start accepting connections,
> and even worse, at some later date when I've already verified that it
> was doing what I expected with the configuration.
>   

The first time I setup a system to test HS, when I got to the point 
where the slave was up and running as a standard warm standby, I 
expected there to be something else I had to do in order for it to be 
available for queries.  When I fired up psql and I was able to run 
queries without doing anything extra, I was surprised--but it was that 
fun, everything just works when I expected it to be harder than that 
kind of surprise.

One of the reasons the version number was bumped up to 9.0 was to put 
people on warning that they should not assume their old setups would 
port forward without behavioral changes.  The fact that existing 
warm-standby server users will be surprised to find they can run queries 
without doing anything special could be considered under that banner.  
If you feel that's not obvious enough, that could argue for more 
prominent documentation of that fact, rather than turning it off.  The 
idea that it should be made harder to enable just to protect the 
expectations current users, and therefore introduce yet another place 
where PostgreSQL is less friendly to get started with than it could be, 
is backwards from the perspective of making things as easy as possible 
for new users.

Arguing from a usability standpoint needs to consider both new and 
existing user requirements, and those are quite opposed to one another 
in terms of what default makes more sense IMHO.  Now, if the argument is 
from the perspective of "this adds performance/reliability issues that 
weren't there before", and those go away if the feature is disabled by 
default, that's a respectable and indisputable reason to do so.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
"Joshua D. Drake"
Date:
On Thu, 2010-04-29 at 13:03 -0400, Tom Lane wrote:

> There is not only a performance penalty, but a reliability penalty.
> Enabling these switches turns on a whole lot of code that, with all
> due respect to those who have worked on it, is absolutely positively
> guaranteed to be full of bugs.  Not all of which are going to be flushed
> out during beta.  If we ship 9.0 with these things on by default, it
> will result in an immediate reliability downgrade for installations that
> are simply doing what they did before and not even interested in HS/SR.
>
> Maybe by 9.1 or 9.2 it would be sensible to have some of this code
> turned on by default.  But it is absolutely not in keeping with this
> project's mindset or historical practice to enable it by default now.
>
>             regards, tom lane

+1

Joshua D. Drake


>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
On Thu, 2010-04-29 at 12:16 -0500, Kevin Grittner wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>  
> > Which is what everybody has requested.
>  
> You continue to use the term "everybody" rather loosely.

Sorry, that was imprecise, I didn't mean to exaggerate.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Can we keep the default=on during beta and collect evidence before
> making a final decision at release?

That's tempting to get more (inadvertent) testing of HS, but I don't
think it would be fair to the beta testers. The expectation is that
what's in beta is what's in the release, unless some new issue pops up.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Simon Riggs <simon@2ndQuadrant.com> wrote:
> I've asked for evidence that recovery is any slower as a result of
> HS.
Can you quantify the impact on the number of bytes written to WAL
when switching from the archiving level to the hot standby level?
-Kevin


On Thu, 2010-04-29 at 19:55 +0300, Heikki Linnakangas wrote:

> And Aidan and me are human
> beings, included in "everybody".

Yes, that was too loose. I dislike that kind of argument and I'm sorry
that slipped in here.

> You know that there is cases where it causes problems in the standby,
> even ignoring the possibility of bugs. For example, if you increase
> max_connections in the master, the standby will abort.

That behaviour was suggested by you. I don't think its anywhere near
necessary that it does that and would like to remove that restriction.
The likelihood we'll ever run out of slots is small even with large
increases in max_connections. Example: if we increase from 100 to 500
we'd only hit the limit if we had all 500 connections on the master
simultaneously running write transactions with an average 12
subtransactions each.

> It's safer to run
> with recovery_connections off if you don't need the feature.

Presumably your advice is also that people should not run with Streaming
Replication if they don't need that feature? And that we should also
have an enable_joinremoval flag so the risk it poses can be minimized?
You didn't argue this way with regard to vacuum/FSM in 8.4, which was
much more critical; we're just talking about a standby server.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs <simon@2ndQuadrant.com> writes:
> Can we keep the default=on during beta and collect evidence before
> making a final decision at release?

I'd be for doing that if it didn't have adverse effects on the use of
pg_start_backup (which is where we started this thread).  But it would
be to get more testing, not because we would consider beta results as
even possibly giving sufficient confidence to justify shipping 9.0.0
with the default = on.

I realize that you've sweated blood over a long period to get this
stuff in there, and if I were you I'd probably be wishing it could
be on by default too.  But from a project management standpoint
it's just way too risky to even consider.  HS/SR at this point have
to be seen as being about as trustworthy as the Windows port was in
8.0.0.  No DBA is going to be happy with that stuff getting turned
on unless he specifically asks for it.

Maybe in 9.1 or 9.2, but not in 9.0.
        regards, tom lane


On Thu, 2010-04-29 at 13:20 -0400, Robert Haas wrote:

> This is a pretty good argument, and Heikki's argument just upthread
> that a mismatched max_connections setting could bollix things is an
> even better one.  So I'm now changing my mind and thinking this should
> be off by default, also.

What Heikki hasn't said is that setting max_prepared_transactions
incorrectly has always caused a fatal error at failover, and still does
so now when HS is disabled. If these concerns were real ones, then that
situation would not have existed for years and would not still do so. 

9.0 without HS enabled is fairly untested on the standby, so it's not
clear to me turning it off makes you any safer.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs <simon@2ndQuadrant.com> wrote:
> 9.0 without HS enabled is fairly untested on the standby, so it's
> not clear to me turning it off makes you any safer.
I think you've just made the strongest possible case for not
defaulting it on during beta testing.
-Kevin


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2010-04-29 at 19:55 +0300, Heikki Linnakangas wrote:
>> It's safer to run
>> with recovery_connections off if you don't need the feature.
> 
> Presumably your advice is also that people should not run with Streaming
> Replication if they don't need that feature? 

Umm, yes. Why would you bother to set it up if you don't need it?

> And that we should also
> have an enable_joinremoval flag so the risk it poses can be minimized?
> You didn't argue this way with regard to vacuum/FSM in 8.4, which was
> much more critical; we're just talking about a standby server.

This is getting bizarre...

I wasn't really following the join removal discussions, but it seems
much safer and less complex. And it would not have been feasible to have
both implementations of FSM around and have the user to choose. I have
been relieved by the lack of bug reports on the new FSM, that was a big
change that impacted all installations.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Thu, 2010-04-29 at 13:01 -0500, Kevin Grittner wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>  
> > I've asked for evidence that recovery is any slower as a result of
> > HS.
>  
> Can you quantify the impact on the number of bytes written to WAL
> when switching from the archiving level to the hot standby level?

Sure, done that a few times.

Extra WAL data is written for these actions, listed in order of
increasing size

* commit records contain a variable length list of
relcacheinvalidations, mostly applies only to DDL
* one extra WAL record in most VACUUMs, fairly small, optimised away in
some cases
* a transaction issues more than 64 subtransactions it will issue a
record approx 256 bytes plus header
* one extra WAL record every checkpoint, containing a full current
snapshot's worth of running xids 100-400 bytes typically, could go up
from there to 4000 bytes in very extreme write workloads that also have
many, many subtransactions

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> What Heikki hasn't said is that setting max_prepared_transactions
> incorrectly has always caused a fatal error at failover, and still does
> so now when HS is disabled.

Hmm, good point. That should probably be documented somewhere. Now that
we WAL-log the max_prepared_xacts value, I suppose we could throw a
WARNING too.

> If these concerns were real ones, then that
> situation would not have existed for years and would not still do so. 

There's very few people using two-phase commit out there, and even less
so in a standby configuration. I wouldn't draw any conclusions from that.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

From
"Joshua D. Drake"
Date:
On Thu, 2010-04-29 at 21:27 +0300, Heikki Linnakangas wrote:


> I wasn't really following the join removal discussions, but it seems
> much safer and less complex. And it would not have been feasible to have
> both implementations of FSM around and have the user to choose. I have
> been relieved by the lack of bug reports on the new FSM, that was a big
> change that impacted all installations.

I agree that Simon's argument doesn't hold a lot of water in comparison
to the features he is citing. I say leave it on for Beta just so we can
hopefully get some inherit testing but other than that, its off for
release.

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>> 9.0 without HS enabled is fairly untested on the standby, so it's
>> not clear to me turning it off makes you any safer.
> I think you've just made the strongest possible case for not
> defaulting it on during beta testing.

Yes.  If we've had it on so far, flipping it at the start of beta seems
sensible from a test coverage standpoint.  We will certainly have a
reasonable number of beta testers who are interested in the feature and
turn it on to test it; but we need to make sure the "off" position isn't
accidentally broken.
        regards, tom lane


Simon Riggs <simon@2ndQuadrant.com> wrote:
> [most significant extra WAL logging for hot standby is:]
> * one extra WAL record every checkpoint, containing a full current
> snapshot's worth of running xids 100-400 bytes typically, could go
> up from there to 4000 bytes in very extreme write workloads that
> also have many, many subtransactions
The most convincing evidence that there's no significant performance
hit for those not needing the feature (and not a bad thing from a
testing point of view anyway) would be for someone to run some
benchmarks comparing the archive and hot_standby logging levels,
with no archive script or SR.  If that hasn't been done yet, maybe
you could find somebody who knows his way around pgbench-tools, who
could construct a reasonable test and produce graphs and all that
cool stuff.  ;-)  IMO, isolating the cost of generating and writing
the extra WAL would be very valuable, and it would be reassuring to
know that it worked with a large number of clients without exposing
any dire race conditions.
-Kevin


Okay, this thread seems to be running out of steam, and we are running
out of hours before the beta1 wrap.  I am going to take it on my own
authority to do the following:

* rename recovery_connections to hot_standby (and the underlying
variable from XLogRequestRecoveryConnections to something like
EnableHotStandby);

* change its default value to 'off'.

I believe the only other code changes needed for beta1 are the
pg_start/stop_backup error check changes that Heikki proposed at
<4BD953A6.70409@enterprisedb.com>.  If he doesn't commit that
within an hour or two, I will.

This train is leaving the station.
        regards, tom lane


On Thu, Apr 29, 2010 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This train is leaving the station.

An enthusiastic +1 for train departure.

...Robert


On Thu, 2010-04-29 at 16:18 -0400, Tom Lane wrote:

> * rename recovery_connections to hot_standby (and the underlying
> variable from XLogRequestRecoveryConnections to something like
> EnableHotStandby);

OK

> * change its default value to 'off'.

OK

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: [COMMITTERS] pgsql: MakeCheckRequiredParameterValues() depend upon correct

From
"Joshua D. Drake"
Date:
On Thu, 2010-04-29 at 13:03 -0400, Tom Lane wrote:

> There is not only a performance penalty, but a reliability penalty.
> Enabling these switches turns on a whole lot of code that, with all
> due respect to those who have worked on it, is absolutely positively
> guaranteed to be full of bugs.  Not all of which are going to be flushed
> out during beta.  If we ship 9.0 with these things on by default, it
> will result in an immediate reliability downgrade for installations that
> are simply doing what they did before and not even interested in HS/SR.
> 
> Maybe by 9.1 or 9.2 it would be sensible to have some of this code
> turned on by default.  But it is absolutely not in keeping with this
> project's mindset or historical practice to enable it by default now.
> 
>             regards, tom lane

+1

Joshua D. Drake


> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering




Re: Re: [COMMITTERS] pgsql: MakeCheckRequiredParameterValues() depend upon correct

From
"Joshua D. Drake"
Date:
On Thu, 2010-04-29 at 21:27 +0300, Heikki Linnakangas wrote:


> I wasn't really following the join removal discussions, but it seems
> much safer and less complex. And it would not have been feasible to have
> both implementations of FSM around and have the user to choose. I have
> been relieved by the lack of bug reports on the new FSM, that was a big
> change that impacted all installations.

I agree that Simon's argument doesn't hold a lot of water in comparison
to the features he is citing. I say leave it on for Beta just so we can
hopefully get some inherit testing but other than that, its off for
release.

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering




Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> >> Browsing the history of XLOG_PAGE_MAGIC, it used to be incremented by
> >> one whenever the format changes, at least as long as it has been been in
> >> xlog_internal.h. It started at 0xD05B in 2005, and was incremented by
> >> one by each commit till 0xD062. But then the hot standby patch bumped it
> >> to 0xD166, and on March 19th Simon changed it to 0x9002, and on March
> >> 28th to 0x9003.
> > 
> >> Is there a plan somewhere on all that, or was it just random whacking?
> > 
> > Random whacking.  Simon seems to have decided to try to make the number
> > mean something, but it never meant anything before.  It's not a version
> > number, it's a magic identifier.  We might need to avoid certain values
> > to prevent collisions with other file formats, so I'm suspicious of
> > trying to make it match up with some expectation.
> 
> Ok, for better or worse, I whacked it back to the old series.

Agreed.  Making it match the version would be odd because you then have
t bump it for every major release.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>>> Browsing the history of XLOG_PAGE_MAGIC, it used to be incremented by
>>>> one whenever the format changes, at least as long as it has been been in
>>>> xlog_internal.h. It started at 0xD05B in 2005, and was incremented by
>>>> one by each commit till 0xD062. But then the hot standby patch bumped it
>>>> to 0xD166, and on March 19th Simon changed it to 0x9002, and on March
>>>> 28th to 0x9003.
>>>> Is there a plan somewhere on all that, or was it just random whacking?
>>> Random whacking.  Simon seems to have decided to try to make the number
>>> mean something, but it never meant anything before.  It's not a version
>>> number, it's a magic identifier.  We might need to avoid certain values
>>> to prevent collisions with other file formats, so I'm suspicious of
>>> trying to make it match up with some expectation.
>> Ok, for better or worse, I whacked it back to the old series.
> 
> Agreed.  Making it match the version would be odd because you then have
> t bump it for every major release.

Not necessarily. It would make sense to me to just keep it the same if
there has been no changes. The magic number would indicate the first
version that used that format.

I'm not advocating a switch to that system, but it would be sensible as
well.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com