Re: Hot Standby documentation updates - Mailing list pgsql-docs

From Bruce Momjian
Subject Re: Hot Standby documentation updates
Date
Msg-id 201002190016.o1J0GUg04654@momjian.us
Whole thread Raw
In response to Hot Standby documentation updates  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-docs
Applied;  updated version attached.  I talked to Greg via IM and we
decide to not use his buffer cleanup locks text below but to use
something simpler:

         The standby waiting longer than <varname>max_standby_delay</>
         to acquire a buffer cleanup lock.

---------------------------------------------------------------------------

Greg Smith wrote:
> Attached is a patch that fixes up a couple of spots I felt could use
> improvement in the Hot Standby documentation.  Most of this is simple
> wordsmithing, or minor expansion of points I tried to explain more
> clearly.  A few of the edits I'd marked up on paper were already applied
> in the last update done today--even the same commas added or removed in
> some spots.  I think this section is closing in on complete and fully
> edited when we are all finding the same things to nitpick over.
>
> The main change I added here that could use a review is this addition:
>
> +         Waiting to acquire buffer cleanup locks.  This can occur when one
> +         database backend is trying to access to data block that is
> "pinned"
> +         by another backend that is making a change to it.  Such pins
> should
> +         normally be very short-lived, but they can take longer than normal
> +         under some circumstances, such as when a cursor executing over
> +         a set of data has stopped for some reason.
>
> That's how I interpreted what Simon told me about this subject, but I
> may not have captured the details accurately.  I didn't think that
> saying something as ominous sounding as "Waiting to acquire buffer
> cleanup locks" should be there without any description at all what
> conditions that happens under, which is what's there right now.
>
> There are only two open things in these docs I'm still a little bugged by:
>
> -The first paragraph mentions "restoring a backup to an exact state with
> great precision"; I have no idea what that's supposed to mean in this
> context.
>
> -The section about stats collection says "The stats file is deleted at
> the start of recovery, so stats from primary and standby will differ;
> this is considered a feature not a bug".  This left me going "why?", and
> some clarification of the reasoning or limitation causing that drift
> would be nice.
>
> P.S.  I touched a few other files as well, related to a debate from
> dinner the other night that touched on Simon introducing more UK
> spelling into the docs.  This is what the docs look like now:
>
> postgresql/doc/src/sgml $ grep -r behavior * | wc -l
> 386
>
> postgresql/doc/src/sgml $ grep -r behaviour * | wc -l
> 9
>
> Half of the latter were in the new material, so while on a binge fixing
> those I adjusted the mis-"behaviour"s in other sections too.
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>


>
> --
> Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-docs

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.141
diff -c -c -r2.141 backup.sgml
*** doc/src/sgml/backup.sgml    9 Feb 2010 16:50:25 -0000    2.141
--- doc/src/sgml/backup.sgml    19 Feb 2010 00:15:12 -0000
***************
*** 126,132 ****
      execute after an SQL error is encountered. You might wish to run
      <application>psql</application> with
      the <literal>ON_ERROR_STOP</> variable set to alter that
!     behaviour and have <application>psql</application> exit with an
      exit status of 3 if an SQL error occurs:
  <programlisting>
  psql --set ON_ERROR_STOP=on dbname < infile
--- 126,132 ----
      execute after an SQL error is encountered. You might wish to run
      <application>psql</application> with
      the <literal>ON_ERROR_STOP</> variable set to alter that
!     behavior and have <application>psql</application> exit with an
      exit status of 3 if an SQL error occurs:
  <programlisting>
  psql --set ON_ERROR_STOP=on dbname < infile
Index: doc/src/sgml/ecpg.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ecpg.sgml,v
retrieving revision 1.96
diff -c -c -r1.96 ecpg.sgml
*** doc/src/sgml/ecpg.sgml    17 Feb 2010 04:19:37 -0000    1.96
--- doc/src/sgml/ecpg.sgml    19 Feb 2010 00:15:12 -0000
***************
*** 2882,2888 ****
     program and most of the additional functions of the <productname>Informix</productname> compatibility
     mode allow you to operate on those C host program types. Note however that
     the extent of the compatibility is limited. It does not try to copy <productname>Informix</productname>
!    behaviour; it allows you to do more or less the same operations and gives
     you functions that have the same name and the same basic behavior but it is
     no drop-in replacement if you are using <productname>Informix</productname> at the moment. Moreover,
     some of the data types are different. For example,
--- 2882,2888 ----
     program and most of the additional functions of the <productname>Informix</productname> compatibility
     mode allow you to operate on those C host program types. Note however that
     the extent of the compatibility is limited. It does not try to copy <productname>Informix</productname>
!    behavior; it allows you to do more or less the same operations and gives
     you functions that have the same name and the same basic behavior but it is
     no drop-in replacement if you are using <productname>Informix</productname> at the moment. Moreover,
     some of the data types are different. For example,
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.504
diff -c -c -r1.504 func.sgml
*** doc/src/sgml/func.sgml    16 Feb 2010 22:34:42 -0000    1.504
--- doc/src/sgml/func.sgml    19 Feb 2010 00:15:12 -0000
***************
*** 13801,13808 ****
        function, <function>suppress_redundant_updates_trigger</>,
        which will prevent any update
        that does not actually change the data in the row from taking place, in
!       contrast to the normal behaviour which always performs the update
!       regardless of whether or not the data has changed. (This normal behaviour
        makes updates run faster, since no checking is required, and is also
        useful in certain cases.)
      </para>
--- 13801,13808 ----
        function, <function>suppress_redundant_updates_trigger</>,
        which will prevent any update
        that does not actually change the data in the row from taking place, in
!       contrast to the normal behavior which always performs the update
!       regardless of whether or not the data has changed. (This normal behavior
        makes updates run faster, since no checking is required, and is also
        useful in certain cases.)
      </para>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.46
diff -c -c -r1.46 high-availability.sgml
*** doc/src/sgml/high-availability.sgml    18 Feb 2010 04:14:38 -0000    1.46
--- doc/src/sgml/high-availability.sgml    19 Feb 2010 00:15:12 -0000
***************
*** 1056,1063 ****
      is useful for both log shipping replication and for restoring a backup
      to an exact state with great precision.
      The term Hot Standby also refers to the ability of the server to move
!     from recovery through to normal running while users continue running
!     queries and/or continue their connections.
     </para>

     <para>
--- 1056,1063 ----
      is useful for both log shipping replication and for restoring a backup
      to an exact state with great precision.
      The term Hot Standby also refers to the ability of the server to move
!     from recovery through to normal operation while users continue running
!     queries and/or keep their connections open.
     </para>

     <para>
***************
*** 1082,1088 ****
      return differing results. Eventually, the standby will be
      consistent with the primary.
      Queries executed on the standby will be correct with regard to the transactions
!     that had been recovered at the start of the query, or start of first statement,
      in the case of serializable transactions. In comparison with the primary,
      the standby returns query results that could have been obtained on the primary
      at some moment in the past.
--- 1082,1088 ----
      return differing results. Eventually, the standby will be
      consistent with the primary.
      Queries executed on the standby will be correct with regard to the transactions
!     that had been recovered at the start of the query, or start of first statement
      in the case of serializable transactions. In comparison with the primary,
      the standby returns query results that could have been obtained on the primary
      at some moment in the past.
***************
*** 1103,1110 ****
     </para>

     <para>
!     "Read-only" above means no writes to the permanent database tables.
!     There are no problems with queries that use transient sort and
      work files.
     </para>

--- 1103,1110 ----
     </para>

     <para>
!     "Read-only" above means no writes to the permanent or temporary database
!     tables.  There are no problems with queries that use transient sort and
      work files.
     </para>

***************
*** 1203,1214 ****
       </listitem>
       <listitem>
        <para>
-        <command>LOCK TABLE</>, in short default form, since it requests <literal>ACCESS EXCLUSIVE MODE</>.
         <command>LOCK TABLE</> that explicitly requests a mode higher than <literal>ROW EXCLUSIVE MODE</>.
        </para>
       </listitem>
       <listitem>
        <para>
         Transaction management commands that explicitly set non-read-only state:
          <itemizedlist>
           <listitem>
--- 1203,1218 ----
       </listitem>
       <listitem>
        <para>
         <command>LOCK TABLE</> that explicitly requests a mode higher than <literal>ROW EXCLUSIVE MODE</>.
        </para>
       </listitem>
       <listitem>
        <para>
+        <command>LOCK TABLE</> in short default form, since it requests <literal>ACCESS EXCLUSIVE MODE</>.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
         Transaction management commands that explicitly set non-read-only state:
          <itemizedlist>
           <listitem>
***************
*** 1241,1247 ****
       </listitem>
       <listitem>
        <para>
!        sequence update - nextval()
        </para>
       </listitem>
       <listitem>
--- 1245,1251 ----
       </listitem>
       <listitem>
        <para>
!        Sequence update - <function>nextval()</>
        </para>
       </listitem>
       <listitem>
***************
*** 1253,1261 ****
     </para>

     <para>
!     Note that the current behaviour of read only transactions when not in
      recovery is to allow the last two actions, so there are small and
!     subtle differences in behaviour between read-only transactions
      run on a standby and run during normal operation.
      It is possible that <command>LISTEN</>, <command>UNLISTEN</>,
      <command>NOTIFY</>, and temporary tables might be allowed in a
--- 1257,1265 ----
     </para>

     <para>
!     Note that the current behavior of read only transactions when not in
      recovery is to allow the last two actions, so there are small and
!     subtle differences in behavior between read-only transactions
      run on a standby and run during normal operation.
      It is possible that <command>LISTEN</>, <command>UNLISTEN</>,
      <command>NOTIFY</>, and temporary tables might be allowed in a
***************
*** 1275,1281 ****
      issuing <command>SHOW transaction_read_only</>.  In addition, a set of
      functions (<xref linkend="functions-recovery-info-table">) allow users to
      access information about the standby server. These allow you to write
!     functions that are aware of the current state of the database. These
      can be used to monitor the progress of recovery, or to allow you to
      write complex programs that restore the database to particular states.
     </para>
--- 1279,1285 ----
      issuing <command>SHOW transaction_read_only</>.  In addition, a set of
      functions (<xref linkend="functions-recovery-info-table">) allow users to
      access information about the standby server. These allow you to write
!     programs that are aware of the current state of the database. These
      can be used to monitor the progress of recovery, or to allow you to
      write complex programs that restore the database to particular states.
     </para>
***************
*** 1338,1344 ****
         </listitem>
         <listitem>
          <para>
!          Waiting to acquire buffer cleanup locks
          </para>
         </listitem>
         <listitem>
--- 1342,1349 ----
         </listitem>
         <listitem>
          <para>
!          The standby waiting longer than <varname>max_standby_delay</>
!          to acquire a buffer cleanup lock.
          </para>
         </listitem>
         <listitem>
***************
*** 1350,1376 ****
     </para>

     <para>
!     Some WAL redo actions will be for <acronym>DDL</> actions. These DDL actions are
!     repeating actions that have already committed on the primary node, so
!     they must not fail on the standby node. These DDL locks take priority
!     and will automatically *cancel* any read-only transactions that get in
!     their way, after a grace period. This is similar to the possibility of
!     being canceled by the deadlock detector, but in this case the standby
!     process always wins, since the replayed actions must not fail. This
!     also ensures that replication does not fall behind while waiting for a
!     query to complete. Again, the assumption is that the standby is
!     primarily for high availability.
     </para>

     <para>
!     An example of the above would be an Administrator on Primary server
      running <command>DROP TABLE</> on a table that is currently being queried
      on the standby server.
      Clearly the query cannot continue if <command>DROP TABLE</>
      proceeds. If this situation occurred on the primary, the <command>DROP TABLE</>
      would wait until the query had finished. When <command>DROP TABLE</> is
      run on the primary, the primary doesn't have
!     information about which queries are running on the standby and so
      cannot wait for any of the standby queries. The WAL change records come through to the
      standby while the standby query is still running, causing a conflict.
     </para>
--- 1355,1382 ----
     </para>

     <para>
!     Some WAL redo actions will be for <acronym>DDL</> execution. These DDL
!     actions are replaying changes that have already committed on the primary
!     node, so they must not fail on the standby node. These DDL locks take
!     priority and will automatically *cancel* any read-only transactions that
!     get in their way, after a grace period. This is similar to the possibility
!     of being canceled by the deadlock detector.  But in this case, the standby
!     recovery process always wins, since the replayed actions must not fail.
!     This also ensures that replication does not fall behind while waiting for a
!     query to complete. This prioritization presumes that the standby exists
!     primarily for high availability, and that adjusting the grace period
!     will allow a sufficient guard against unexpected cancellation.
     </para>

     <para>
!     An example of the above would be an administrator on the primary server
      running <command>DROP TABLE</> on a table that is currently being queried
      on the standby server.
      Clearly the query cannot continue if <command>DROP TABLE</>
      proceeds. If this situation occurred on the primary, the <command>DROP TABLE</>
      would wait until the query had finished. When <command>DROP TABLE</> is
      run on the primary, the primary doesn't have
!     information about which queries are running on the standby, so it
      cannot wait for any of the standby queries. The WAL change records come through to the
      standby while the standby query is still running, causing a conflict.
     </para>
***************
*** 1407,1414 ****
          <para>
           If the conflict is caused by a lock, the conflicting standby
           transaction is cancelled immediately. If the transaction is
!          idle-in-transaction then the session is aborted
!          instead, though this might change in the future.
          </para>
         </listitem>

--- 1413,1420 ----
          <para>
           If the conflict is caused by a lock, the conflicting standby
           transaction is cancelled immediately. If the transaction is
!          idle-in-transaction, then the session is aborted instead.
!          This behavior might change in the future.
          </para>
         </listitem>

***************
*** 1456,1467 ****
      for as long as needed to run queries on the standby. This guarantees that
      a WAL cleanup record is never generated and query conflicts do not occur,
      as described above. This could be done using <filename>contrib/dblink</>
!     and <function>pg_sleep()</>, or via other mechanisms. If you do this, you should note
!     that this will delay cleanup of dead rows by vacuum or HOT and
!     people might find this undesirable. However, remember that the
!     primary and standby nodes are linked via the WAL, so this situation is no
!     different from the case where the query ran on the primary node itself
!     except for the benefit of off-loading the execution onto the standby.
     </para>

     <para>
--- 1462,1474 ----
      for as long as needed to run queries on the standby. This guarantees that
      a WAL cleanup record is never generated and query conflicts do not occur,
      as described above. This could be done using <filename>contrib/dblink</>
!     and <function>pg_sleep()</>, or via other mechanisms. If you do this, you
!     should note that this will delay cleanup of dead rows on the primary by
!     vacuum or HOT, and people might find this undesirable. However, remember
!     that the primary and standby nodes are linked via the WAL, so the cleanup
!     situation is no different from the case where the query ran on the primary
!     node itself.  And you are still getting the benefit of off-loading the
!     execution onto the standby.
     </para>

     <para>
***************
*** 1494,1501 ****
      be disabled via <filename>postgresql.conf</>. The server might take
      some time to enable recovery connections since the server must first complete
      sufficient recovery to provide a consistent state against which queries
!     can run before enabling read only connections. Look for these messages
!     in the server logs:

  <programlisting>
  LOG:  entering standby mode
--- 1501,1510 ----
      be disabled via <filename>postgresql.conf</>. The server might take
      some time to enable recovery connections since the server must first complete
      sufficient recovery to provide a consistent state against which queries
!     can run before enabling read only connections.  During this period,
!     clients that attempt to connect will be refused with an error message.
!     To confirm the server has come up, either loop retrying to connect from
!     the application, or look for these messages in the server logs:

  <programlisting>
  LOG:  entering standby mode
***************
*** 1617,1625 ****

     <para>
      As a result, you cannot create additional indexes that exist solely
!     on the standby, nor can statistics exist solely on the standby.
!     If these administration commands are needed they should be executed
!     on the primary so that the changes will propagate to the
      standby.
     </para>

--- 1626,1634 ----

     <para>
      As a result, you cannot create additional indexes that exist solely
!     on the standby, nor statistics that exist solely on the standby.
!     If these administration commands are needed, they should be executed
!     on the primary, and eventually those changes will propagate to the
      standby.
     </para>

***************
*** 1646,1657 ****

     <para>
      The <productname>Nagios</> plugin <productname>check_pgsql</> will
!     work, but it is very simple.
!     <productname>check_postgres</> will also work, though some actions
!     could give different or confusing results.
      For example, last vacuum time will not be maintained, since no
!     vacuum occurs on the standby (though vacuums running on the primary do
!     send their changes to the standby).
     </para>

     <para>
--- 1655,1666 ----

     <para>
      The <productname>Nagios</> plugin <productname>check_pgsql</> will
!     work, because the simple information it checks for exists.
!     The <productname>check_postgres</> monitoring script will also work,
!     though some reported values could give different or confusing results.
      For example, last vacuum time will not be maintained, since no
!     vacuum occurs on the standby.  Vacuums running on the primary
!     do still send their changes to the standby.
     </para>

     <para>
***************
*** 1715,1721 ****
      In normal (non-recovery) mode, if you issue <command>DROP USER</> or <command>DROP ROLE</>
      for a role with login capability while that user is still connected then
      nothing happens to the connected user - they remain connected. The user cannot
!     reconnect however. This behaviour applies in recovery also, so a
      <command>DROP USER</> on the primary does not disconnect that user on the standby.
     </para>

--- 1724,1730 ----
      In normal (non-recovery) mode, if you issue <command>DROP USER</> or <command>DROP ROLE</>
      for a role with login capability while that user is still connected then
      nothing happens to the connected user - they remain connected. The user cannot
!     reconnect however. This behavior applies in recovery also, so a
      <command>DROP USER</> on the primary does not disconnect that user on the standby.
     </para>

***************
*** 1729,1743 ****
     </para>

     <para>
!     Autovacuum is not active during recovery, though it will start normally
!     at the end of recovery.
     </para>

     <para>
      The background writer is active during recovery and will perform
      restartpoints (similar to checkpoints on the primary) and normal block
!     cleaning activities. (Remember, hint bits will cause blocks to
!     be modified on the standby server.)
      The <command>CHECKPOINT</> command is accepted during recovery,
      though it performs a restartpoint rather than a new checkpoint.
     </para>
--- 1738,1752 ----
     </para>

     <para>
!     Autovacuum is not active during recovery, it will start normally at the
!     end of recovery.
     </para>

     <para>
      The background writer is active during recovery and will perform
      restartpoints (similar to checkpoints on the primary) and normal block
!     cleaning activities. This can include updates of the hint bit
!     information stored on the standby server.
      The <command>CHECKPOINT</> command is accepted during recovery,
      though it performs a restartpoint rather than a new checkpoint.
     </para>
***************
*** 1792,1802 ****
      <para>
       Valid starting points for recovery connections are generated at each
       checkpoint on the master. If the standby is shut down while the master
!      is in a shutdown state it might not be possible to re-enter Hot Standby
!      until the primary is started up so that it generates further starting
!      points in the WAL logs. This is not considered a serious issue
!      because the standby is usually switched to act as primary when
!      the first node is taken down.
      </para>
     </listitem>
     <listitem>
--- 1801,1815 ----
      <para>
       Valid starting points for recovery connections are generated at each
       checkpoint on the master. If the standby is shut down while the master
!      is in a shutdown state, it might not be possible to re-enter Hot Standby
!      until the primary is started up, so that it generates further starting
!      points in the WAL logs.  This situation isn't a problem in the most
!      common situations where it might happen. Generally, if the primary is
!      shut down and not available anymore, that's likely due to a serious
!      failure that requires the standby being converted to operate as
!      the new primary anyway.  And in situations where the primary is
!      being intentionally taken down, coordinating to make sure the standby
!      becomes the new primary smoothly is also standard procedure.
      </para>
     </listitem>
     <listitem>
Index: doc/src/sgml/installation.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v
retrieving revision 1.341
diff -c -c -r1.341 installation.sgml
*** doc/src/sgml/installation.sgml    3 Feb 2010 17:25:05 -0000    1.341
--- doc/src/sgml/installation.sgml    19 Feb 2010 00:15:12 -0000
***************
*** 2237,2243 ****
       When implementing PostgreSQL version 8.1 on AIX 5.3, we
       periodically ran into problems where the statistics collector
       would <quote>mysteriously</quote> not come up successfully.  This
!      appears to be the result of unexpected behaviour in the IPv6
       implementation.  It looks like PostgreSQL and IPv6 do not play
       very well together at this time on AIX.
      </para>
--- 2237,2243 ----
       When implementing PostgreSQL version 8.1 on AIX 5.3, we
       periodically ran into problems where the statistics collector
       would <quote>mysteriously</quote> not come up successfully.  This
!      appears to be the result of unexpected behavior in the IPv6
       implementation.  It looks like PostgreSQL and IPv6 do not play
       very well together at this time on AIX.
      </para>
Index: doc/src/sgml/unaccent.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/unaccent.sgml,v
retrieving revision 1.2
diff -c -c -r1.2 unaccent.sgml
*** doc/src/sgml/unaccent.sgml    20 Aug 2009 12:12:37 -0000    1.2
--- doc/src/sgml/unaccent.sgml    19 Feb 2010 00:15:12 -0000
***************
*** 9,15 ****
    <filename>unaccent</> removes accents (diacritic signs) from a lexeme.
    It's a filtering dictionary, that means its output is
    always passed to the next dictionary (if any), contrary to the standard
!   behaviour. Currently, it supports most important accents from european
    languages.
   </para>

--- 9,15 ----
    <filename>unaccent</> removes accents (diacritic signs) from a lexeme.
    It's a filtering dictionary, that means its output is
    always passed to the next dictionary (if any), contrary to the standard
!   behavior. Currently, it supports most important accents from european
    languages.
   </para>


pgsql-docs by date:

Previous
From: Greg Smith
Date:
Subject: Hot Standby documentation updates
Next
From: Heikki Linnakangas
Date:
Subject: Re: [HACKERS] Streaming Replication docs