*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1492,2879 **** archive_command = 'local_backup_script.sh'
-
- Warm Standby Servers for High Availability
-
-
- warm standby
-
-
-
- PITR standby
-
-
-
- standby server
-
-
-
- log shipping
-
-
-
- witness server
-
-
-
- STONITH
-
-
-
- high availability
-
-
-
- Continuous archiving can be used to create a high
- availability> (HA) cluster configuration with one or more
- standby servers> ready to take over operations if the
- primary server fails. This capability is widely referred to as
- warm standby> or log shipping>.
-
-
-
- The primary and standby server work together to provide this capability,
- though the servers are only loosely coupled. The primary server operates
- in continuous archiving mode, while each standby server operates in
- continuous recovery mode, reading the WAL files from the primary. No
- changes to the database tables are required to enable this capability,
- so it offers low administration overhead compared to some other
- replication approaches. This configuration also has relatively low
- performance impact on the primary server.
-
-
-
- Directly moving WAL records from one database server to another
- is typically described as log shipping. PostgreSQL>
- implements file-based log shipping, which means that WAL records are
- transferred one file (WAL segment) at a time. WAL files (16MB) can be
- shipped easily and cheaply over any distance, whether it be to an
- adjacent system, another system at the same site, or another system on
- the far side of the globe. The bandwidth required for this technique
- varies according to the transaction rate of the primary server.
- Record-based log shipping is also possible with custom-developed
- procedures, as discussed in .
-
-
-
- It should be noted that log shipping is asynchronous, i.e., the WAL
- records are shipped after transaction commit. As a result there is a
- window for data loss should the primary server suffer a catastrophic
- failure: transactions not yet shipped will be lost. The length of the
- window of data loss can be limited by use of the
- archive_timeout parameter, which can be set as low
- as a few seconds if required. However such a low setting will
- substantially increase the bandwidth required for file shipping.
- If you need a window of less than a minute or so, it's probably better
- to consider record-based log shipping.
-
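-
- For example, to bound the data-loss window to roughly 30 seconds, the
- primary's postgresql.conf> might contain (the value is
- illustrative):
-
- archive_timeout = 30    # in seconds; forces a segment switch this often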
-
-
- The standby server is not available for access, since it is continually
- performing recovery processing. Recovery performance is sufficiently
- good that the standby will typically be only moments away from full
- availability once it has been activated. As a result, we refer to this
- capability as a warm standby configuration that offers high
- availability. Restoring a server from an archived base backup and
- rollforward will take considerably longer, so that technique only
- offers a solution for disaster recovery, not high availability.
-
-
-
- Planning
-
-
- It is usually wise to create the primary and standby servers
- so that they are as similar as possible, at least from the
- perspective of the database server. In particular, the path names
- associated with tablespaces will be passed across unmodified, so both
- primary and standby servers must have the same mount paths for
- tablespaces if that feature is used. Keep in mind that if
-
- is executed on the primary, any new mount point needed for it must
- be created on the primary and all standby servers before the command
- is executed. Hardware need not be exactly the same, but experience shows
- that maintaining two identical systems is easier than maintaining two
- dissimilar ones over the lifetime of the application and system.
- In any case the hardware architecture must be the same — shipping
- from, say, a 32-bit to a 64-bit system will not work.
-
-
-
- In general, log shipping between servers running different major
- PostgreSQL> release
- levels is not possible. It is the policy of the PostgreSQL Global
- Development Group not to make changes to disk formats during minor release
- upgrades, so it is likely that running different minor release levels
- on primary and standby servers will work successfully. However, no
- formal support for that is offered and you are advised to keep primary
- and standby servers at the same release level as much as possible.
- When updating to a new minor release, the safest policy is to update
- the standby servers first — a new minor release is more likely
- to be able to read WAL files from a previous minor release than vice
- versa.
-
-
-
- There is no special mode required to enable a standby server. The
- operations that occur on both primary and standby servers are
- normal continuous archiving and recovery tasks. The only point of
- contact between the two database servers is the archive of WAL files
- that both share: primary writing to the archive, standby reading from
- the archive. Care must be taken to ensure that WAL archives from separate
- primary servers do not become mixed together or confused. The archive
- need not be large if it is only required for standby operation.
-
-
-
- The magic that makes the two loosely coupled servers work together is
- simply a restore_command> used on the standby that,
- when asked for the next WAL file, waits for it to become available from
- the primary. The restore_command> is specified in the
- recovery.conf> file on the standby server. Normal recovery
- processing would request a file from the WAL archive, reporting failure
- if the file was unavailable. For standby processing it is normal for
- the next WAL file to be unavailable, so we must be patient and wait for
- it to appear. For files ending in .backup> or
- .history> there is no need to wait, and a non-zero return
- code must be returned. A waiting restore_command> can be
- written as a custom script that loops after polling for the existence of
- the next WAL file. There must also be some way to trigger failover, which
- should interrupt the restore_command>, break the loop and
- return a file-not-found error to the standby server. This ends recovery
- and the standby will then come up as a normal server.
-
-
-
- Pseudocode for a suitable restore_command> is:
-
- triggered = false;
- while (!NextWALFileReady() && !triggered)
- {
- usleep(100000L); /* wait for ~0.1 sec */
- if (CheckForExternalTrigger())
- triggered = true;
- }
- if (!triggered)
- CopyWALFileForRecovery();
-
-
-
-
- A working example of a waiting restore_command> is provided
- as a contrib> module named pg_standby>. It
- should be used as a reference on how to correctly implement the logic
- described above. It can also be extended as needed to support specific
- configurations and environments.
-
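-
- Purely as an illustration (pg_standby> remains the reference
- implementation), a minimal waiting restore_command> could be
- sketched in shell as follows; the archive and trigger file paths are
- hypothetical:
-
- #!/bin/sh
- # waiting_restore.sh: invoked as restore_command = 'waiting_restore.sh %f %p'
- ARCHIVE=/mnt/standby/archive    # assumed location of the shared WAL archive
- TRIGGER=/tmp/pgsql.trigger      # assumed failover trigger file
- FILE="$1"                       # %f: requested WAL file name
- DEST="$2"                       # %p: path to copy the file to
-
- # Never wait for .backup or .history files; fail at once if absent.
- case "$FILE" in
-     *.backup|*.history)
-         cp "$ARCHIVE/$FILE" "$DEST"
-         exit $? ;;
- esac
-
- while true; do
-     if [ -f "$ARCHIVE/$FILE" ]; then
-         cp "$ARCHIVE/$FILE" "$DEST"    # file has arrived; hand it to recovery
-         exit $?
-     fi
-     if [ -f "$TRIGGER" ]; then
-         exit 1                         # trigger seen: end recovery, come up
-     fi
-     sleep 1                            # poll again shortly
- done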
-
-
- PostgreSQL does not provide the system
- software required to identify a failure on the primary and notify
- the standby database server. Many such tools exist and are well
- integrated with the operating system facilities required for
- successful failover, such as IP address migration.
-
-
-
- The method for triggering failover is an important part of planning
- and design. One potential option is the restore_command>
- command. It is executed once for each WAL file, but because the process
- running the restore_command> is created and exits for
- each file, there is no daemon or server process, and we cannot
- use signals or a signal handler. Therefore, the
- restore_command> is not suitable to trigger failover.
- It is possible to use a simple timeout facility, especially if
- used in conjunction with a known archive_timeout>
- setting on the primary. However, this is somewhat error prone
- since a network problem or busy primary server might be sufficient
- to initiate failover. A notification mechanism such as the explicit
- creation of a trigger file is ideal, if this can be arranged.
-
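-
- With a script like the hypothetical waiting_restore.sh> shown
- above, failover would be triggered simply by creating the agreed-upon
- file on the standby:
-
- $ touch /tmp/pgsql.trigger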
-
-
- The size of the WAL archive can be minimized by using the %r>
- option of the restore_command>. This option specifies the
- last archive file name that needs to be kept to allow the recovery to
- restart correctly. This can be used to truncate the archive once
- files are no longer required, assuming the archive is writable from the
- standby server.
-
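-
- For example, pg_standby> accepts the %r> value as an
- optional final argument and removes archive files older than the one it
- names; the paths shown are illustrative:
-
- restore_command = 'pg_standby -t /tmp/pgsql.trigger /mnt/standby/archive %f %p %r'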
-
-
-
- Implementation
-
-
- The short procedure for configuring a standby server is as follows. For
- full details of each step, refer to previous sections as noted.
-
-
-
- Set up primary and standby systems as nearly identical as
- possible, including two identical copies of
- PostgreSQL> at the same release level.
-
-
-
-
- Set up continuous archiving from the primary to a WAL archive
- directory on the standby server. Ensure that
- ,
- and
-
- are set appropriately on the primary
- (see ).
-
-
-
-
- Make a base backup of the primary server (see ), and load this data onto the standby.
-
-
-
-
- Begin recovery on the standby server from the local WAL
- archive, using a recovery.conf> that specifies a
- restore_command> that waits as described
- previously (see ).
-
-
-
-
-
-
- Recovery treats the WAL archive as read-only, so once a WAL file has
- been copied to the standby system it can be copied to tape at the same
- time as it is being read by the standby database server.
- Thus, running a standby server for high availability can be performed at
- the same time as files are stored for longer term disaster recovery
- purposes.
-
-
-
- For testing purposes, it is possible to run both primary and standby
- servers on the same system. This does not provide any worthwhile
- improvement in server robustness, nor would it be described as HA.
-
-
-
-
- Failover
-
-
- If the primary server fails then the standby server should begin
- failover procedures.
-
-
-
- If the standby server fails then no failover need take place. If the
- standby server can be restarted, even some time later, then the recovery
- process can also be immediately restarted, taking advantage of
- restartable recovery. If the standby server cannot be restarted, then a
- full new standby server instance should be created.
-
-
-
- If the primary server fails and the standby server becomes the
- new primary, and then the old primary restarts, you must have
- a mechanism for informing the old primary that it is no longer the primary. This is
- sometimes known as STONITH (Shoot The Other Node In The Head), which is
- necessary to avoid situations where both systems think they are the
- primary, which will lead to confusion and ultimately data loss.
-
-
-
- Many failover systems use just two systems, the primary and the standby,
- connected by some kind of heartbeat mechanism to continually verify the
- connectivity between the two and the viability of the primary. It is
- also possible to use a third system (called a witness server) to prevent
- some cases of inappropriate failover, but the additional complexity
- might not be worthwhile unless it is set up with sufficient care and
- rigorous testing.
-
-
-
- Once failover to the standby occurs, we have only a
- single server in operation. This is known as a degenerate state.
- The former standby is now the primary, but the former primary is down
- and might stay down. To return to normal operation we must
- fully recreate a standby server,
- either on the former primary system when it comes up, or on a third,
- possibly new, system. Once complete the primary and standby can be
- considered to have switched roles. Some people choose to use a third
- server to provide backup for the new primary until the new standby
- server is recreated,
- though clearly this complicates the system configuration and
- operational processes.
-
-
-
- So, switching from primary to standby server can be fast but requires
- some time to re-prepare the failover cluster. Regular switching from
- primary to standby is useful, since it allows regular downtime on
- each system for maintenance. This also serves as a test of the
- failover mechanism to ensure that it will really work when you need it.
- Written administration procedures are advised.
-
-
-
-
- Record-based Log Shipping
-
-
- PostgreSQL directly supports file-based
- log shipping as described above. It is also possible to implement
- record-based log shipping, though this requires custom development.
-
-
-
- An external program can call the pg_xlogfile_name_offset()>
- function (see )
- to find out the file name and the exact byte offset within it of
- the current end of WAL. It can then access the WAL file directly
- and copy the data from the last known end of WAL through the current end
- over to the standby servers. With this approach, the window for data
- loss is the polling cycle time of the copying program, which can be very
- small, and there is no wasted bandwidth from forcing partially-used
- segment files to be archived. Note that the standby servers'
- restore_command> scripts can only deal with whole WAL files,
- so the incrementally copied data is not ordinarily made available to
- the standby servers. It is of use only when the primary dies —
- then the last partial WAL file is fed to the standby before allowing
- it to come up. The correct implementation of this process requires
- cooperation of the restore_command> script with the data
- copying program.
-
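-
- For example, the copying program might poll the primary with a query
- such as:
-
- SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location());
-
- which returns the file name and the byte offset of the current end of WAL.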
-
-
- Starting with PostgreSQL> version 8.5, you can use
- streaming replication (see ) to
- achieve the same with less effort.
-
-
-
-
- Streaming Replication
-
-
- PostgreSQL> includes a simple streaming replication
- mechanism, which lets the standby server stay more up-to-date than
- file-based replication allows. The standby connects to the primary
- and the primary starts streaming WAL records from where the standby
- left off, and continues streaming them as they are generated, without
- waiting for the WAL file to be filled. So with streaming replication,
- archive_timeout> does not need to be configured.
-
-
-
- Streaming replication relies on file-based continuous archiving for
- making the base backup and for allowing a standby to catch up if it's
- disconnected from the primary for long enough for the primary to
- delete old WAL files still required by the standby.
-
-
-
- Setup
-
- The short procedure for configuring streaming replication is as follows.
- For full details of each step, refer to other sections as noted.
-
-
-
- Set up primary and standby systems as nearly identical as possible,
- including two identical copies of PostgreSQL> at the
- same release level.
-
-
-
-
- Set up continuous archiving from the primary to a WAL archive located
- in a directory on the standby server. Ensure that
- ,
- and
-
- are set appropriately on the primary
- (see ).
-
-
-
-
-
- Set up connections and authentication so that the standby server can
- successfully connect to the pseudo replication> database of
- the primary server (see
- ). Ensure that
- and pg_hba.conf> are
- configured appropriately on the primary.
-
-
- On systems that support the keepalive socket option, setting
- ,
- and
- helps you detect
- problems with replication (e.g., a network outage or the failure of
- the standby server) as soon as possible.
-
-
-
-
- Set the maximum number of concurrent connections from the standby servers
- (see for details).
-
-
-
-
- Enable WAL archiving in the primary server because we need to make a base
- backup of it later (see and
- for details).
-
-
-
-
- Start the PostgreSQL> server on the primary.
-
-
-
-
- Make a base backup of the primary server (see
- ), and load this data onto the
- standby. Note that all files present in pg_xlog>
- and pg_xlog/archive_status> on the standby>
- server should be removed because they might be obsolete.
-
-
-
-
- Set up WAL archiving, connections and authentication like the primary
- server, because the standby server might work as a primary server after
- failover. Ensure that your settings are consistent with the
- future> environment after the primary and the standby
- server are interchanged by failover. If you're setting up the standby
- server for, e.g., reporting purposes, with no plans to fail over to it,
- configure the standby accordingly.
-
-
-
-
- Create a recovery command file recovery.conf> in the data
- directory on the standby server (a combined example is shown
- after this procedure).
-
-
-
-
- standby_mode (boolean)
-
-
- Specifies whether to start the PostgreSQL> server as
- a standby. If this parameter is on>, streaming
- replication is enabled and the standby server will try to connect
- to the primary to receive and apply WAL records continuously. The
- default is off>, which allows only an archive recovery
- without replication. So, streaming replication requires this
- parameter to be explicitly set to on>.
-
-
-
-
- primary_conninfo (string)
-
-
- Specifies a connection string which is used for the standby server
- to connect with the primary. This string is in the same format as
- described in . If any option is
- unspecified in this string, then the corresponding environment
- variable (see ) is checked. If the
- environment variable is not set either, then the indicated built-in
- defaults are used.
-
-
- The built-in replication requires that a host name (or host address)
- and the port number on which the primary server listens be
- specified in this string. Also ensure that a role with
- the SUPERUSER> and LOGIN> privileges on the
- primary is set (see
- ). Note that
- the password needs to be set if the primary demands password
- authentication.
-
-
-
-
- trigger_file (string)
-
-
- Specifies a trigger file whose presence activates the standby.
- If no trigger file is specified, the standby never exits
- recovery.
-
-
-
-
-
-
-
- Start the PostgreSQL> server on the standby. The standby
- server will go into recovery mode and proceed to receive WAL records
- from the primary and apply them continuously.
-
-
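-
- Putting the pieces together, a minimal recovery.conf> for a
- streaming-replication standby might read as follows, reusing the
- illustrative values from the authentication example below and the
- hypothetical paths used earlier in this chapter:
-
- standby_mode = 'on'
- primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
- trigger_file = '/tmp/pgsql.trigger'
- restore_command = 'cp /mnt/standby/archive/%f %p'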
-
-
-
-
- Authentication
-
- It's very important that the access privileges for replication are set
- properly so that only trusted users can read the WAL stream, because it's
- easy to extract serious information from it.
-
-
- Only a superuser is allowed to connect to the primary as a replication
- standby, so a role with the SUPERUSER> and LOGIN>
- privileges needs to be created on the primary.
-
-
- Client authentication for replication is controlled by the
- pg_hba.conf> record specifying replication> in the
- database> field. For example, if the standby is running on
- host IP 192.168.1.100> and the superuser's name for replication
- is foo>, the administrator can add the following line to the
- pg_hba.conf> file on the primary.
-
-
- # Allow the user "foo" from host 192.168.1.100 to connect to the primary
- # as a replication standby if the user's password is correctly supplied.
- #
- # TYPE DATABASE USER CIDR-ADDRESS METHOD
- host replication foo 192.168.1.100/32 md5
-
-
-
- The host name and port number of the primary, user name to connect as,
- and password are specified in the recovery.conf> file or
- the corresponding environment variable on the standby.
- For example, if the primary is running on host IP 192.168.1.50>,
- port 5432, the superuser's name for replication is
- foo>, and the password is foopass>, the administrator
- can add the following line to the recovery.conf> file on the
- standby.
-
-
- # The standby connects to the primary that is running on host 192.168.1.50
- # and port 5432 as the user "foo" whose password is "foopass".
- primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
-
-
-
-
-
-
- Incrementally Updated Backups
-
-
- incrementally updated backups
-
-
-
- change accumulation
-
-
-
- In a warm standby configuration, it is possible to offload the expense of
- taking periodic base backups from the primary server; instead base backups
- can be made by backing
- up a standby server's files. This concept is generally known as
- incrementally updated backups, log change accumulation, or more simply,
- change accumulation.
-
-
-
- If we take a file system backup of the standby server's data
- directory while it is processing
- logs shipped from the primary, we will be able to reload that backup and
- restart the standby's recovery process from the last restart point.
- We no longer need to keep WAL files from before the standby's restart point.
- If we need to recover, it will be faster to recover from the incrementally
- updated backup than from the original base backup.
-
-
-
- Since the standby server is not live>, it is not possible to
- use pg_start_backup()> and pg_stop_backup()>
- to manage the backup process; it will be up to you to determine how
- far back you need to keep WAL segment files to have a recoverable
- backup. You can do this by running pg_controldata>
- on the standby server to inspect the control file and determine the
- current checkpoint WAL location, or by using the
- log_checkpoints> option to print values to the standby's
- server log.
-
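-
- For example, on the standby (the data directory path is illustrative
- and the output is abridged):
-
- $ pg_controldata /var/lib/pgsql/standby_data | grep 'REDO location'
- Latest checkpoint's REDO location:    0/3C000020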
-
-
-
-
- Hot Standby
-
-
- Hot Standby
-
-
-
- Hot Standby is the term used to describe the ability to connect to
- the server and run queries while the server is in archive recovery. This
- is useful for both log shipping replication and for restoring a backup
- to an exact state with great precision.
- The term Hot Standby also refers to the ability of the server to move
- from recovery through to normal running while users continue running
- queries and/or continue their connections.
-
-
-
- Running queries in recovery is in many ways the same as normal running,
- though there are a large number of usage and administrative points
- to note.
-
-
-
- User's Overview
-
-
- Users can connect to the database while the server is in recovery
- and perform read-only queries. Read-only access to catalogs and views
- also works as normal.
-
-
-
- The data on the standby takes some time to arrive from the primary server
- so there will be a measurable delay between primary and standby. Running the
- same query nearly simultaneously on both primary and standby might therefore
- return differing results. We say that data on the standby is eventually
- consistent with the primary.
- Queries executed on the standby will be correct with regard to the transactions
- that had been recovered at the start of the query, or start of first statement,
- in the case of serializable transactions. In comparison with the primary,
- the standby returns query results that could have been obtained on the primary
- at some exact moment in the past.
-
-
-
- When a transaction is started in recovery, the parameter
- transaction_read_only> will be forced to be true, regardless of the
- default_transaction_read_only> setting in postgresql.conf>.
- It can't be manually set to false either. As a result, all transactions
- started during recovery will be limited to read-only actions only. In all
- other ways, connected sessions will appear identical to sessions
- initiated during normal processing mode. There are no special commands
- required to initiate a connection at this time, so all interfaces
- work normally without change. After recovery finishes, the session
- will allow normal read-write transactions at the start of the next
- transaction, if these are requested.
-
-
-
- Read-only here means "no writes to the permanent database tables".
- There are no problems with queries that make use of transient sort and
- work files.
-
-
-
- The following actions are allowed
-
-
-
-
- Query access - SELECT, COPY TO including views and SELECT RULEs
-
-
-
-
- Cursor commands - DECLARE, FETCH, CLOSE
-
-
-
-
- Parameters - SHOW, SET, RESET
-
-
-
-
- Transaction management commands
-
-
-
- BEGIN, END, ABORT, START TRANSACTION
-
-
-
-
- SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
-
-
-
-
- EXCEPTION blocks and other internal subtransactions
-
-
-
-
-
-
-
- LOCK TABLE, though only when explicitly in one of these modes:
- ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
-
-
-
-
- Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
-
-
-
-
- Plugins and extensions - LOAD
-
-
-
-
-
-
- These actions produce error messages (an example appears after this list)
-
-
-
-
- Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
- Note that there are no allowed actions that result in a trigger
- being executed during recovery.
-
-
-
-
- Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
- This currently also applies to temporary tables, because their
- definition causes writes to catalog tables.
-
-
-
-
- SELECT ... FOR SHARE | UPDATE, which cause row locks to be written
-
-
-
-
- RULEs on SELECT statements that generate DML commands.
-
-
-
-
- LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
- LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
-
-
-
-
- Transaction management commands that explicitly set non-read only state
-
-
-
- BEGIN READ WRITE,
- START TRANSACTION READ WRITE
-
-
-
-
- SET TRANSACTION READ WRITE,
- SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
-
-
-
-
- SET transaction_read_only = off
-
-
-
-
-
-
-
- Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
- ROLLBACK PREPARED because even read-only transactions need to write
- WAL in the prepare phase (the first phase of two phase commit).
-
-
-
-
- Sequence update - nextval()
-
-
-
-
- LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
-
-
-
-
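-
- For example, attempting DML during recovery fails immediately (the
- table name is illustrative):
-
- postgres=# INSERT INTO test_table VALUES (1);
- ERROR:  cannot execute INSERT in a read-only transaction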
-
-
- Note that the current behaviour of read-only transactions when not in
- recovery is to allow the last two actions, so there are small and
- subtle differences in behaviour between read-only transactions
- run on a standby and those run during normal operation.
- It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
- temporary tables may be lifted in a future release, if their internal
- implementation is altered to make this possible.
-
-
-
- If failover or switchover occurs the database will switch to normal
- processing mode. Sessions will remain connected while the server
- changes mode. Current transactions will continue, though will remain
- read-only. After recovery is complete, it will be possible to initiate
- read-write transactions.
-
-
-
- Users will be able to tell whether their session is read-only by
- issuing SHOW transaction_read_only. In addition a set of
- functions allow users to
- access information about Hot Standby. These allow you to write
- functions that are aware of the current state of the database. These
- can be used to monitor the progress of recovery, or to allow you to
- write complex programs that restore the database to particular states.
-
-
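-
- For example, from psql>:
-
- postgres=# SHOW transaction_read_only;
-  transaction_read_only
- -----------------------
-  on
- (1 row)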
-
- In recovery, transactions will not be permitted to take any table lock
- higher than RowExclusiveLock. In addition, transactions may never assign
- a TransactionId and may never write WAL.
- Any LOCK TABLE> command that runs on the standby and requests
- a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
-
-
-
- In general queries will not experience lock conflicts with the database
- changes made by recovery. This is because recovery follows normal
- concurrency control mechanisms, known as MVCC>. There are
- some types of change that will cause conflicts, covered in the following
- section.
-
-
-
-
- Handling query conflicts
-
-
- The primary and standby nodes are in many ways loosely connected. Actions
- on the primary will have an effect on the standby. As a result, there is
- potential for negative interactions or conflicts between them. The easiest
- conflict to understand is performance: if a huge data load is taking place
- on the primary then this will generate a similar stream of WAL records on the
- standby, so standby queries may contend for system resources, such as I/O.
-
-
-
- There are also additional types of conflict that can occur with Hot Standby.
- These conflicts are hard conflicts> in the sense that we may
- need to cancel queries and in some cases disconnect sessions to resolve them.
- The user is provided with a number of optional ways to handle these
- conflicts, though we must first understand the possible reasons behind a conflict.
-
-
-
-
- Access Exclusive Locks from the primary node, including both explicit
- LOCK commands and various kinds of DDL action
-
-
-
-
- Dropping tablespaces on the primary while standby queries are using
- those tablespaces for temporary work files (work_mem overflow)
-
-
-
-
- Dropping databases on the primary while users are connected to that
- database on the standby.
-
-
-
-
- Waiting to acquire buffer cleanup locks
-
-
-
-
- Early cleanup of data still visible to the current query's snapshot
-
-
-
-
-
-
- Some WAL redo actions will be for DDL actions. These DDL actions are
- repeating actions that have already committed on the primary node, so
- they must not fail on the standby node. These DDL locks take priority
- and will automatically *cancel* any read-only transactions that get in
- their way, after a grace period. This is similar to the possibility of
- being canceled by the deadlock detector, but in this case the standby
- process always wins, since the replayed actions must not fail. This
- also ensures that replication doesn't fall behind while we wait for a
- query to complete. Again, we assume that the standby is there for high
- availability purposes primarily.
-
-
-
- An example of the above would be an administrator on the primary server
- running DROP TABLE> on a table that is currently being queried
- on the standby server.
- Clearly the query cannot continue if we let the DROP TABLE>
- proceed. If this situation occurred on the primary, the DROP TABLE>
- would wait until the query has finished. When the query is on the standby
- and the DROP TABLE> is on the primary, the primary doesn't have
- information about which queries are running on the standby and so the query
- does not wait on the primary. The WAL change records come through to the
- standby while the standby query is still running, causing a conflict.
-
-
-
- The most common reason for conflict between standby queries and WAL redo is
- "early cleanup". Normally, PostgreSQL> allows cleanup of old
- row versions when there are no users who may need to see them to ensure correct
- visibility of data (the heart of MVCC). If there is a standby query that has
- been running for longer than any query on the primary then it is possible
- for old row versions to be removed by either a vacuum or HOT. This will
- then generate WAL records that, if applied, would remove data on the
- standby that might *potentially* be required by the standby query.
- In more technical language, the primary's xmin horizon is later than
- the standby's xmin horizon, allowing dead rows to be removed.
-
-
-
- Experienced users should note that both row version cleanup and row version
- freezing will potentially conflict with recovery queries. Running a
- manual VACUUM FREEZE> is likely to cause conflicts even on tables
- with no updated or deleted rows.
-
-
-
- We have a number of choices for resolving query conflicts. The default
- is that we wait and hope the query completes. The server will wait
- automatically until the lag between primary and standby is at most
- max_standby_delay> seconds. Once that grace period expires,
- we take one of the following actions:
-
-
-
-
- If the conflict is caused by a lock, we cancel the conflicting standby
- transaction immediately. If the transaction is idle-in-transaction
- then currently we abort the session instead, though this may change
- in the future.
-
-
-
-
-
- If the conflict is caused by cleanup records we tell the standby query
- that a conflict has occurred and that it must cancel itself to avoid the
- risk that it silently fails to read relevant data because
- that data has been removed. (This is regrettably very similar to the
- much feared and iconic error message "snapshot too old"). Some cleanup
- records only cause conflict with older queries, though some types of
- cleanup record affect all queries.
-
-
-
- If cancellation does occur, the query and/or transaction can always
- be re-executed. The error is dynamic and will not necessarily occur
- the same way if the query is executed again.
-
-
-
-
-
-
- max_standby_delay> is set in postgresql.conf>.
- The parameter applies to the server as a whole so if the delay is all used
- up by a single query then there may be little or no waiting for queries that
- follow immediately, though they will have benefited equally from the initial
- waiting period. The server may take time to catch up again before the grace
- period is available again, though if there is a heavy and constant stream
- of conflicts it may seldom catch up fully.
-
-
-
- Users should be clear that tables that are regularly and heavily updated on
- the primary server will quickly cause cancellation of longer running queries on
- the standby. In those cases max_standby_delay> can be
- considered somewhat but not exactly the same as setting
- statement_timeout>.
-
-
-
- Other remedial actions exist if the number of cancellations is unacceptable.
- The first option is to connect to primary server and keep a query active
- for as long as we need to run queries on the standby. This guarantees that
- a WAL cleanup record is never generated and we don't ever get query
- conflicts as described above. This could be done using contrib/dblink
- and pg_sleep(), or via other mechanisms. If you do this, you should note
- that this will delay cleanup of dead rows by vacuum or HOT and many
- people may find this undesirable. However, we should remember that
- primary and standby nodes are linked via the WAL, so this situation is no
- different from the case where we ran the query on the primary node itself,
- except that we have the benefit of off-loading the execution onto the standby.
-
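-
- A sketch of that approach using contrib/dblink> and
- pg_sleep()> (the connection string and duration are
- illustrative):
-
- -- Hold a query open on the primary for ten minutes while standby
- -- queries run; dead-row cleanup on the primary is held back meanwhile.
- SELECT * FROM dblink('host=192.168.1.50 dbname=postgres',
-                      'SELECT 1 FROM pg_sleep(600)') AS t(x int);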
-
-
- It is also possible to set vacuum_defer_cleanup_age> on the primary
- to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
- more time for queries to execute before they are cancelled on the standby,
- without the need for setting a high max_standby_delay>.
-
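-
- For example, in postgresql.conf> on the primary (the value,
- measured in transactions, is purely illustrative):
-
- vacuum_defer_cleanup_age = 10000    # defer cleanup by ~10000 transactions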
-
-
- Three-way deadlocks are possible between AccessExclusiveLocks arriving from
- the primary, cleanup WAL records that require buffer cleanup locks and
- user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
- are resolved by time-out when we exceed max_standby_delay>.
-
-
-
- Dropping tablespaces or databases is discussed in the administrator's
- section since they are not typical user situations.
-
-
-
-
- Administrator's Overview
-
-
- If there is a recovery.conf> file present the server will start
- in Hot Standby mode by default, though recovery_connections> can
- be disabled via postgresql.conf>, if required. The server may take
- some time to enable recovery connections since the server must first complete
- sufficient recovery to provide a consistent state against which queries
- can run before enabling read only connections. Look for these messages
- in the server logs
-
-
- LOG: initializing recovery connections
-
- ... then some time later ...
-
- LOG: consistent recovery state reached
- LOG: database system is ready to accept read only connections
-
-
- Consistency information is recorded once per checkpoint on the primary, as long
- as recovery_connections> is enabled (on the primary). If this parameter
- is disabled, it will not be possible to enable recovery connections on the standby.
- The consistent state can also be delayed in the presence of both of these conditions
-
-
-
-
- a write transaction has more than 64 subtransactions
-
-
-
-
- very long-lived write transactions
-
-
-
-
- If you are running file-based log shipping ("warm standby"), you may need
- to wait until the next WAL file arrives, which could be as long as the
- archive_timeout> setting on the primary.
-
-
-
- The setting of some parameters on the standby will need reconfiguration
- if they have been changed on the primary. The value on the standby must
- be equal to or greater than the value on the primary. If these parameters
- are not set high enough then the standby will not be able to track work
- correctly from recovering transactions. If these values are set too low,
- the server will halt. Higher values can then be supplied and the server
- restarted to begin recovery again.
-
-
-
-
- max_connections>
-
-
-
-
- max_prepared_transactions>
-
-
-
-
- max_locks_per_transaction>
-
-
-
-
-
-
- It is important that the administrator consider the appropriate setting
- of max_standby_delay>, set in postgresql.conf>.
- There is no optimal setting; it should be chosen according to business
- priorities. For example, if the server is primarily tasked as a High
- Availability server, then you may wish to lower
- max_standby_delay> or even set it to zero, though that is a
- very aggressive setting. If the standby server is tasked as an additional
- server for decision support queries then it may be acceptable to set this
- to a value of many hours (in seconds).
-
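-
- For example, in postgresql.conf> on the standby (values are
- illustrative; the unit is seconds, as noted above):
-
- max_standby_delay = 30       # High Availability emphasis: cancel queries quickly
- #max_standby_delay = 14400   # decision-support emphasis: allow 4-hour reports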
-
-
- Transaction status "hint bits" written on the primary are not WAL-logged,
- so the standby will likely re-write the hints itself.
- Thus the main database blocks will produce write I/Os even though
- all users are read-only; no changes have occurred to the data values
- themselves. Users will be able to write large sort temp files and
- re-generate relcache info files, so there is no part of the database
- that is truly read-only during hot standby mode. There is no restriction
- on the use of set returning functions, or other users of tuplestore/tuplesort
- code. Note also that writes to remote databases will still be possible,
- even though the transaction is read-only locally.
-
-
-
- The following types of administrator command are not accepted
- during recovery mode
-
-
-
-
- Data Definition Language (DDL) - e.g. CREATE INDEX
-
-
-
-
- Privilege and Ownership - GRANT, REVOKE, REASSIGN
-
-
-
-
- Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
-
-
-
-
-
-
- Note again that some of these commands are actually allowed during
- "read only" mode transactions on the primary.
-
-
-
- As a result, you cannot create additional indexes that exist solely
- on the standby, nor can you create statistics that exist solely on the standby.
- If these administrator commands are needed they should be executed
- on the primary so that the changes will propagate through to the
- standby.
-
-
-
- pg_cancel_backend()> will work on user backends, but not the
- Startup process, which performs recovery. pg_stat_activity does not
- show an entry for the Startup process, nor do recovering transactions
- show as active. As a result, pg_prepared_xacts is always empty during
- recovery. If you wish to resolve in-doubt prepared transactions
- then look at pg_prepared_xacts on the primary and issue commands to
- resolve those transactions there.
-
-
-
- pg_locks will show locks held by backends as normal. pg_locks also shows
- a virtual transaction managed by the Startup process that owns all
- AccessExclusiveLocks held by transactions being replayed by recovery.
- Note that the Startup process does not acquire locks to
- make database changes, and thus locks other than AccessExclusiveLocks
- do not show in pg_locks for the Startup process; they are just presumed
- to exist.
-
-
-
- check_pgsql> will work, but it is very simple.
- check_postgres> will also work, though some actions
- could give different or confusing results;
- for example, last vacuum time will not be maintained, since no
- vacuum occurs on the standby (though vacuums running on the primary do
- send their changes to the standby).
-
-
-
- WAL file control commands will not work during recovery,
- e.g., pg_start_backup>, pg_switch_xlog>, etc.
-
-
-
- Dynamically loadable modules work, including pg_stat_statements.
-
-
-
- Advisory locks work normally in recovery, including deadlock detection.
- Note that advisory locks are never WAL logged, so it is not possible for
- an advisory lock on either the primary or the standby to conflict with WAL
- replay. Nor is it possible to acquire an advisory lock on the primary
- and have it initiate a similar advisory lock on the standby. Advisory
- locks relate only to a single server on which they are acquired.
-
-
-
- Trigger-based replication systems such as Slony>,
- Londiste> and Bucardo> won't run on the
- standby at all, though they will run happily on the primary server as
- long as the changes are not sent to standby servers to be applied.
- WAL replay is not trigger-based so you cannot relay from the
- standby to any system that requires additional database writes or
- relies on the use of triggers.
-
-
-
- New OIDs cannot be assigned, though some UUID> generators may still
- work as long as they do not rely on writing new status to the database.
-
-
-
- Currently, temp table creation is not allowed during read only
- transactions, so in some cases existing scripts will not run correctly.
- It is possible we may relax that restriction in a later release. This is
- both a SQL Standard compliance issue and a technical issue.
-
-
-
- DROP TABLESPACE> can only succeed if the tablespace is empty.
- Some standby users may be actively using the tablespace via their
- temp_tablespaces> parameter. If there are temp files in the
- tablespace we currently cancel all active queries to ensure that temp
- files are removed, so that we can remove the tablespace and continue with
- WAL replay.
-
-
-
- Running DROP DATABASE>, ALTER DATABASE ... SET TABLESPACE>,
- or ALTER DATABASE ... RENAME> on primary will generate a log message
- that will cause all users connected to that database on the standby to be
- forcibly disconnected. This action occurs immediately, whatever the setting of
- max_standby_delay>.
-
-
-
- In normal running, if you issue DROP USER> or DROP ROLE>
- for a role with login capability while that user is still connected then
- nothing happens to the connected user - they remain connected. The user cannot
- reconnect however. This behaviour applies in recovery also, so a
- DROP USER> on the primary does not disconnect that user on the standby.
-
-
-
- The stats collector is active during recovery. All scans, reads, blocks,
- index usage, etc., will be recorded normally on the standby. Replayed
- actions will not duplicate their effects on primary, so replaying an
- insert will not increment the Inserts column of pg_stat_user_tables.
- The stats file is deleted at the start of recovery, so stats from primary
- and standby will differ; this is considered a feature not a bug.
-
-
-
- Autovacuum is not active during recovery, though it will start normally
- at the end of recovery.
-
-
-
- The background writer is active during recovery and will perform
- restartpoints (similar to checkpoints on the primary) and normal block
- cleaning activities. The CHECKPOINT> command is accepted during recovery,
- though it performs a restartpoint rather than a new checkpoint.
-
-
-
-
- Hot Standby Parameter Reference
-
-
- Various parameters have been mentioned above in the
- and sections.
-
-
-
- On the primary, parameters recovery_connections> and
- vacuum_defer_cleanup_age> can be used to enable and control the
- primary server to assist the successful configuration of Hot Standby servers.
- max_standby_delay> has no effect if set on the primary.
-
-
-
- On the standby, parameters recovery_connections> and
- max_standby_delay> can be used to enable and control Hot Standby.
- vacuum_defer_cleanup_age> has no effect during recovery.
-
-
-
-
- Caveats
-
-
- At this writing, there are several limitations of Hot Standby.
- These can and probably will be fixed in future releases:
-
-
-
-
- Operations on hash indexes are not presently WAL-logged, so
- replay will not update these indexes. Hash indexes will not be
- used for query plans during recovery.
-
-
-
-
- Full knowledge of running transactions is required before snapshots
- may be taken. Transactions that use large numbers of subtransactions
- (currently greater than 64) will delay the start of read-only
- connections until the completion of the longest running write transaction.
- If this situation occurs, explanatory messages will be sent to the server log.
-
-
-
-
- Valid starting points for recovery connections are generated at each
- checkpoint on the master. If the standby is shut down while the master
- is in a shutdown state, it may not be possible to re-enter Hot Standby
- until the primary is started up so that it generates further starting
- points in the WAL logs. This is not considered a serious issue
- because the standby is usually switched into the primary role while
- the first node is taken down.
-
-
-
-
- At the end of recovery, AccessExclusiveLocks held by prepared transactions
- will require twice the normal number of lock table entries. If you plan
- on running either a large number of concurrent prepared transactions
- that normally take AccessExclusiveLocks, or one large transaction that
- takes many AccessExclusiveLocks, then you are
- advised to select a larger value of max_locks_per_transaction>,
- in rare extremes up to, but never more than, twice the value of the
- parameter setting on the primary server. You need not consider this at all if
- your setting of max_prepared_transactions> is 0>.
-
-
-
-
-
-
-
-
-
Migration Between Releases
--- 1492,1497 ----
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***************
*** 79,84 ****
--- 79,87 ----
also available.
+
+ Comparison of different solutions
+
***************
*** 450,453 **** protocol to make nodes agree on a serializable transactional order.
--- 453,1840 ----
+
+
+
+ File-based Log Shipping
+
+
+ warm standby
+
+
+
+ PITR standby
+
+
+
+ standby server
+
+
+
+ log shipping
+
+
+
+ witness server
+
+
+
+ STONITH
+
+
+
+ Continuous archiving can be used to create a high
+ availability> (HA) cluster configuration with one or more
+ standby servers> ready to take over operations if the
+ primary server fails. This capability is widely referred to as
+ warm standby> or log shipping>.
+
+
+
+ The primary and standby server work together to provide this capability,
+ though the servers are only loosely coupled. The primary server operates
+ in continuous archiving mode, while each standby server operates in
+ continuous recovery mode, reading the WAL files from the primary. No
+ changes to the database tables are required to enable this capability,
+ so it offers low administration overhead compared to some other
+ replication approaches. This configuration also has relatively low
+ performance impact on the primary server.
+
+
+
+ Directly moving WAL records from one database server to another
+ is typically described as log shipping. PostgreSQL>
+ implements file-based log shipping, which means that WAL records are
+ transferred one file (WAL segment) at a time. WAL files (16MB) can be
+ shipped easily and cheaply over any distance, whether it be to an
+ adjacent system, another system at the same site, or another system on
+ the far side of the globe. The bandwidth required for this technique
+ varies according to the transaction rate of the primary server.
+ Record-based log shipping is also possible with custom-developed
+ procedures, as discussed in .
+
+
+
+ It should be noted that log shipping is asynchronous, i.e., the WAL
+ records are shipped after transaction commit. As a result there is a
+ window for data loss should the primary server suffer a catastrophic
+ failure: transactions not yet shipped will be lost. The length of the
+ window of data loss can be limited by use of the
+ archive_timeout parameter, which can be set as low
+ as a few seconds if required. However such a low setting will
+ substantially increase the bandwidth required for file shipping.
+ If you need a window of less than a minute or so, it's probably better
+ to consider record-based log shipping.
+
+
+
+ The standby server is not available for access, since it is continually
+ performing recovery processing. Recovery performance is sufficiently
+ good that the standby will typically be only moments away from full
+ availability once it has been activated. As a result, we refer to this
+ capability as a warm standby configuration that offers high
+ availability. Restoring a server from an archived base backup and
+ rollforward will take considerably longer, so that technique only
+ offers a solution for disaster recovery, not high availability.
+
+
+
+ Planning
+
+
+ It is usually wise to create the primary and standby servers
+ so that they are as similar as possible, at least from the
+ perspective of the database server. In particular, the path names
+ associated with tablespaces will be passed across unmodified, so both
+ primary and standby servers must have the same mount paths for
+ tablespaces if that feature is used. Keep in mind that if
+
+ is executed on the primary, any new mount point needed for it must
+ be created on the primary and all standby servers before the command
+ is executed. Hardware need not be exactly the same, but experience shows
+ that maintaining two identical systems is easier than maintaining two
+ dissimilar ones over the lifetime of the application and system.
+ In any case the hardware architecture must be the same — shipping
+ from, say, a 32-bit to a 64-bit system will not work.
+
+
+
+ In general, log shipping between servers running different major
+ PostgreSQL> release
+ levels is not possible. It is the policy of the PostgreSQL Global
+ Development Group not to make changes to disk formats during minor release
+ upgrades, so it is likely that running different minor release levels
+ on primary and standby servers will work successfully. However, no
+ formal support for that is offered and you are advised to keep primary
+ and standby servers at the same release level as much as possible.
+ When updating to a new minor release, the safest policy is to update
+ the standby servers first — a new minor release is more likely
+ to be able to read WAL files from a previous minor release than vice
+ versa.
+
+
+
+ There is no special mode required to enable a standby server. The
+ operations that occur on both primary and standby servers are
+ normal continuous archiving and recovery tasks. The only point of
+ contact between the two database servers is the archive of WAL files
+ that both share: primary writing to the archive, standby reading from
+ the archive. Care must be taken to ensure that WAL archives from separate
+ primary servers do not become mixed together or confused. The archive
+ need not be large if it is only required for standby operation.
+
+
+
+ The magic that makes the two loosely coupled servers work together is
+ simply a restore_command> used on the standby that,
+ when asked for the next WAL file, waits for it to become available from
+ the primary. The restore_command> is specified in the
+ recovery.conf> file on the standby server. Normal recovery
+ processing would request a file from the WAL archive, reporting failure
+ if the file was unavailable. For standby processing it is normal for
+ the next WAL file to be unavailable, so we must be patient and wait for
+ it to appear. For files ending in .backup> or
+ .history> there is no need to wait, and a non-zero return
+ code must be returned. A waiting restore_command> can be
+ written as a custom script that loops after polling for the existence of
+ the next WAL file. There must also be some way to trigger failover, which
+ should interrupt the restore_command>, break the loop and
+ return a file-not-found error to the standby server. This ends recovery
+ and the standby will then come up as a normal server.
+
+
+
+ Pseudocode for a suitable restore_command> is:
+
+ triggered = false;
+ while (!NextWALFileReady() && !triggered)
+ {
+ usleep(100000L); /* wait for ~0.1 sec */
+ if (CheckForExternalTrigger())
+ triggered = true;
+ }
+ if (!triggered)
+ CopyWALFileForRecovery();
+
+
+
+
+ A working example of a waiting restore_command> is provided
+ as a contrib> module named pg_standby>. It
+ should be used as a reference on how to correctly implement the logic
+ described above. It can also be extended as needed to support specific
+ configurations and environments.
+
+
+
+ PostgreSQL does not provide the system
+ software required to identify a failure on the primary and notify
+ the standby database server. Many such tools exist and are well
+ integrated with the operating system facilities required for
+ successful failover, such as IP address migration.
+
+
+
+ The method for triggering failover is an important part of planning
+ and design. One potential option is the restore_command>
+ command. It is executed once for each WAL file, but because the process
+ running the restore_command> is created and exits for
+ each file, there is no daemon or server process, and we cannot
+ use signals or a signal handler. Therefore, the
+ restore_command> is not suitable to trigger failover.
+ It is possible to use a simple timeout facility, especially if
+ used in conjunction with a known archive_timeout>
+ setting on the primary. However, this is somewhat error prone
+ since a network problem or busy primary server might be sufficient
+ to initiate failover. A notification mechanism such as the explicit
+ creation of a trigger file is ideal, if this can be arranged.
+
+
+
+ The size of the WAL archive can be minimized by using the %r>
+ option of the restore_command>. This option specifies the
+ last archive file name that needs to be kept to allow the recovery to
+ restart correctly. This can be used to truncate the archive once
+ files are no longer required, assuming the archive is writable from the
+ standby server.
+
+
+
+
+ Implementation
+
+
+ The short procedure for configuring a standby server is as follows. For
+ full details of each step, refer to previous sections as noted.
+
+
+
+ Set up primary and standby systems as nearly identical as
+ possible, including two identical copies of
+ PostgreSQL> at the same release level.
+
+
+
+
+ Set up continuous archiving from the primary to a WAL archive
+ directory on the standby server. Ensure that
+ ,
+ and
+
+ are set appropriately on the primary
+ (see ).
+
+
+
+
+ Make a base backup of the primary server (see ), and load this data onto the standby.
+
+
+
+
+ Begin recovery on the standby server from the local WAL
+ archive, using a recovery.conf> that specifies a
+ restore_command> that waits as described
+ previously (see ).
+
+
+
+
+
+
+ Recovery treats the WAL archive as read-only, so once a WAL file has
+ been copied to the standby system it can be copied to tape at the same
+ time as it is being read by the standby database server.
+ Thus, running a standby server for high availability can be performed at
+ the same time as files are stored for longer term disaster recovery
+ purposes.
+
+
+
+ For testing purposes, it is possible to run both primary and standby
+ servers on the same system. This does not provide any worthwhile
+ improvement in server robustness, nor would it be described as HA.
+
+
+
+
+ Record-based Log Shipping
+
+
+ PostgreSQL directly supports file-based
+ log shipping as described above. It is also possible to implement
+ record-based log shipping, though this requires custom development.
+
+
+
+ An external program can call the pg_xlogfile_name_offset()>
+ function (see )
+ to find out the file name and the exact byte offset within it of
+ the current end of WAL. It can then access the WAL file directly
+ and copy the data from the last known end of WAL through the current end
+ over to the standby servers. With this approach, the window for data
+ loss is the polling cycle time of the copying program, which can be very
+ small, and there is no wasted bandwidth from forcing partially-used
+ segment files to be archived. Note that the standby servers'
+ restore_command> scripts can only deal with whole WAL files,
+ so the incrementally copied data is not ordinarily made available to
+ the standby servers. It is of use only when the primary dies —
+ then the last partial WAL file is fed to the standby before allowing
+ it to come up. The correct implementation of this process requires
+ cooperation of the restore_command> script with the data
+ copying program.
+
+
+
+ Starting with PostgreSQL> version 8.5, you can use
+ streaming replication (see ) to
+ achieve the same with less effort.
+
+
+
+
+
+ Streaming Replication
+
+
+ Streaming Replication
+
+
+
+ PostgreSQL> includes a simple streaming replication
+ mechanism, which lets the standby server stay more up-to-date than
+ file-based replication allows. The standby connects to the primary
+ and the primary starts streaming WAL records from where the standby
+ left off, and continues streaming them as they are generated, without
+ waiting for the WAL file to be filled. So with streaming replication,
+ archive_timeout> does not need to be configured.
+
+
+
+ Streaming replication relies on file-based continuous archiving for
+ making the base backup and for allowing a standby to catch up if it is
+ disconnected from the primary long enough for the primary to
+ delete old WAL files still required by the standby.
+
+
+
+ Setup
+
+ The short procedure for configuring streaming replication is as follows.
+ For full details of each step, refer to other sections as noted.
+
+
+
+ Set up primary and standby systems as nearly identically as possible,
+ including two identical copies of PostgreSQL> at the
+ same release level.
+
+
+
+
+ Set up continuous archiving from the primary to a WAL archive located
+ in a directory on the standby server. Ensure that
+ ,
+ and
+
+ are set appropriately on the primary
+ (see ).
+
+
+
+
+
+ Set up connections and authentication so that the standby server can
+ successfully connect to the pseudo replication> database of
+ the primary server (see
+ ). Ensure that
+ and pg_hba.conf> are
+ configured appropriately on the primary.
+
+
+ On systems that support the keepalive socket option, setting
+ ,
+ and
+ helps you to detect replication
+ problems (e.g., a network outage or a failure of
+ the standby server) as soon as possible.
+
+
+
+
+ Set the maximum number of concurrent connections from the standby servers
+ (see for details).
+
+
+
+
+ Enable WAL archiving in the primary server because we need to make a base
+ backup of it later (see and
+ for details).
+
+
+
+
+ Start the PostgreSQL> server on the primary.
+
+
+
+
+ Make a base backup of the primary server (see
+ ), and load this data onto the
+ standby. Note that all files present in pg_xlog>
+ and pg_xlog/archive_status> on the standby>
+ server should be removed because they might be obsolete.
+
+
+
+
+ Set up WAL archiving, connections and authentication like the primary
+ server, because the standby server might work as a primary server after
+ failover. Ensure that your settings are consistent with the
+ future> environment after the primary and the standby
+ server are interchanged by failover. If you are setting up the standby
+ server only for, e.g., reporting purposes, with no plans to fail over
+ to it, configure the standby accordingly.
+
+
+
+
+ Create a recovery command file recovery.conf> in the data
+ directory on the standby server (a minimal example appears after
+ this procedure).
+
+
+
+
+ standby_mode (boolean)
+
+
+ Specifies whether to start the PostgreSQL> server as
+ a standby. If this parameter is on>, the streaming
+ replication is enabled and the standby server will try to connect
+ to the primary to receive and apply WAL records continuously. The
+ default is off>, which allows only an archive recovery
+ without replication. So, streaming replication requires this
+ parameter to be explicitly set to on>.
+
+
+
+
+ primary_conninfo (string)
+
+
+ Specifies a connection string which is used for the standby server
+ to connect with the primary. This string is in the same format as
+ described in . If any option is
+ unspecified in this string, then the corresponding environment
+ variable (see ) is checked. If the
+ environment variable is not set either, then the indicated built-in
+ defaults are used.
+
+
+ The built-in replication requires that a host name (or host address)
+ and the port number which the primary server listens on be specified
+ in this string. Also ensure that a role with
+ the SUPERUSER> and LOGIN> privileges exists on the
+ primary (see
+ ). Note that
+ the password needs to be supplied if the primary demands password
+ authentication.
+
+
+
+
+ trigger_file (string)
+
+
+ Specifies a trigger file whose presence activates the standby.
+ If no trigger file is specified, the standby never exits
+ recovery.
+
+
+
+
+
+
+
+ Start the PostgreSQL> server on the standby. The standby
+ server will go into recovery mode and proceed to receive WAL records
+ from the primary and apply them continuously.
+
+
+
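+
+ As a minimal sketch, recovery.conf> on the standby might
+ contain just the following; the connection values and the trigger
+ file path are only illustrative:
+
+ standby_mode     = 'on'
+ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+ trigger_file     = '/tmp/pgsql.trigger'
+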
+
+
+
+ Authentication
+
+ It is very important that the access privileges for replication be set
+ properly so that only trusted users can read the WAL stream, because it is
+ easy to extract sensitive information from it.
+
+
+ Only a superuser is allowed to connect to the primary as a replication
+ standby, so a role with the SUPERUSER> and LOGIN>
+ privileges needs to be created on the primary.
+
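+
+ For example, such a role could be created on the primary with a
+ command like the following; the role name and password are only
+ illustrative:
+
+ CREATE ROLE foo SUPERUSER LOGIN PASSWORD 'foopass';
+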
+
+ Client authentication for replication is controlled by the
+ pg_hba.conf> record specifying replication> in the
+ database> field. For example, if the standby is running on
+ host IP 192.168.1.100> and the superuser's name for replication
+ is foo>, the administrator can add the following line to the
+ pg_hba.conf> file on the primary.
+
+
+ # Allow the user "foo" from host 192.168.1.100 to connect to the primary
+ # as a replication standby if the user's password is correctly supplied.
+ #
+ # TYPE DATABASE USER CIDR-ADDRESS METHOD
+ host replication foo 192.168.1.100/32 md5
+
+
+
+ The host name and port number of the primary, user name to connect as,
+ and password are specified in the recovery.conf> file or
+ the corresponding environment variable on the standby.
+ For example, if the primary is running on host IP 192.168.1.50>,
+ port 5432, the superuser's name for replication is
+ foo>, and the password is foopass>, the administrator
+ can add the following line to the recovery.conf> file on the
+ standby.
+
+
+ # The standby connects to the primary that is running on host 192.168.1.50
+ # and port 5432 as the user "foo" whose password is "foopass".
+ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+
+
+
+
+
+
+ Failover
+
+
+ If the primary server fails then the standby server should begin
+ failover procedures.
+
+
+
+ If the standby server fails then no failover need take place. If the
+ standby server can be restarted, even some time later, then the recovery
+ process can also be immediately restarted, taking advantage of
+ restartable recovery. If the standby server cannot be restarted, then a
+ full new standby server instance should be created.
+
+
+
+ If the primary server fails and the standby server becomes the
+ new primary, and then the old primary restarts, you must have
+ a mechanism for informing the old primary that it is no longer the primary. This is
+ sometimes known as STONITH (Shoot The Other Node In The Head), which is
+ necessary to avoid situations where both systems think they are the
+ primary, which will lead to confusion and ultimately data loss.
+
+
+
+ Many failover systems use just two systems, the primary and the standby,
+ connected by some kind of heartbeat mechanism to continually verify the
+ connectivity between the two and the viability of the primary. It is
+ also possible to use a third system (called a witness server) to prevent
+ some cases of inappropriate failover, but the additional complexity
+ might not be worthwhile unless it is set up with sufficient care and
+ rigorous testing.
+
+
+
+ Once failover to the standby occurs, we have only a
+ single server in operation. This is known as a degenerate state.
+ The former standby is now the primary, but the former primary is down
+ and might stay down. To return to normal operation we must
+ fully recreate a standby server,
+ either on the former primary system when it comes up, or on a third,
+ possibly new, system. Once complete the primary and standby can be
+ considered to have switched roles. Some people choose to use a third
+ server to provide backup for the new primary until the new standby
+ server is recreated,
+ though clearly this complicates the system configuration and
+ operational processes.
+
+
+
+ So, switching from primary to standby server can be fast but requires
+ some time to re-prepare the failover cluster. Regular switching from
+ primary to standby is useful, since it allows regular downtime on
+ each system for maintenance. This also serves as a test of the
+ failover mechanism to ensure that it will really work when you need it.
+ Written administration procedures are advised.
+
+
+
+
+ Hot Standby
+
+
+ Hot Standby
+
+
+
+ Hot Standby is the term used to describe the ability to connect to
+ the server and run queries while the server is in archive recovery. This
+ is useful both for log shipping replication and for restoring a backup
+ to a desired state with great precision.
+ The term Hot Standby also refers to the ability of the server to move
+ from recovery through to normal running while users continue running
+ queries and/or continue their connections.
+
+
+
+ Running queries in recovery is in many ways the same as normal running
+ though there are a large number of usage and administrative points
+ to note.
+
+
+
+ User's Overview
+
+
+ Users can connect to the database while the server is in recovery
+ and perform read-only queries. Read-only access to catalogs and views
+ will also work as normal.
+
+
+
+ The data on the standby takes some time to arrive from the primary server
+ so there will be a measurable delay between primary and standby. Running the
+ same query nearly simultaneously on both primary and standby might therefore
+ return differing results. We say that data on the standby is eventually
+ consistent with the primary.
+ Queries executed on the standby will be correct with regard to the transactions
+ that had been recovered at the start of the query (or, in the case of
+ serializable transactions, at the start of its first statement). In comparison
+ with the primary, the standby returns query results that could have been
+ obtained on the primary at some exact moment in the past.
+
+
+
+ When a transaction is started in recovery, the parameter
+ transaction_read_only> will be forced to be true, regardless of the
+ default_transaction_read_only> setting in postgresql.conf>.
+ It can't be manually set to false either. As a result, all transactions
+ started during recovery will be limited to read-only actions. In all
+ other ways, connected sessions will appear identical to sessions
+ initiated during normal processing mode. There are no special commands
+ required to initiate a connection at this time, so all interfaces
+ work normally without change. After recovery finishes, the session
+ will allow normal read-write transactions at the start of the next
+ transaction, if these are requested.
+
+
+
+ Read-only here means "no writes to the permanent database tables".
+ There are no problems with queries that make use of transient sort and
+ work files.
+
+
+
+ The following actions are allowed (an example session appears after this list)
+
+
+
+
+ Query access - SELECT, COPY TO including views and SELECT RULEs
+
+
+
+
+ Cursor commands - DECLARE, FETCH, CLOSE
+
+
+
+
+ Parameters - SHOW, SET, RESET
+
+
+
+
+ Transaction management commands
+
+
+
+ BEGIN, END, ABORT, START TRANSACTION
+
+
+
+
+ SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
+
+
+
+
+ EXCEPTION blocks and other internal subtransactions
+
+
+
+
+
+
+
+ LOCK TABLE, though only when explicitly in one of these modes:
+ ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
+
+
+
+
+ Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
+
+
+
+
+ Plugins and extensions - LOAD
+
+
+
+
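+
+ As an illustrative sketch of the allowed actions, a typical
+ read-only session on the standby (the table name is hypothetical)
+ might run:
+
+ BEGIN;
+ DECLARE c CURSOR FOR SELECT * FROM accounts;
+ FETCH 10 FROM c;
+ CLOSE c;
+ COMMIT;
+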
+
+
+ These actions produce error messages (an example of the error appears after this list)
+
+
+
+
+ Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
+ Note that there are no allowed actions that result in a trigger
+ being executed during recovery.
+
+
+
+
+ Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+ This currently also applies to temporary tables, because their
+ definition causes writes to catalog tables.
+
+
+
+
+ SELECT ... FOR SHARE | UPDATE, which cause row locks to be written
+
+
+
+
+ RULEs on SELECT statements that generate DML commands.
+
+
+
+
+ LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
+ LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
+
+
+
+
+ Transaction management commands that explicitly set non-read only state
+
+
+
+ BEGIN READ WRITE,
+ START TRANSACTION READ WRITE
+
+
+
+
+ SET TRANSACTION READ WRITE,
+ SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
+
+
+
+
+ SET transaction_read_only = off
+
+
+
+
+
+
+
+ Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
+ ROLLBACK PREPARED because even read-only transactions need to write
+ WAL in the prepare phase (the first phase of two phase commit).
+
+
+
+
+ sequence update - nextval()
+
+
+
+
+ LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
+
+
+
+
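+
+ As a sketch of what to expect, attempting any such action on the
+ standby fails with a read-only-transaction error; the table name
+ here is hypothetical:
+
+ INSERT INTO accounts VALUES (1);
+ ERROR:  cannot execute INSERT in a read-only transaction
+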
+
+
+ Note that the current behaviour of read-only transactions when not in
+ recovery is to allow the last two actions, so there are small and
+ subtle differences in behaviour between read-only transactions
+ run on the standby and during normal running.
+ It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
+ temporary tables may be lifted in a future release, if their internal
+ implementation is altered to make this possible.
+
+
+
+ If failover or switchover occurs the database will switch to normal
+ processing mode. Sessions will remain connected while the server
+ changes mode. Current transactions will continue, though will remain
+ read-only. After recovery is complete, it will be possible to initiate
+ read-write transactions.
+
+
+
+ Users will be able to tell whether their session is read-only by
+ issuing SHOW transaction_read_only. In addition a set of
+ functions allow users to
+ access information about Hot Standby. These allow you to write
+ functions that are aware of the current state of the database. These
+ can be used to monitor the progress of recovery, or to allow you to
+ write complex programs that restore the database to particular states.
+
+
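+
+ For example, a session connected to the standby would report
+ (output shown as from psql>):
+
+ SHOW transaction_read_only;
+
+  transaction_read_only
+ -----------------------
+  on
+ (1 row)
+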
+
+ In recovery, transactions will not be permitted to take any table lock
+ higher than RowExclusiveLock. In addition, transactions may never assign
+ a TransactionId and may never write WAL.
+ Any LOCK TABLE> command that runs on the standby and requests
+ a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
+
+
+
+ In general queries will not experience lock conflicts with the database
+ changes made by recovery. This is because recovery follows normal
+ concurrency control mechanisms, known as MVCC>. There are
+ some types of change that will cause conflicts, covered in the following
+ section.
+
+
+
+
+ Handling query conflicts
+
+
+ The primary and standby nodes are in many ways loosely connected. Actions
+ on the primary will have an effect on the standby. As a result, there is
+ potential for negative interactions or conflicts between them. The easiest
+ conflict to understand is performance: if a huge data load is taking place
+ on the primary then this will generate a similar stream of WAL records on the
+ standby, so standby queries may contend for system resources, such as I/O.
+
+
+
+ There are also additional types of conflict that can occur with Hot Standby.
+ These conflicts are hard conflicts> in the sense that we may
+ need to cancel queries and in some cases disconnect sessions to resolve them.
+ The user is provided with a number of optional ways to handle these
+ conflicts, though we must first understand the possible reasons behind a conflict.
+
+
+
+
+ Access Exclusive Locks from primary node, including both explicit
+ LOCK commands and various kinds of DDL action
+
+
+
+
+ Dropping tablespaces on the primary while standby queries are using
+ those tablespaces for temporary work files (work_mem overflow)
+
+
+
+
+ Dropping databases on the primary while users are connected to that
+ database on the standby.
+
+
+
+
+ Waiting to acquire buffer cleanup locks
+
+
+
+
+ Early cleanup of data still visible to the current query's snapshot
+
+
+
+
+
+
+ Some WAL redo actions will be for DDL actions. These DDL actions are
+ repeating actions that have already committed on the primary node, so
+ they must not fail on the standby node. These DDL locks take priority
+ and will automatically *cancel* any read-only transactions that get in
+ their way, after a grace period. This is similar to the possibility of
+ being canceled by the deadlock detector, but in this case the standby
+ process always wins, since the replayed actions must not fail. This
+ also ensures that replication doesn't fall behind while we wait for a
+ query to complete. Again, we assume that the standby is there for high
+ availability purposes primarily.
+
+
+
+ An example of the above would be an administrator on the primary server
+ running DROP TABLE> on a table that is currently being queried
+ on the standby server.
+ Clearly the query cannot continue if we let the DROP TABLE>
+ proceed. If this situation occurred on the primary, the DROP TABLE>
+ would wait until the query has finished. When the query is on the standby
+ and the DROP TABLE> is on the primary, the primary doesn't have
+ information about which queries are running on the standby and so the query
+ does not wait on the primary. The WAL change records come through to the
+ standby while the standby query is still running, causing a conflict.
+
+
+
+ The most common reason for conflict between standby queries and WAL redo is
+ "early cleanup". Normally, PostgreSQL> allows cleanup of old
+ row versions when there are no users who may need to see them to ensure correct
+ visibility of data (the heart of MVCC). If there is a standby query that has
+ been running for longer than any query on the primary then it is possible
+ for old row versions to be removed by either a vacuum or HOT. This will
+ then generate WAL records that, if applied, would remove data on the
+ standby that might *potentially* be required by the standby query.
+ In more technical language, the primary's xmin horizon is later than
+ the standby's xmin horizon, allowing dead rows to be removed.
+
+
+
+ Experienced users should note that both row version cleanup and row version
+ freezing will potentially conflict with recovery queries. Running a
+ manual VACUUM FREEZE> is likely to cause conflicts even on tables
+ with no updated or deleted rows.
+
+
+
+ We have a number of choices for resolving query conflicts. The default
+ is that we wait and hope the query completes. The server will wait
+ automatically until the lag between primary and standby is at most
+ max_standby_delay> seconds. Once that grace period expires,
+ we take one of the following actions:
+
+
+
+
+ If the conflict is caused by a lock, we cancel the conflicting standby
+ transaction immediately. If the transaction is idle-in-transaction
+ then currently we abort the session instead, though this may change
+ in the future.
+
+
+
+
+
+ If the conflict is caused by cleanup records we tell the standby query
+ that a conflict has occurred and that it must cancel itself to avoid the
+ risk that it silently fails to read relevant data because
+ that data has been removed. (This is regrettably very similar to the
+ much feared and iconic error message "snapshot too old"). Some cleanup
+ records only cause conflict with older queries, though some types of
+ cleanup record affect all queries.
+
+
+
+ If cancellation does occur, the query and/or transaction can always
+ be re-executed. The error is dynamic and will not necessarily occur
+ the same way if the query is executed again.
+
+
+
+
+
+
+ max_standby_delay> is set in postgresql.conf>.
+ The parameter applies to the server as a whole so if the delay is all used
+ up by a single query then there may be little or no waiting for queries that
+ follow immediately, though they will have benefited equally from the initial
+ waiting period. The server may take time to catch up again before the grace
+ period is available again, though if there is a heavy and constant stream
+ of conflicts it may seldom catch up fully.
+
+
+
+ Users should be clear that tables that are regularly and heavily updated on
+ the primary server will quickly cause cancellation of longer running queries on
+ the standby. In those cases max_standby_delay> can be
+ considered somewhat but not exactly the same as setting
+ statement_timeout>.
+
+
+
+ Other remedial actions exist if the number of cancellations is unacceptable.
+ The first option is to connect to the primary server and keep a query active
+ for as long as we need to run queries on the standby. This guarantees that
+ a WAL cleanup record is never generated and we don't ever get query
+ conflicts as described above. This could be done using contrib/dblink
+ and pg_sleep(), or via other mechanisms. If you do this, you should note
+ that this will delay cleanup of dead rows by vacuum or HOT and many
+ people may find this undesirable. However, we should remember that
+ primary and standby nodes are linked via the WAL, so this situation is no
+ different to the case where we ran the query on the primary node itself
+ except we have the benefit of off-loading the execution onto the standby.
+
+
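+
+ A minimal sketch of the dblink approach, assuming
+ contrib/dblink> is installed on the standby and using
+ illustrative connection values, holds a query open on the primary
+ for an hour:
+
+ -- run on the standby; keeps a snapshot open on the primary
+ SELECT * FROM dblink('host=192.168.1.50 user=foo dbname=postgres',
+                      'SELECT pg_sleep(3600)') AS t(sleep text);
+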
+
+ It is also possible to set vacuum_defer_cleanup_age> on the primary
+ to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
+ more time for queries to execute before they are cancelled on the standby,
+ without the need for setting a high max_standby_delay>.
+
+
+
+ Three-way deadlocks are possible between AccessExclusiveLocks arriving from
+ the primary, cleanup WAL records that require buffer cleanup locks and
+ user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
+ are resolved by time-out when we exceed max_standby_delay>.
+
+
+
+ Dropping tablespaces or databases is discussed in the administrator's
+ section since they are not typical user situations.
+
+
+
+
+ Administrator's Overview
+
+
+ If there is a recovery.conf> file present the server will start
+ in Hot Standby mode by default, though recovery_connections> can
+ be disabled via postgresql.conf>, if required. The server may take
+ some time to enable recovery connections since the server must first complete
+ sufficient recovery to provide a consistent state against which queries
+ can run before enabling read only connections. Look for these messages
+ in the server logs
+
+
+ LOG: initializing recovery connections
+
+ ... then some time later ...
+
+ LOG: consistent recovery state reached
+ LOG: database system is ready to accept read only connections
+
+
+ Consistency information is recorded once per checkpoint on the primary, as long
+ as recovery_connections> is enabled (on the primary). If this parameter
+ is disabled, it will not be possible to enable recovery connections on the standby.
+ The consistent state can also be delayed in the presence of both of these conditions
+
+
+
+
+ a write transaction has more than 64 subtransactions
+
+
+
+
+ very long-lived write transactions
+
+
+
+
+ If you are running file-based log shipping ("warm standby"), you may need
+ to wait until the next WAL file arrives, which could be as long as the
+ archive_timeout> setting on the primary.
+
+
+
+ The setting of some parameters on the standby will need reconfiguration
+ if they have been changed on the primary. The value on the standby must
+ be equal to or greater than the value on the primary. If these parameters
+ are not set high enough then the standby will not be able to track work
+ correctly from recovering transactions. If these values are set too low,
+ the server will halt. Higher values can then be supplied and the server
+ restarted to begin recovery again.
+
+
+
+
+ max_connections>
+
+
+
+
+ max_prepared_transactions>
+
+
+
+
+ max_locks_per_transaction>
+
+
+
+
+
+
+ It is important that the administrator consider the appropriate setting
+ of max_standby_delay>, set in postgresql.conf>.
+ There is no optimal setting; it should be chosen according to business
+ priorities. For example, if the server is primarily tasked as a High
+ Availability server, then you may wish to lower
+ max_standby_delay> or even set it to zero, though that is a
+ very aggressive setting. If the standby server is tasked as an additional
+ server for decision support queries then it may be acceptable to set this
+ to a value of many hours (in seconds).
+
+
+
+ Transaction status "hint bits" written on the primary are not WAL-logged,
+ so the standby will likely re-write the hints itself.
+ Thus the main database blocks will produce write I/Os even though
+ all users are read-only; no changes have occurred to the data values
+ themselves. Users will be able to write large sort temp files and
+ re-generate relcache info files, so there is no part of the database
+ that is truly read-only during hot standby mode. There is no restriction
+ on the use of set returning functions, or other users of tuplestore/tuplesort
+ code. Note also that writes to remote databases will still be possible,
+ even though the transaction is read-only locally.
+
+
+
+ The following types of administrator command are not accepted
+ during recovery mode
+
+
+
+
+ Data Definition Language (DDL) - e.g. CREATE INDEX
+
+
+
+
+ Privilege and Ownership - GRANT, REVOKE, REASSIGN
+
+
+
+
+ Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
+
+
+
+
+
+
+ Note again that some of these commands are actually allowed during
+ "read only" mode transactions on the primary.
+
+
+
+ As a result, you cannot create additional indexes that exist solely
+ on the standby, nor statistics that exist solely on the standby.
+ If these administrator commands are needed they should be executed
+ on the primary so that the changes will propagate through to the
+ standby.
+
+
+
+ pg_cancel_backend()> will work on user backends, but not the
+ Startup process, which performs recovery. pg_stat_activity does not
+ show an entry for the Startup process, nor do recovering transactions
+ show as active. As a result, pg_prepared_xacts is always empty during
+ recovery. If you wish to resolve in-doubt prepared transactions
+ then look at pg_prepared_xacts on the primary and issue commands to
+ resolve those transactions there.
+
+
+
+ pg_locks will show locks held by backends as normal. pg_locks also shows
+ a virtual transaction managed by the Startup process that owns all
+ AccessExclusiveLocks held by transactions being replayed by recovery.
+ Note that the Startup process does not acquire locks to
+ make database changes, and thus locks other than AccessExclusiveLocks
+ do not show in pg_locks for the Startup process; they are just presumed
+ to exist.
+
+
+
+ check_pgsql> will work, but it is very simple.
+ check_postgres> will also work, though some actions
+ could give different or confusing results;
+ for example, last vacuum time will not be maintained, since no
+ vacuum occurs on the standby (though vacuums running on the primary do
+ send their changes to the standby).
+
+
+
+ WAL file control commands such as pg_start_backup>
+ and pg_switch_xlog> will not work during recovery.
+
+
+
+ Dynamically loadable modules work, including pg_stat_statements.
+
+
+
+ Advisory locks work normally in recovery, including deadlock detection.
+ Note that advisory locks are never WAL logged, so it is not possible for
+ an advisory lock on either the primary or the standby to conflict with WAL
+ replay. Nor is it possible to acquire an advisory lock on the primary
+ and have it initiate a similar advisory lock on the standby. Advisory
+ locks relate only to a single server on which they are acquired.
+
+
+
+ Trigger-based replication systems such as Slony>,
+ Londiste> and Bucardo> won't run on the
+ standby at all, though they will run happily on the primary server as
+ long as the changes are not sent to standby servers to be applied.
+ WAL replay is not trigger-based so you cannot relay from the
+ standby to any system that requires additional database writes or
+ relies on the use of triggers.
+
+
+
+ New OIDs cannot be assigned, though some UUID> generators may still
+ work as long as they do not rely on writing new status to the database.
+
+
+
+ Currently, temp table creation is not allowed during read only
+ transactions, so in some cases existing scripts will not run correctly.
+ It is possible we may relax that restriction in a later release. This is
+ both a SQL Standard compliance issue and a technical issue.
+
+
+
+ DROP TABLESPACE> can only succeed if the tablespace is empty.
+ Some standby users may be actively using the tablespace via their
+ temp_tablespaces> parameter. If there are temp files in the
+ tablespace we currently cancel all active queries to ensure that temp
+ files are removed, so that we can remove the tablespace and continue with
+ WAL replay.
+
+
+
+ Running DROP DATABASE>, ALTER DATABASE ... SET TABLESPACE>,
+ or ALTER DATABASE ... RENAME> on primary will generate a log message
+ that will cause all users connected to that database on the standby to be
+ forcibly disconnected. This action occurs immediately, whatever the setting of
+ max_standby_delay>.
+
+
+
+ In normal running, if you issue DROP USER> or DROP ROLE>
+ for a role with login capability while that user is still connected then
+ nothing happens to the connected user - they remain connected. The user cannot
+ reconnect however. This behaviour applies in recovery also, so a
+ DROP USER> on the primary does not disconnect that user on the standby.
+
+
+
+ The stats collector is active during recovery. All scans, reads, blocks,
+ index usage, etc., will be recorded normally on the standby. Replayed
+ actions will not duplicate their effects on the primary, so replaying an
+ insert will not increment the Inserts column of pg_stat_user_tables.
+ The stats file is deleted at start of recovery, so stats from primary
+ and standby will differ; this is considered a feature not a bug.
+
+
+
+ Autovacuum is not active during recovery, though it will start normally
+ at the end of recovery.
+
+
+
+ The background writer is active during recovery and will perform
+ restartpoints (similar to checkpoints on the primary) and normal block
+ cleaning activities. The CHECKPOINT> command is accepted during recovery,
+ though it performs a restartpoint rather than a new checkpoint.
+
+
+
+
+ Hot Standby Parameter Reference
+
+
+ Various parameters have been mentioned above in the
+ and sections.
+
+
+
+ On the primary, the parameters recovery_connections> and
+ vacuum_defer_cleanup_age> can be set to assist the
+ successful configuration of Hot Standby servers.
+ max_standby_delay> has no effect if set on the primary.
+
+
+
+ On the standby, the parameters recovery_connections> and
+ max_standby_delay> can be used to enable and control Hot Standby.
+ vacuum_defer_cleanup_age> has no effect during recovery.
+
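+
+ As a sketch, the relevant postgresql.conf> entries on each
+ node might look like this; the values shown are only illustrative:
+
+ # on the primary
+ recovery_connections     = on
+ vacuum_defer_cleanup_age = 10000   # number of transactions
+
+ # on the standby
+ recovery_connections = on
+ max_standby_delay    = 30          # seconds
+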
+
+
+
+ Caveats
+
+
+ At this writing, there are several limitations of Hot Standby.
+ These can and probably will be fixed in future releases:
+
+
+
+
+ Operations on hash indexes are not presently WAL-logged, so
+ replay will not update these indexes. Hash indexes will not be
+ used for query plans during recovery.
+
+
+
+
+ Full knowledge of running transactions is required before snapshots
+ may be taken. Transactions that use large numbers of subtransactions
+ (currently more than 64) will delay the start of read-only
+ connections until the completion of the longest running write transaction.
+ If this situation occurs, explanatory messages will be sent to the server log.
+
+
+
+
+ Valid starting points for recovery connections are generated at each
+ checkpoint on the master. If the standby is shut down while the master
+ is in a shutdown state, it may not be possible to re-enter Hot Standby
+ until the primary is started up so that it generates further starting
+ points in the WAL logs. This is not considered a serious issue
+ because the standby is usually switched into the primary role while
+ the first node is taken down.
+
+
+
+
+ At the end of recovery, AccessExclusiveLocks held by prepared transactions
+ will require twice the normal number of lock table entries. If you plan
+ on running either a large number of concurrent prepared transactions
+ that normally take AccessExclusiveLocks, or you plan on having one
+ large transaction that takes many AccessExclusiveLocks, then you are
+ advised to select a larger value of max_locks_per_transaction>,
+ in rare extremes up to, but never more than, twice the value of the
+ parameter setting on the primary server. You need not consider this at all if
+ your setting of max_prepared_transactions> is 0>.
+
+
+
+
+
+
+
+
+
+
+ Incrementally Updated Backups
+
+
+ incrementally updated backups
+
+
+
+ change accumulation
+
+
+
+ In a warm standby configuration, it is possible to offload the expense of
+ taking periodic base backups from the primary server; instead base backups
+ can be made by backing
+ up a standby server's files. This concept is generally known as
+ incrementally updated backups, log change accumulation, or more simply,
+ change accumulation.
+
+
+
+ If we take a file system backup of the standby server's data
+ directory while it is processing
+ logs shipped from the primary, we will be able to reload that backup and
+ restart the standby's recovery process from the last restart point.
+ We no longer need to keep WAL files from before the standby's restart point.
+ If we need to recover, it will be faster to recover from the incrementally
+ updated backup than from the original base backup.
+
+
+
+ Since the standby server is not live>, it is not possible to
+ use pg_start_backup()> and pg_stop_backup()>
+ to manage the backup process; it will be up to you to determine how
+ far back you need to keep WAL segment files to have a recoverable
+ backup. You can do this by running pg_controldata>
+ on the standby server to inspect the control file and determine the
+ current checkpoint WAL location, or by using the
+ log_checkpoints> option to print values to the standby's
+ server log.
+
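+
+ For example, a sketch of inspecting the control file on the
+ standby; the data directory path and the locations shown are
+ illustrative:
+
+ $ pg_controldata /usr/local/pgsql/standby-data | grep 'checkpoint location'
+ Latest checkpoint location:           0/3C000020
+ Prior checkpoint location:            0/3B000060
+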
+
+