pg_probackup

pg_probackup — manage backup and recovery of Postgres Pro database clusters

Synopsis

pg_probackup version

pg_probackup help [command]

pg_probackup init -B backup_dir

pg_probackup add-instance -B backup_dir -D data_dir --instance instance_name

pg_probackup del-instance -B backup_dir --instance instance_name

pg_probackup set-config -B backup_dir --instance instance_name [option...]

pg_probackup set-backup -B backup_dir --instance instance_name -i backup_id [option...]

pg_probackup show-config -B backup_dir --instance instance_name [--format=format]

pg_probackup show -B backup_dir [option...]

pg_probackup backup -B backup_dir --instance instance_name -b backup_mode [option...]

pg_probackup restore -B backup_dir --instance instance_name [option...]

pg_probackup checkdb -B backup_dir --instance instance_name -D data_dir [option...]

pg_probackup validate -B backup_dir [option...]

pg_probackup merge -B backup_dir --instance instance_name -i backup_id [option...]

pg_probackup delete -B backup_dir --instance instance_name { -i backup_id | --delete-wal | --delete-expired | --merge-expired } [option...]

pg_probackup archive-push -B backup_dir --instance instance_name --wal-file-path wal_file_path --wal-file-name wal_file_name [option...]

pg_probackup archive-get -B backup_dir --instance instance_name --wal-file-path wal_file_path --wal-file-name wal_file_name [option...]

Description

pg_probackup is a utility to manage backup and recovery of Postgres Pro database clusters. It is designed to perform periodic backups of the Postgres Pro instance that enable you to restore the server in case of a failure. pg_probackup supports Postgres Pro 9.5 or higher.

Overview

As compared to other backup solutions, pg_probackup offers the following benefits that can help you implement different backup strategies and deal with large amounts of data:

  • Incremental backup: with three different incremental modes, you can plan the backup strategy in accordance with your data flow. Incremental backups allow you to save disk space and speed up backup as compared to taking full backups. It is also faster to restore the cluster by applying incremental backups than by replaying WAL files.

  • Validation: automatic data consistency checks and on-demand backup validation without actual data recovery.

  • Verification: on-demand verification of Postgres Pro instance with the checkdb command.

  • Retention: managing WAL archive and backups in accordance with retention policy. You can configure retention policy based on recovery time or the number of backups to keep, as well as specify time to live (TTL) for a particular backup. Expired backups can be merged or deleted.

  • Parallelization: running backup, restore, merge, delete, validate, and checkdb processes on multiple parallel threads.

  • Compression: storing backup data in a compressed state to save disk space.

  • Deduplication: saving disk space by not copying unchanged non-data files, such as _vm or _fsm.

  • Remote operations: backing up Postgres Pro instance located on a remote system or restoring a backup remotely.

  • Backup from standby: avoiding extra load on master by taking backups from a standby server.

  • External directories: backing up files and directories located outside of the Postgres Pro data directory (PGDATA), such as scripts, configuration files, logs, or SQL dump files.

  • Backup catalog: getting the list of backups and the corresponding meta information in plain text or JSON formats.

  • Archive catalog: getting the list of all WAL timelines and the corresponding meta information in plain text or JSON formats.

  • Partial restore: restoring only the specified databases.

To manage backup data, pg_probackup creates a backup catalog. This is a directory that stores all backup files with additional meta information, as well as WAL archives required for point-in-time recovery. You can store backups for different instances in separate subdirectories of a single backup catalog.

Using pg_probackup, you can take full or incremental backups:

  • FULL backups contain all the data files required to restore the database cluster.

  • Incremental backups operate at the page level, only storing the data that has changed since the previous backup. It allows you to save disk space and speed up the backup process as compared to taking full backups. It is also faster to restore the cluster by applying incremental backups than by replaying WAL files. pg_probackup supports the following modes of incremental backups:

    • DELTA backup. In this mode, pg_probackup reads all data files in the data directory and copies only those pages that have changed since the previous backup. This mode can impose read-only I/O pressure equal to a full backup.

    • PAGE backup. In this mode, pg_probackup scans all WAL files in the archive from the moment the previous full or incremental backup was taken. Newly created backups contain only the pages that were mentioned in WAL records. This requires all the WAL files since the previous backup to be present in the WAL archive. If the size of these files is comparable to the total size of the database cluster files, speedup is smaller, but the backup still takes less space. You have to configure WAL archiving as explained in Setting up continuous WAL archiving to make PAGE backups.

    • PTRACK backup. In this mode, Postgres Pro tracks page changes on the fly. Continuous archiving is not necessary for it to operate. Each time a relation page is updated, this page is marked in a special PTRACK bitmap for this relation. As one page requires just one bit in the PTRACK fork, such bitmaps are quite small. Tracking implies some minor overhead on the database server operation, but speeds up incremental backups significantly.

pg_probackup can take only physical online backups, and online backups require WAL for consistent recovery. So regardless of the chosen backup mode (FULL, PAGE, DELTA, or PTRACK), any backup taken with pg_probackup must use one of the following WAL delivery modes:

  • ARCHIVE. Such backups rely on continuous archiving to ensure consistent recovery. This is the default WAL delivery mode.

  • STREAM. Such backups include all the files required to restore the cluster to a consistent state at the time the backup was taken. Regardless of continuous archiving having been set up or not, the WAL segments required for consistent recovery are streamed via replication protocol during backup and included into the backup files. That's why such backups are called autonomous, or standalone.

Limitations

pg_probackup currently has the following limitations:

  • pg_probackup only supports Postgres Pro 9.5 and higher.

  • The remote mode is not supported on Windows systems.

  • On Unix systems, for Postgres Pro 10 or higher, a backup can be made only by the same OS user that started the Postgres Pro server. For example, if the Postgres Pro server is started by user postgres, the backup command must also be run by user postgres. To satisfy this requirement when taking backups in the remote mode using SSH, you must set the --remote-user option to postgres.

  • For PostgreSQL 9.5, the functions pg_create_restore_point(text) and pg_switch_xlog() can be executed only by a superuser. As a result, backing up a cluster with a low amount of WAL traffic under a non-superuser role can take longer than backing up the same cluster under a superuser role.

  • The Postgres Pro server from which the backup was taken and the restored server must be compatible by the block_size and wal_block_size parameters and have the same major release number. Depending on cluster configuration, Postgres Pro itself may apply additional restrictions, such as CPU architecture or libc/libicu versions.

  • All backups in the incremental chain must belong to the same timeline. For example, if you have taken incremental backups on a standby server that gets promoted, you have to take another FULL backup.

Installation and Setup

Once you have pg_probackup installed, complete the following setup:

  • Initialize the backup catalog.

  • Add a new backup instance to the backup catalog.

  • Configure the database cluster to enable pg_probackup backups.

  • Optionally, configure SSH for running pg_probackup operations in the remote mode.

Initializing the Backup Catalog

pg_probackup stores all WAL and backup files in the corresponding subdirectories of the backup catalog.

To initialize the backup catalog, run the following command:

pg_probackup init -B backup_dir

where backup_dir is the path to the backup catalog. If the backup_dir already exists, it must be empty. Otherwise, pg_probackup returns an error.

The user launching pg_probackup must have full access to the backup_dir directory.

pg_probackup creates the backup_dir backup catalog, with the following subdirectories:

  • wal/ — directory for WAL files.

  • backups/ — directory for backup files.

Once the backup catalog is initialized, you can add a new backup instance.

Adding a New Backup Instance

pg_probackup can store backups for multiple database clusters in a single backup catalog. To set up the required subdirectories, you must add a backup instance to the backup catalog for each database cluster you are going to back up.

To add a new backup instance, run the following command:

pg_probackup add-instance -B backup_dir -D data_dir --instance instance_name [remote_options]

where:

  • data_dir is the data directory of the cluster you are going to back up. To set up and use pg_probackup, write access to this directory is required.

  • instance_name is the name of the subdirectories that will store WAL and backup files for this cluster.

  • remote_options are optional parameters that need to be specified only if data_dir is located on a remote system.

pg_probackup creates the instance_name subdirectories under the backups/ and wal/ directories of the backup catalog. The backups/instance_name directory contains the pg_probackup.conf configuration file that controls pg_probackup settings for this backup instance. If you run this command with the remote_options, the specified parameters will be added to pg_probackup.conf.

For details on how to fine-tune pg_probackup configuration, see the section called “Configuring pg_probackup”.

The user launching pg_probackup must have full access to the backup_dir directory and at least read-only access to the data_dir directory. If you specify the path to the backup catalog in the BACKUP_PATH environment variable, you can omit the corresponding option when running pg_probackup commands.

Note

For Postgres Pro 11 or higher, it is recommended to use the group access feature, so that backup can be done by any OS user in the same group as the cluster owner. In this case, the user should have read permissions for the cluster directory.

Configuring the Database Cluster

Although pg_probackup can be used by a superuser, it is recommended to create a separate role with the minimum permissions required for the chosen backup strategy. In these configuration instructions, the backup role is used as an example.

To perform a backup, the following permissions for role backup are required only in the database used for connection to the Postgres Pro server:

For PostgreSQL 9.5:

BEGIN;
CREATE ROLE backup WITH LOGIN;
GRANT USAGE ON SCHEMA pg_catalog TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.current_setting(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_is_in_recovery() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_start_backup(text, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_stop_backup() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_create_restore_point(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_switch_xlog() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_current() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_current_snapshot() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_snapshot_xmax(txid_snapshot) TO backup;
COMMIT;

For Postgres Pro 9.6:

BEGIN;
CREATE ROLE backup WITH LOGIN;
GRANT USAGE ON SCHEMA pg_catalog TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.current_setting(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_is_in_recovery() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_start_backup(text, boolean, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_stop_backup(boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_create_restore_point(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_switch_xlog() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_last_xlog_replay_location() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_current() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_current_snapshot() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_snapshot_xmax(txid_snapshot) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_control_checkpoint() TO backup;
COMMIT;

For Postgres Pro 10 or higher:

BEGIN;
CREATE ROLE backup WITH LOGIN;
GRANT USAGE ON SCHEMA pg_catalog TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.current_setting(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_is_in_recovery() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_start_backup(text, boolean, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_stop_backup(boolean, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_create_restore_point(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_last_wal_replay_lsn() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_current() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_current_snapshot() TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.txid_snapshot_xmax(txid_snapshot) TO backup;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_control_checkpoint() TO backup;
COMMIT;

In the pg_hba.conf file, allow connection to the database cluster on behalf of the backup role.

Since pg_probackup needs to read cluster files directly, pg_probackup must be started by (or connected to, if used in the remote mode) the OS user that has read access to all files and directories inside the data directory (PGDATA) you are going to back up.

Depending on whether you plan to take standalone or archive backups, Postgres Pro cluster configuration will differ, as specified in the sections below. To back up the database cluster from a standby server, run pg_probackup in the remote mode, or create PTRACK backups, additional setup is required.

For details, see the sections Setting up STREAM Backups, Setting up continuous WAL archiving, Setting up Backup from Standby, Configuring the Remote Mode, Setting up Partial Restore, and Setting up PTRACK Backups.

Setting up STREAM Backups

To set up the cluster for STREAM backups, complete the following steps:

  • Grant the REPLICATION privilege to the backup role:

    ALTER ROLE backup WITH REPLICATION;
    
  • In the pg_hba.conf file, allow replication on behalf of the backup role.

  • Make sure the parameter max_wal_senders is set high enough to leave at least one session available for the backup process.

  • Set the parameter wal_level to be higher than minimal.

If you are planning to take PAGE backups in the STREAM mode or perform PITR with STREAM backups, you still have to configure WAL archiving, as explained in the section Setting up continuous WAL archiving.

Once these steps are complete, you can start taking FULL, PAGE, DELTA, and PTRACK backups in the STREAM WAL mode.
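As a rough illustration of the STREAM setup steps above, the following shell sketch prints sample configuration lines. The function name print_stream_settings, the client address 192.168.0.10/32, the md5 authentication method, and the max_wal_senders value are assumptions chosen for this example, not recommended values.

```shell
# Print example settings for STREAM backups.
# The address, auth method, and sender count are illustrative assumptions.
print_stream_settings() {
    cat <<'EOF'
# postgresql.conf
wal_level = replica        # 9.6 or higher; use hot_standby on 9.5
max_wal_senders = 10       # keep at least one sender free for the backup

# pg_hba.conf
host    replication    backup    192.168.0.10/32    md5
EOF
}

print_stream_settings
```

The output is meant to be reviewed and merged into the real configuration files by hand. Note that changing wal_level or max_wal_senders requires a server restart, while pg_hba.conf changes take effect on reload.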

Setting up Continuous WAL Archiving

Making backups in PAGE backup mode, performing PITR and making backups with ARCHIVE WAL delivery mode require continuous WAL archiving to be enabled. To set up continuous archiving in the cluster, complete the following steps:

  • Make sure the wal_level parameter is higher than minimal.

  • If you are configuring archiving on master, archive_mode must be set to on or always. To perform archiving on standby, set this parameter to always.

  • Set the archive_command parameter, as follows:

    archive_command = 'install_dir/pg_probackup archive-push -B backup_dir --instance instance_name --wal-file-path=%p --wal-file-name=%f [remote_options]'
    

where install_dir is the installation directory of the pg_probackup version you are going to use, backup_dir and instance_name refer to the already initialized backup catalog instance for this database cluster, and remote_options only need to be specified to archive WAL on a remote host. For details about all possible archive-push parameters, see the section archive-push.
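As a rough illustration, the following shell sketch prints the archiving lines for postgresql.conf from three arguments. The function name print_archive_settings and the sample values /usr/local/bin, /mnt/backups, and main are assumptions made for this example; wal_level is left out because its valid values depend on the server version.

```shell
# Print archive_mode and archive_command lines for postgresql.conf.
# Arguments: install_dir backup_dir instance_name (all illustrative).
print_archive_settings() {
    printf 'archive_mode = on\n'
    # %%p and %%f are printf escapes that emit the literal %p and %f
    # placeholders, which the server expands for each WAL segment.
    printf "archive_command = '%s/pg_probackup archive-push -B %s --instance %s --wal-file-path=%%p --wal-file-name=%%f'\n" "$1" "$2" "$3"
}

print_archive_settings /usr/local/bin /mnt/backups main
```

Printing the lines instead of editing postgresql.conf in place keeps the sketch side-effect free; the result can be checked before it is added to the configuration.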

Once these steps are complete, you can start making backups in the ARCHIVE WAL mode, backups in the PAGE backup mode, as well as perform PITR.

You can view the current state of the WAL archive using the show command. For details, see the section called “Viewing WAL Archive Information”.

If you are planning to make PAGE backups and/or backups with ARCHIVE WAL mode from a standby server that generates a small amount of WAL traffic, consider setting the archive_timeout Postgres Pro parameter on master so that the backup does not have to wait long for a WAL segment to fill up. The value of this parameter should be slightly lower than the --archive-timeout setting (5 minutes by default), so that there is enough time for the rotated segment to be streamed to standby and sent to the WAL archive before the backup is aborted because of --archive-timeout.

Note

Instead of using the archive-push command provided by pg_probackup, you can use any other tool to set up continuous archiving as long as it delivers WAL segments into the backup_dir/wal/instance_name directory. If compression is used, it should be gzip, and the .gz suffix in the file name is mandatory.

Note

Instead of configuring continuous archiving by setting the archive_mode and archive_command parameters, you can opt for using the pg_receivewal utility. In this case, pg_receivewal -D directory option should point to backup_dir/wal/instance_name directory. pg_probackup supports WAL compression that can be done by pg_receivewal. Zero Data Loss archive strategy can be achieved only by using pg_receivewal.

Setting up Backup from Standby

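For Postgres Pro 9.6 or higher, pg_probackup can take backups from a standby server. This requires the following additional setup:

  • On the standby server, set the hot_standby parameter to on.

  • On the master server, set the full_page_writes parameter to on.

  • To perform standalone backups on standby, complete all steps in the section Setting up STREAM Backups.

  • To perform archive backups on standby, complete all steps in the section Setting up Continuous WAL Archiving.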

Once these steps are complete, you can start taking FULL, PAGE, DELTA, or PTRACK backups with appropriate WAL delivery mode: ARCHIVE or STREAM, from the standby server.

Backup from the standby server has the following limitations:

  • If the standby is promoted to the master during backup, the backup fails.

  • All WAL records required for the backup must contain sufficient full-page writes. This requires you to enable full_page_writes on the master, and not to use tools like pg_compresslog as archive_command to remove full-page writes from WAL files.

Setting up Cluster Verification

Logical verification of a database cluster requires the following additional setup. Role backup is used as an example:

  • Install the amcheck or amcheck_next extension in every database of the cluster:

    CREATE EXTENSION amcheck;
    
  • Grant the following permissions to the backup role in every database of the cluster:

    GRANT SELECT ON TABLE pg_catalog.pg_am TO backup;
    GRANT SELECT ON TABLE pg_catalog.pg_class TO backup;
    GRANT SELECT ON TABLE pg_catalog.pg_database TO backup;
    GRANT SELECT ON TABLE pg_catalog.pg_namespace TO backup;
    GRANT SELECT ON TABLE pg_catalog.pg_extension TO backup;
    GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO backup;
    GRANT EXECUTE ON FUNCTION bt_index_check(regclass, bool) TO backup;

Setting up Partial Restore

If you are planning to use partial restore, complete the following additional step:

  • Grant the read-only access to pg_catalog.pg_database to the backup role only in the database used for connection to Postgres Pro server:

    GRANT SELECT ON TABLE pg_catalog.pg_database TO backup;
    

Configuring the Remote Mode

pg_probackup supports the remote mode, which allows you to perform backup, restore, and WAL archiving operations remotely. In this mode, the backup catalog is stored on a local system, while the Postgres Pro instance to back up and/or restore is located on a remote system. Currently, the only supported remote protocol is SSH.

Set up SSH

If you are going to use pg_probackup in remote mode via SSH, complete the following steps:

  • Install pg_probackup on both systems: backup_host and db_host.

  • For communication between the hosts set up the passwordless SSH connection between backup user on backup_host and postgres user on db_host:

    [backup@backup_host] ssh-copy-id postgres@db_host
    
  • If you are going to rely on continuous WAL archiving, set up passwordless SSH connection between postgres user on db_host and backup user on backup_host:

    [postgres@db_host] ssh-copy-id backup@backup_host
    

where:

  • backup_host is the system with backup catalog.

  • db_host is the system with Postgres Pro cluster.

  • backup is the OS user on backup_host used to run pg_probackup.

  • postgres is the OS user on db_host used to start the Postgres Pro cluster. For Postgres Pro 11 or higher, a more secure approach can be used thanks to the group access feature.

pg_probackup in the remote mode via SSH works as follows:

  • Only the following commands can be launched in the remote mode: add-instance, backup, restore, archive-push, archive-get.

  • When started in the remote mode, the main pg_probackup process on the local system connects to the remote system via SSH and launches one or more agent processes on the remote system, which are called remote agents. The number of remote agents is equal to the -j/--threads setting.

  • The main pg_probackup process uses remote agents to access remote files and transfer data between local and remote systems.

  • Remote agents try to minimize the network traffic and the number of round-trips between hosts.

  • The main process is usually started on backup_host and connects to db_host, but in case of archive-push and archive-get commands the main process is started on db_host and connects to backup_host.

  • Once data transfer is complete, remote agents are terminated and SSH connections are closed.

  • If an error condition is encountered by a remote agent, then all agents are terminated and error details are reported by the main pg_probackup process, which exits with an error.

  • Compression is always done on db_host, while decompression is always done on backup_host.

Note

You can impose additional restrictions on SSH settings to protect the system in the event of account compromise.

Setting up PTRACK Backups

The PTRACK backup mode can be used only for Postgres Pro Standard and Postgres Pro Enterprise installations, or patched vanilla PostgreSQL. Links to PTRACK patches can be found here.

If you are going to use PTRACK backups, complete the following additional steps:

  • Set the ptrack_enable parameter to on.

  • Grant the rights to execute PTRACK functions to the backup role in every database of the cluster:

    GRANT EXECUTE ON FUNCTION pg_catalog.pg_ptrack_clear() TO backup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_ptrack_get_and_clear(oid, oid) TO backup;
    
  • The backup role must have access to all the databases of the cluster.

Usage

Creating a Backup

To create a backup, run the following command:

pg_probackup backup -B backup_dir --instance instance_name -b backup_mode

where backup_mode can take one of the following values:

  • FULL — creates a full backup that contains all the data files of the cluster to be restored.

  • DELTA — reads all data files in the data directory and creates an incremental backup for pages that have changed since the previous backup.

  • PAGE — creates an incremental backup based on the WAL files that have been generated since the previous full or incremental backup was taken. Only changed blocks are read from data files.

  • PTRACK — creates an incremental backup tracking page changes on the fly.

When restoring a cluster from an incremental backup, pg_probackup relies on the parent full backup and all the incremental backups between them, which is called the backup chain. You must create at least one full backup before taking incremental ones.

ARCHIVE Mode

ARCHIVE is the default WAL delivery mode.

For example, to make a FULL backup in ARCHIVE mode, run:

pg_probackup backup -B backup_dir --instance instance_name -b FULL

ARCHIVE backups rely on continuous archiving to get WAL segments required to restore the cluster to a consistent state at the time the backup was taken.

When a backup is taken, pg_probackup ensures that WAL files containing WAL records between Start LSN and Stop LSN actually exist in backup_dir/wal/instance_name directory. pg_probackup also ensures that WAL records between Start LSN and Stop LSN can be parsed. This precaution eliminates the risk of silent WAL corruption.

STREAM Mode

STREAM is an optional WAL delivery mode.

For example, to make a FULL backup in the STREAM mode, add the --stream flag to the command from the previous example:

pg_probackup backup -B backup_dir --instance instance_name -b FULL --stream --temp-slot

The optional --temp-slot flag ensures that the required segments remain available if the WAL is rotated before the backup is complete.

Unlike backups in ARCHIVE mode, STREAM backups include all the WAL segments required to restore the cluster to a consistent state at the time the backup was taken.

During backup pg_probackup streams WAL files containing WAL records between Start LSN and Stop LSN to backup_dir/backups/instance_name/backup_id/database/pg_wal directory. To eliminate the risk of silent WAL corruption, pg_probackup also checks that WAL records between Start LSN and Stop LSN can be parsed.

Even if you are using continuous archiving, STREAM backups can still be useful in the following cases:

  • STREAM backups can be restored on the server that has no file access to WAL archive.

  • STREAM backups enable you to restore the cluster state at the point in time for which WAL files in archive are no longer available.

  • Backups in STREAM mode can be taken from a standby of a server that generates a small amount of WAL traffic, without waiting long for a WAL segment to fill up.

Page Validation

If data_checksums are enabled in the database cluster, pg_probackup uses this information to check correctness of data files during backup. While reading each page, pg_probackup checks whether the calculated checksum coincides with the checksum stored in the page header. This guarantees that the Postgres Pro instance and the backup itself have no corrupt pages. Note that pg_probackup reads database files directly from the filesystem, so under heavy write load during backup it can show false-positive checksum mismatches because of partial writes. If a page checksum mismatch occurs, the page is re-read and checksum comparison is repeated.

A page is considered corrupt if checksum comparison has failed more than 100 times. In this case, the backup is aborted.
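The re-read behaviour described above can be pictured as a bounded retry loop. The sketch below is only an illustration in shell, not pg_probackup's actual code: validate_with_retries and MAX_ATTEMPTS are hypothetical names, the check itself is passed in as an arbitrary command, and treating the limit as exactly 100 attempts is an assumption about the boundary.

```shell
# Hypothetical limit mirroring the "100 times" threshold in the text.
MAX_ATTEMPTS=100

# Run the given check command until it succeeds or the limit is reached.
# Returns 0 on the first success, 1 if every attempt fails.
validate_with_retries() {
    attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        if "$@"; then
            return 0    # checksum matched on this read
        fi
        attempt=$((attempt + 1))    # mismatch: re-read and compare again
    done
    return 1            # still failing after the limit: treat as corrupt
}
```

A caller would pass a command that re-reads one page and compares its checksum. A transient mismatch caused by a partial write clears on a later attempt, while a persistent mismatch exhausts the limit and is reported as corruption.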

Even if data checksums are not enabled, pg_probackup always performs sanity checks for page headers.

External Directories

To back up a directory located outside of the data directory, use the optional --external-dirs parameter that specifies the path to this directory. If you would like to add more than one external directory, you can provide several paths separated by colons on Linux systems or semicolons on Windows systems.

For example, to include /etc/dir1 and /etc/dir2 directories into the full backup of your instance_name instance that will be stored under the backup_dir directory on Linux, run:

pg_probackup backup -B backup_dir --instance instance_name -b FULL --external-dirs=/etc/dir1:/etc/dir2

For example, to include C:\dir1 and C:\dir2 directories into the full backup of your instance_name instance that will be stored under the backup_dir directory on Windows, run:

pg_probackup backup -B backup_dir --instance instance_name -b FULL --external-dirs=C:\dir1;C:\dir2

pg_probackup creates a separate subdirectory in the backup directory for each external directory. Since external directories included into different backups do not have to be the same, when you are restoring the cluster from an incremental backup, only those directories that belong to this particular backup will be restored. Any external directories stored in the previous backups will be ignored.

To include the same directories into each backup of your instance, you can specify them in the pg_probackup.conf configuration file using the set-config command with the --external-dirs option.
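When the list of external directories comes from a script, the colon-separated value can be assembled rather than typed by hand. The following is a minimal sketch for Linux; join_external_dirs is a hypothetical helper name, and the command is only printed, not executed.

```shell
# Join directory paths with ':' (the Linux separator described above).
# The body runs in a subshell so the IFS change does not leak to the caller.
join_external_dirs() (
    IFS=:
    printf '%s\n' "$*"
)

dirs=$(join_external_dirs /etc/dir1 /etc/dir2)
echo "pg_probackup backup -B backup_dir --instance instance_name -b FULL --external-dirs=$dirs"
```

The helper does not escape colons inside a path, so it assumes directory names that do not contain the separator.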

Performing Cluster Verification

To verify that Postgres Pro database cluster is not corrupt, run the following command:

pg_probackup checkdb [-B backup_dir [--instance instance_name]] [-D data_dir] [connection_options]

This command performs physical verification of all data files located in the specified data directory by running page header sanity checks, as well as block-level checksum verification if checksums are enabled. If a corrupt page is detected, checkdb continues cluster verification until all pages in the cluster are validated.

By default, similar page validation is performed automatically while a backup is taken by pg_probackup. The checkdb command enables you to perform such page validation on demand, without taking any backup copies, even if the cluster is not backed up using pg_probackup at all.

To perform cluster verification, pg_probackup needs to connect to the cluster to be verified. In general, it is enough to specify the backup instance of this cluster for pg_probackup to determine the required connection options. However, if the -B and --instance options are omitted, you have to provide connection options and data_dir via environment variables or command-line options.

Physical verification cannot detect logical inconsistencies, missing or nullified blocks and entire files, or similar anomalies. Extensions amcheck and amcheck_next provide a partial solution to these problems.

If you would like, in addition to physical verification, to verify all indexes in all databases using these extensions, you can specify the --amcheck flag when running the checkdb command:

pg_probackup checkdb -D data_dir --amcheck [connection_options]

You can skip physical verification by specifying the --skip-block-validation flag. In this case, you can omit the backup_dir and data_dir options; only connection options are mandatory:

pg_probackup checkdb --amcheck --skip-block-validation [connection_options]

Logical verification can be done more thoroughly with the --heapallindexed flag, which checks that all heap tuples that should be indexed are actually indexed, but at a higher cost in CPU, memory, and I/O consumption.

Validating a Backup

pg_probackup calculates checksums for each file in a backup during the backup process. The process of checking checksums of backup data files is called backup validation. By default, validation is run immediately after the backup is taken and right before the restore, to detect possible backup corruption.

If you would like to skip backup validation, you can specify the --no-validate flag when running backup and restore commands.

To ensure that all the required backup files are present and can be used to restore the database cluster, you can run the validate command with the exact recovery target options you are going to use for recovery.

For example, to check that you can restore the database cluster from a backup copy up to transaction ID 4242, run this command:

pg_probackup validate -B backup_dir --instance instance_name --recovery-target-xid=4242

If validation completes successfully, pg_probackup displays the corresponding message. If validation fails, you will receive an error message with the exact time, transaction ID, and LSN up to which the recovery is possible.

If you specify backup_id via the -i/--backup-id option, only the backup copy with the specified backup ID is validated. If backup_id is specified together with recovery target options, the validate command checks whether it is possible to restore the specified backup to the specified recovery target.

For example, to check that you can restore the database cluster from a backup copy with the PT8XFX backup ID up to the specified timestamp, run this command:

pg_probackup validate -B backup_dir --instance instance_name -i PT8XFX --recovery-target-time='2017-05-18 14:18:11+03'

If you specify the backup_id of an incremental backup, all its parents starting from FULL backup will be validated.

If you omit all the parameters, all backups are validated.

Restoring a Cluster

To restore the database cluster from a backup, run the restore command with at least the following options:

pg_probackup restore -B backup_dir --instance instance_name -i backup_id

where:

  • backup_dir is the backup catalog that stores all backup files and meta information.

  • instance_name is the backup instance for the cluster to be restored.

  • backup_id specifies the backup to restore the cluster from. If you omit this option, pg_probackup uses the latest valid backup available for the specified instance. If you specify an incremental backup to restore, pg_probackup automatically restores the underlying full backup and then sequentially applies all the necessary increments.

If the cluster to restore contains tablespaces, pg_probackup restores them to their original location by default. To restore tablespaces to a different location, use the --tablespace-mapping/-T option. Otherwise, restoring the cluster on the same host will fail if tablespaces are in use, because the backup would have to be written to the same directories.

When using the --tablespace-mapping/-T option, you must provide absolute paths to the old and new tablespace directories. If a path happens to contain an equals sign (=), escape it with a backslash. This option can be specified multiple times for multiple tablespaces. For example:

pg_probackup restore -B backup_dir --instance instance_name -D data_dir -j 4 -i backup_id -T tablespace1_dir=tablespace1_newdir -T tablespace2_dir=tablespace2_newdir
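
The escaping rule is easy to get wrong when the command line is assembled by a script. The following is a minimal Python sketch, not part of pg_probackup: the helper name tablespace_mapping_args and the sample paths are hypothetical. It builds the -T arguments from pairs of old and new directories, rejects relative paths, and escapes any equals sign with a backslash as described above.

```python
import os.path


def tablespace_mapping_args(mapping):
    """Build -T arguments for pg_probackup restore from {old_dir: new_dir}.

    Hypothetical helper for illustration; it is not part of pg_probackup.
    """
    args = []
    for old_dir, new_dir in mapping.items():
        for path in (old_dir, new_dir):
            # The option requires absolute paths for both directories.
            if not os.path.isabs(path):
                raise ValueError(f"not an absolute path: {path}")
        # Escape '=' inside a path so it is not taken for the separator.
        old_escaped = old_dir.replace("=", "\\=")
        new_escaped = new_dir.replace("=", "\\=")
        args += ["-T", f"{old_escaped}={new_escaped}"]
    return args


# Sample paths are invented for the example.
print(tablespace_mapping_args({"/mnt/ts=1": "/mnt/new_ts1"}))
```

If the resulting list is passed to the program as an argument vector rather than through a shell, the backslash needs no further quoting.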

Once the restore command is complete, start the database service.

If you are restoring a STREAM backup, the restore is complete at once, with the cluster returned to a self-consistent state at the point when the backup was taken. For ARCHIVE backups, Postgres Pro replays all available archived WAL segments, so the cluster is restored to the latest state possible. You can change this behavior by using the recovery target options with the restore command. Note that using the recovery target options when restoring a STREAM backup is possible if the WAL archive is available at least starting from the time the STREAM backup was taken.

To restore the cluster on a remote host, see the section Using pg_probackup in the Remote Mode.

Note

By default, the restore command validates the specified backup before restoring the cluster. If you run regular backup validations and would like to save time when restoring the cluster, you can specify the --no-validate flag to skip validation and speed up the recovery.

Partial Restore

If you have enabled partial restore before taking backups, you can restore only some of the databases using partial restore options with the restore command.

To restore the specified databases only, run the restore command with the following options:

pg_probackup restore -B backup_dir --instance instance_name --db-include=database_name

The --db-include option can be specified multiple times. For example, to restore only databases db1 and db2, run the following command:

pg_probackup restore -B backup_dir --instance instance_name --db-include=db1 --db-include=db2

To exclude one or more databases from restore, use the --db-exclude option:

pg_probackup restore -B backup_dir --instance instance_name --db-exclude=database_name

The --db-exclude option can be specified multiple times. For example, to exclude the databases db1 and db2 from restore, run the following command:

pg_probackup restore -B backup_dir --instance instance_name -i backup_id --db-exclude=db1 --db-exclude=db2

Partial restore relies on the lax behavior of the Postgres Pro recovery process toward truncated files. For recovery to work properly, files of excluded databases are restored as files of zero size. After the Postgres Pro cluster is successfully started, you must drop the excluded databases using the DROP DATABASE command.

Note

The template0 and template1 databases are always restored.

Performing Point-in-Time (PITR) Recovery

If you have enabled continuous WAL archiving before taking backups, you can restore the cluster to its state at an arbitrary point in time (recovery target) using recovery target options with the restore and validate commands.

If the -i/--backup-id option is omitted, pg_probackup automatically chooses the backup that is closest to the specified recovery target and starts the restore process. Otherwise, pg_probackup tries to restore the backup with the specified backup_id to the specified recovery target.

  • To restore the cluster state at the exact time, specify the --recovery-target-time option, in the timestamp format. For example:

    pg_probackup restore -B backup_dir --instance instance_name --recovery-target-time='2017-05-18 14:18:11+03'
    
  • To restore the cluster state up to a specific transaction ID, use the --recovery-target-xid option:

    pg_probackup restore -B backup_dir --instance instance_name --recovery-target-xid=687
    
  • To restore the cluster state up to a specific LSN, use the --recovery-target-lsn option:

    pg_probackup restore -B backup_dir --instance instance_name --recovery-target-lsn=16/B374D848
    
  • To restore the cluster state up to a specific named restore point, use the --recovery-target-name option:

    pg_probackup restore -B backup_dir --instance instance_name --recovery-target-name='before_app_upgrade'
    
  • To restore the backup to the latest state available in the WAL archive, use the --recovery-target option with the latest value:

    pg_probackup restore -B backup_dir --instance instance_name --recovery-target='latest'
    
  • To restore the cluster to the earliest point of consistency, use --recovery-target option with the immediate value:

    pg_probackup restore -B backup_dir --instance instance_name --recovery-target='immediate'
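
The recovery target options listed above can be assembled programmatically. The following is a minimal Python sketch: the option names come from the list above, while the helper name pitr_restore_command and its parameters are hypothetical. Accepting exactly one target per call is a simplification of this sketch, not a rule stated in this document.

```python
# Option names are copied from the list above; the helper itself is hypothetical.
RECOVERY_TARGET_OPTIONS = {
    "time": "--recovery-target-time",
    "xid": "--recovery-target-xid",
    "lsn": "--recovery-target-lsn",
    "name": "--recovery-target-name",
    "target": "--recovery-target",  # used with 'latest' or 'immediate' above
}


def pitr_restore_command(backup_dir, instance, kind, value, backup_id=None):
    """Return the argument list for a restore to a single recovery target."""
    if kind not in RECOVERY_TARGET_OPTIONS:
        raise ValueError(f"unknown recovery target kind: {kind}")
    if kind == "target" and value not in ("latest", "immediate"):
        raise ValueError("this sketch only knows 'latest' and 'immediate'")
    cmd = ["pg_probackup", "restore", "-B", backup_dir, "--instance", instance]
    if backup_id is not None:
        # With -i, pg_probackup restores this backup to the target;
        # without it, the closest backup is chosen automatically.
        cmd += ["-i", backup_id]
    cmd.append(f"{RECOVERY_TARGET_OPTIONS[kind]}={value}")
    return cmd


print(pitr_restore_command("backup_dir", "instance_name", "xid", 687))
```

Because the command is built as a list, values such as timestamps that contain spaces do not need shell quoting.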
    

Using pg_probackup in the Remote Mode

pg_probackup supports the remote mode, which allows you to perform backup and restore operations remotely via SSH. In this mode, the backup catalog is stored on a local system, while the Postgres Pro instance to be backed up is located on a remote system. You must have pg_probackup installed on both systems.

Note

pg_probackup relies on passwordless SSH connection for communication between the hosts.

The typical workflow is as follows:

For example, to create a full ARCHIVE backup of a Postgres Pro cluster located on a remote system with host address 192.168.0.2 on behalf of the postgres user via an SSH connection through port 2302, run:

pg_probackup backup -B backup_dir --instance instance_name -b FULL --remote-user=postgres --remote-host=192.168.0.2 --remote-port=2302

To restore the latest available backup on a remote system with host address 192.168.0.2 on behalf of the postgres user via SSH connection through port 2302, run:

pg_probackup restore -B backup_dir --instance instance_name --remote-user=postgres --remote-host=192.168.0.2 --remote-port=2302

Restoring an ARCHIVE backup or performing PITR in the remote mode requires additional information: the destination address, port, and user name for establishing an SSH connection from the database host to the host with the backup catalog. This information is used by the restore_command to copy WAL segments from the archive to the Postgres Pro pg_wal directory.

To solve this problem, you can use Remote WAL Archive Options.

For example, to restore the latest backup on a remote system in the remote mode, connecting via SSH as user postgres to the host with address 192.168.0.2 through port 2302 and as user backup to the backup catalog host with address 192.168.0.3 through port 2303, run:

pg_probackup restore -B backup_dir --instance instance_name --remote-user=postgres --remote-host=192.168.0.2 --remote-port=2302 --archive-host=192.168.0.3 --archive-port=2303 --archive-user=backup

The provided arguments are used to construct the restore_command in recovery.conf:

restore_command = 'install_dir/pg_probackup archive-get -B backup_dir --instance instance_name --wal-file-path=%p --wal-file-name=%f --remote-host=192.168.0.3 --remote-port=2303 --remote-user=backup'
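
The mapping from the --archive-host, --archive-port, and --archive-user values to the generated command can be sketched in Python as follows. This mirrors the example above only; the function name build_restore_command is hypothetical, and the sketch is not taken from the pg_probackup source.

```python
def build_restore_command(install_dir, backup_dir, instance,
                          archive_host, archive_port, archive_user):
    """Compose a restore_command string shaped like the example above.

    The --archive-* values become the --remote-* options of archive-get.
    %p and %f are left as placeholders for the server to substitute.
    """
    return (
        f"{install_dir}/pg_probackup archive-get -B {backup_dir} "
        f"--instance {instance} --wal-file-path=%p --wal-file-name=%f "
        f"--remote-host={archive_host} --remote-port={archive_port} "
        f"--remote-user={archive_user}"
    )


print(build_restore_command("install_dir", "backup_dir", "instance_name",
                            "192.168.0.3", 2303, "backup"))
```

A string built this way is also the kind of value that can be supplied in full through the --restore-command option.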

Alternatively, you can use the --restore-command option to provide the entire restore_command:

pg_probackup restore -B backup_dir --instance instance_name --remote-user=postgres --remote-host=192.168.0.2 --remote-port=2302 --restore-command='install_dir/pg_probackup archive-get -B backup_dir --instance instance_name --wal-file-path=%p --wal-file-name=%f --remote-host=192.168.0.3 --remote-port=2303 --remote-user=backup'

Note

The remote mode is currently unavailable for Windows systems.

Running pg_probackup on Parallel Threads

backup, restore, merge, delete, checkdb and validate processes can be executed on several parallel threads. This can significantly speed up pg_probackup operation given enough resources (CPU cores, disk, and network bandwidth).

Parallel execution is controlled by the -j/--threads command-line option. For example, to create a backup using four parallel threads, run:

pg_probackup backup -B backup_dir --instance instance_name -b FULL -j 4

Note

Parallel restore applies only to copying data from the backup catalog to the data directory of the cluster. When Postgres Pro server is started, WAL records need to be replayed, and this cannot be done in parallel.

Configuring pg_probackup

Once the backup catalog is initialized and a new backup instance is added, you can use the pg_probackup.conf configuration file located in the backup_dir/backups/instance_name directory to fine-tune pg_probackup configuration.

For example, backup and checkdb commands use a regular Postgres Pro connection. To avoid specifying connection options each time on the command line, you can set them in the pg_probackup.conf configuration file using the set-config command.

Note

It is not recommended to edit pg_probackup.conf manually.

Initially, pg_probackup.conf contains the following settings:

  • PGDATA — the path to the data directory of the cluster to back up.

  • system-identifier — the unique identifier of the Postgres Pro instance.

Additionally, you can define remote, retention, logging, and compression settings using the set-config command:

pg_probackup set-config -B backup_dir --instance instance_name
[--external-dirs=external_directory_path] [remote_options] [connection_options] [retention_options] [logging_options]

To view the current settings, run the following command:

pg_probackup show-config -B backup_dir --instance instance_name

You can override the settings defined in pg_probackup.conf when running pg_probackup commands via the corresponding environment variables and/or command line options.

Specifying Connection Settings

If you define connection settings in the pg_probackup.conf configuration file, you can omit connection options in all the subsequent pg_probackup commands. However, if the corresponding environment variables are set, they take priority over the configuration file. The options provided on the command line override both environment variables and configuration file settings.

If nothing is given, the default values are used. By default, pg_probackup tries to use a local connection via a Unix domain socket (localhost on Windows) and takes the database name and the user name from the PGUSER environment variable or the current OS user name.
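
The order of precedence described above (command line, then environment variables, then pg_probackup.conf, then defaults) can be sketched in Python as follows. The function name resolve_setting and the key name pguser are illustrative assumptions; only the PGUSER environment variable is named in this document.

```python
import os


def resolve_setting(option, env_var, cli_options, config,
                    environ=None, default=None):
    """Pick a connection setting using the precedence described above."""
    environ = os.environ if environ is None else environ
    if option in cli_options:      # 1. command-line option wins
        return cli_options[option]
    if env_var in environ:         # 2. then the environment variable
        return environ[env_var]
    if option in config:           # 3. then the configuration file
        return config[option]
    return default                 # 4. finally the built-in default


# The environment variable outranks the configuration file value.
print(resolve_setting("pguser", "PGUSER", {}, {"pguser": "backup"},
                      environ={"PGUSER": "postgres"}))
```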

Managing the Backup Catalog

With pg_probackup, you can manage backups from the command line:

Viewing Backup Information

To view the list of existing backups for every instance, run the command:

pg_probackup show -B backup_dir

pg_probackup displays the list of all the available backups. For example:

BACKUP INSTANCE 'node'
======================================================================================================================================
 Instance  Version  ID      Recovery time           Mode    WAL Mode  TLI  Time    Data   WAL  Zratio  Start LSN   Stop LSN    Status
======================================================================================================================================
 node      10       PYSUE8  2019-10-03 15:51:48+03  FULL    ARCHIVE   1/0   16s  9047kB  16MB    4.31  0/12000028  0/12000160  OK
 node      10       P7XDQV  2018-04-29 05:32:59+03  DELTA   STREAM    1/1   11s    19MB  16MB    1.00  0/15000060  0/15000198  OK
 node      10       P7XDJA  2018-04-29 05:28:36+03  PTRACK  STREAM    1/1   21s    32MB  32MB    1.00  0/13000028  0/13000198  OK
 node      10       P7XDHU  2018-04-29 05:27:59+03  PAGE    STREAM    1/1   15s    33MB  16MB    1.00  0/11000028  0/110001D0  OK
 node      10       P7XDHB  2018-04-29 05:27:15+03  FULL    STREAM    1/0   11s    39MB  16MB    1.00  0/F000028   0/F000198   OK

For each backup, the following information is provided:

  • Instance — the instance name.

  • Version — Postgres Pro major version.

  • ID — the backup identifier.

  • Recovery time — the earliest moment for which you can restore the state of the database cluster.

  • Mode — the method used to take this backup. Possible values: FULL, PAGE, DELTA, PTRACK.

  • WAL Mode — WAL delivery mode. Possible values: STREAM and ARCHIVE.

  • TLI — timeline identifiers of the current backup and its parent.

  • Time — the time it took to perform the backup.

  • Data — the size of the data files in this backup. This value does not include the size of WAL files. For STREAM backups, the total size of the backup can be calculated as Data + WAL.

  • WAL — the uncompressed size of WAL files that need to be applied during recovery for the backup to reach a consistent state.

  • Zratio — compression ratio calculated as uncompressed-bytes / data-bytes.

  • Start LSN — WAL log sequence number corresponding to the start of the backup process. REDO point for Postgres Pro recovery process to start from.

  • Stop LSN — WAL log sequence number corresponding to the end of the backup process. Consistency point for Postgres Pro recovery process.

  • Status — backup status. Possible values:

    • OK — the backup is complete and valid.

    • DONE — the backup is complete, but was not validated.

    • RUNNING — the backup is in progress.

    • MERGING — the backup is being merged.

    • DELETING — the backup files are being deleted.

    • CORRUPT — some of the backup files are corrupt.

    • ERROR — the backup was aborted because of an unexpected error.

    • ORPHAN — the backup is invalid because one of its parent backups is corrupt or missing.

You can restore the cluster from the backup only if the backup status is OK or DONE.

To get more detailed information about the backup, run the show command with the backup ID:

pg_probackup show -B backup_dir --instance instance_name -i backup_id

The sample output is as follows:

#Configuration
backup-mode = FULL
stream = false
compress-alg = zlib
compress-level = 1
from-replica = false

#Compatibility
block-size = 8192
wal-block-size = 8192
checksum-version = 1
program-version = 2.1.3
server-version = 10

#Result backup info
timelineid = 1
start-lsn = 0/04000028
stop-lsn = 0/040000f8
start-time = '2017-05-16 12:57:29'
end-time = '2017-05-16 12:57:31'
recovery-xid = 597
recovery-time = '2017-05-16 12:57:31'
expire-time = '2020-05-16 12:57:31'
data-bytes = 22288792
wal-bytes = 16777216
uncompressed-bytes = 39961833
pgdata-bytes = 39859393
status = OK
parent-backup-id = 'PT8XFX'
primary_conninfo = 'user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any'

Detailed output has additional attributes:

  • compress-alg — compression algorithm used during backup. Possible values: zlib, pglz, none.

  • compress-level — compression level used during backup.

  • from-replica — was this backup taken on standby? Possible values: 1, 0.

  • block-size — the block_size setting of Postgres Pro cluster at the backup start.

  • checksum-version — are data_checksums enabled in the backed up Postgres Pro cluster? Possible values: 1, 0.

  • program-version — full version of pg_probackup binary used to create the backup.

  • start-time — the backup start time.

  • end-time — the backup end time.

  • expire-time — the point in time when a pinned backup can be removed by retention purge.

  • uncompressed-bytes — the size of data files before adding page headers and applying compression. You can evaluate the effectiveness of compression by comparing uncompressed-bytes to data-bytes if compression is used.

  • pgdata-bytes — the size of Postgres Pro cluster data files at the time of backup. You can evaluate the effectiveness of an incremental backup by comparing pgdata-bytes to uncompressed-bytes.

  • recovery-xid — transaction ID at the backup end time.

  • parent-backup-id — ID of the parent backup. Available only for incremental backups.

  • primary_conninfo — libpq connection parameters used to connect to the Postgres Pro cluster to take this backup. The password is not included.
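
As a worked example of the two comparisons mentioned in this list, the following Python sketch uses the byte counts from the sample detailed output above. The variable names are chosen for the example.

```python
# Values copied from the sample detailed output above.
data_bytes = 22288792
uncompressed_bytes = 39961833
pgdata_bytes = 39859393

# Compression effectiveness: uncompressed-bytes compared to data-bytes.
compression_ratio = uncompressed_bytes / data_bytes
print(f"compression ratio: {compression_ratio:.2f}")

# Share of the cluster captured by this backup: uncompressed-bytes
# compared to pgdata-bytes. A value near 1 means almost all data was copied.
copied_share = uncompressed_bytes / pgdata_bytes
print(f"copied share of PGDATA: {copied_share:.2f}")
```

For an incremental backup, the second figure would be expected to fall well below 1, because only changed data is copied.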

You can also get the detailed information about the backup in the JSON format:

pg_probackup show -B backup_dir --instance instance_name --format=json -i backup_id

The sample output is as follows:

[
  {
      "instance": "node",
      "backups": [
          {
              "id": "PT91HZ",
              "parent-backup-id": "PT8XFX",
              "backup-mode": "DELTA",
              "wal": "ARCHIVE",
              "compress-alg": "zlib",
              "compress-level": 1,
              "from-replica": false,
              "block-size": 8192,
              "xlog-block-size": 8192,
              "checksum-version": 1,
              "program-version": "2.1.3",
              "server-version": "10",
              "current-tli": 16,
              "parent-tli": 2,
              "start-lsn": "0/8000028",
              "stop-lsn": "0/8000160",
              "start-time": "2019-06-17 18:25:11+03",
              "end-time": "2019-06-17 18:25:16+03",
              "recovery-xid": 0,
              "recovery-time": "2019-06-17 18:25:15+03",
              "data-bytes": 106733,
              "wal-bytes": 16777216,
              "primary_conninfo": "user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any",
              "status": "OK"
          }
      ]
  }
]
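
The JSON format is convenient for scripts. The following is a minimal Python sketch that lists, per instance, the backups that can be restored from, using the rule that only backups with the OK or DONE status are restorable. The function name restorable_backup_ids is hypothetical, and the sample input is a trimmed report in which the second backup and its RUNNING status are invented for the example.

```python
import json

# Only backups in these states can be used for restore.
RESTORABLE = {"OK", "DONE"}


def restorable_backup_ids(show_json):
    """Map each instance name to the IDs of its restorable backups."""
    report = json.loads(show_json)
    return {
        instance["instance"]: [
            backup["id"]
            for backup in instance["backups"]
            if backup["status"] in RESTORABLE
        ]
        for instance in report
    }


# Trimmed sample; the second entry is invented for illustration.
sample = """
[{"instance": "node",
  "backups": [
    {"id": "PT91HZ", "backup-mode": "DELTA", "status": "OK"},
    {"id": "EXAMPL", "backup-mode": "FULL", "status": "RUNNING"}
  ]}]
"""
print(restorable_backup_ids(sample))
```

In practice, the input would be the standard output of the show command run with --format=json.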

Viewing WAL Archive Information

To view the information about the WAL archive for every instance, run the command:

pg_probackup show -B backup_dir [--instance instance_name] --archive

pg_probackup displays the list of all the available WAL files grouped by timelines. For example:

ARCHIVE INSTANCE 'node'
===================================================================================================================
 TLI  Parent TLI  Switchpoint  Min Segno         Max Segno         N segments  Size    Zratio  N backups  Status
===================================================================================================================
 5    1           0/B000000    000000000000000B  000000000000000C  2           685kB   48.00   0          OK
 4    3           0/18000000   0000000000000018  000000000000001A  3           648kB   77.00   0          OK
 3    2           0/15000000   0000000000000015  0000000000000017  3           648kB   77.00   0          OK
 2    1           0/B000108    000000000000000B  0000000000000015  5           892kB   94.00   1          DEGRADED
 1    0           0/0          0000000000000001  000000000000000A  10          8774kB  19.00   1          OK

For each timeline, the following information is provided:

  • TLI — timeline identifier.

  • Parent TLI — identifier of the timeline from which this timeline branched off.

  • Switchpoint — LSN of the moment when the timeline branched off from its parent timeline.

  • Min Segno — the first WAL segment belonging to the timeline.

  • Max Segno — the last WAL segment belonging to the timeline.

  • N segments — number of WAL segments belonging to the timeline.

  • Size — the size that files take on disk.

  • Zratio — compression ratio calculated as N segments * wal_segment_size * wal_block_size / Size.

  • N backups — number of backups belonging to the timeline. To get the details about backups, use the JSON format.

  • Status — status of the WAL archive for this timeline. Possible values:

    • OK — all WAL segments between Min Segno and Max Segno are present.

    • DEGRADED — some WAL segments between Min Segno and Max Segno are missing. To find out which files are lost, view this report in the JSON format.
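
The Zratio formula can be checked against the sample table. The following Python sketch assumes that wal_segment_size is counted in WAL blocks (2048 blocks of 8192 bytes, that is, 16 MB per segment); this reading is an assumption made for the example. The sizes in bytes are those reported for the same timelines in the JSON form of this report.

```python
def wal_zratio(n_segments, size_on_disk,
               wal_segment_size=2048, wal_block_size=8192):
    """N segments * wal_segment_size * wal_block_size / Size.

    wal_segment_size is assumed to be expressed in WAL blocks.
    """
    return n_segments * wal_segment_size * wal_block_size / size_on_disk


# (timeline, number of segments, size on disk in bytes) from the sample data.
for tli, n_segments, size in [(5, 2, 685320), (4, 3, 648625),
                              (2, 5, 892173), (1, 10, 8774805)]:
    print(tli, int(wal_zratio(n_segments, size)))
```

Under this assumption, the integer part of each result agrees with the Zratio column of the sample table, which suggests that the displayed value is truncated rather than rounded.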

To get more detailed information about the WAL archive in the JSON format, run the command:

pg_probackup show -B backup_dir [--instance instance_name] --archive --format=json

The sample output is as follows:

[
  {
      "instance": "replica",
      "timelines": [
          {
              "tli": 5,
              "parent-tli": 1,
              "switchpoint": "0/B000000",
              "min-segno": "000000000000000B",
              "max-segno": "000000000000000C",
              "n-segments": 2,
              "size": 685320,
              "zratio": 48.00,
              "closest-backup-id": "PXS92O",
              "status": "OK",
              "lost-segments": [],
              "backups": []
          },
          {
              "tli": 4,
              "parent-tli": 3,
              "switchpoint": "0/18000000",
              "min-segno": "0000000000000018",
              "max-segno": "000000000000001A",
              "n-segments": 3,
              "size": 648625,
              "zratio": 77.00,
              "closest-backup-id": "PXS9CE",
              "status": "OK",
              "lost-segments": [],
              "backups": []
          },
          {
              "tli": 3,
              "parent-tli": 2,
              "switchpoint": "0/15000000",
              "min-segno": "0000000000000015",
              "max-segno": "0000000000000017",
              "n-segments": 3,
              "size": 648911,
              "zratio": 77.00,
              "closest-backup-id": "PXS9CE",
              "status": "OK",
              "lost-segments": [],
              "backups": []
          },
          {
              "tli": 2,
              "parent-tli": 1,
              "switchpoint": "0/B000108",
              "min-segno": "000000000000000B",
              "max-segno": "0000000000000015",
              "n-segments": 5,
              "size": 892173,
              "zratio": 94.00,
              "closest-backup-id": "PXS92O",
              "status": "DEGRADED",
              "lost-segments": [
                  {
                      "begin-segno": "000000000000000D",
                      "end-segno": "000000000000000E"
                  },
                  {
                      "begin-segno": "0000000000000010",
                      "end-segno": "0000000000000012"
                  }
              ],
              "backups": [
                  {
                      "id": "PXS9CE",
                      "backup-mode": "FULL",
                      "wal": "ARCHIVE",
                      "compress-alg": "none",
                      "compress-level": 1,
                      "from-replica": "false",
                      "block-size": 8192,
                      "xlog-block-size": 8192,
                      "checksum-version": 1,
                      "program-version": "2.1.5",
                      "server-version": "10",
                      "current-tli": 2,
                      "parent-tli": 0,
                      "start-lsn": "0/C000028",
                      "stop-lsn": "0/C000160",
                      "start-time": "2019-09-13 21:43:26+03",
                      "end-time": "2019-09-13 21:43:30+03",
                      "recovery-xid": 0,
                      "recovery-time": "2019-09-13 21:43:29+03",
                      "data-bytes": 104674852,
                      "wal-bytes": 16777216,
                      "primary_conninfo": "user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any",
                      "status": "OK"
                  }
              ]
          },
          {
              "tli": 1,
              "parent-tli": 0,
              "switchpoint": "0/0",
              "min-segno": "0000000000000001",
              "max-segno": "000000000000000A",
              "n-segments": 10,
              "size": 8774805,
              "zratio": 19.00,
              "closest-backup-id": "",
              "status": "OK",
              "lost-segments": [],
              "backups": [
                  {
                      "id": "PXS92O",
                      "backup-mode": "FULL",
                      "wal": "ARCHIVE",
                      "compress-alg": "none",
                      "compress-level": 1,
                      "from-replica": "true",
                      "block-size": 8192,
                      "xlog-block-size": 8192,
                      "checksum-version": 1,
                      "program-version": "2.1.5",
                      "server-version": "10",
                      "current-tli": 1,
                      "parent-tli": 0,
                      "start-lsn": "0/4000028",
                      "stop-lsn": "0/6000028",
                      "start-time": "2019-09-13 21:37:36+03",
                      "end-time": "2019-09-13 21:38:45+03",
                      "recovery-xid": 0,
                      "recovery-time": "2019-09-13 21:37:30+03",
                      "data-bytes": 25987319,
                      "wal-bytes": 50331648,
                      "primary_conninfo": "user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any",
                      "status": "OK"
                  }
              ]
          }
      ]
  },
  {
      "instance": "master",
      "timelines": [
          {
              "tli": 1,
              "parent-tli": 0,
              "switchpoint": "0/0",
              "min-segno": "0000000000000001",
              "max-segno": "000000000000000B",
              "n-segments": 11,
              "size": 8860892,
              "zratio": 20.00,
              "status": "OK",
              "lost-segments": [],
              "backups": [
                  {
                      "id": "PXS92H",
                      "parent-backup-id": "PXS92C",
                      "backup-mode": "PAGE",
                      "wal": "ARCHIVE",
                      "compress-alg": "none",
                      "compress-level": 1,
                      "from-replica": "false",
                      "block-size": 8192,
                      "xlog-block-size": 8192,
                      "checksum-version": 1,
                      "program-version": "2.1.5",
                      "server-version": "10",
                      "current-tli": 1,
                      "parent-tli": 1,
                      "start-lsn": "0/4000028",
                      "stop-lsn": "0/50000B8",
                      "start-time": "2019-09-13 21:37:29+03",
                      "end-time": "2019-09-13 21:37:31+03",
                      "recovery-xid": 0,
                      "recovery-time": "2019-09-13 21:37:30+03",
                      "data-bytes": 1328461,
                      "wal-bytes": 33554432,
                      "primary_conninfo": "user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any",
                      "status": "OK"
                  },
                  {
                      "id": "PXS92C",
                      "backup-mode": "FULL",
                      "wal": "ARCHIVE",
                      "compress-alg": "none",
                      "compress-level": 1,
                      "from-replica": "false",
                      "block-size": 8192,
                      "xlog-block-size": 8192,
                      "checksum-version": 1,
                      "program-version": "2.1.5",
                      "server-version": "10",
                      "current-tli": 1,
                      "parent-tli": 0,
                      "start-lsn": "0/2000028",
                      "stop-lsn": "0/2000160",
                      "start-time": "2019-09-13 21:37:24+03",
                      "end-time": "2019-09-13 21:37:29+03",
                      "recovery-xid": 0,
                      "recovery-time": "2019-09-13 21:37:28+03",
                      "data-bytes": 24871902,
                      "wal-bytes": 16777216,
                      "primary_conninfo": "user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any",
                      "status": "OK"
                  }
              ]
          }
      ]
  }
]

Most fields are consistent with the plain format, with some exceptions:

  • The size is in bytes.

  • The closest-backup-id attribute contains the ID of the most recent valid backup that belongs to one of the previous timelines. You can use this backup to perform point-in-time recovery to this timeline. If such a backup does not exist, this string is empty.

  • The lost-segments array provides information about intervals of missing segments in DEGRADED timelines. In OK timelines, the lost-segments array is empty.

  • The backups array lists all backups belonging to the timeline. If the timeline has no backups, this array is empty.
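
A monitoring script could use these attributes to report gaps in the archive. The following is a minimal Python sketch that collects the DEGRADED timelines and their lost-segments intervals; the function name degraded_timelines is hypothetical, and the input is a trimmed copy of the sample report above.

```python
import json


def degraded_timelines(archive_json):
    """Return (instance, tli, [(begin-segno, end-segno), ...]) per DEGRADED timeline."""
    report = json.loads(archive_json)
    found = []
    for instance in report:
        for timeline in instance["timelines"]:
            if timeline["status"] != "DEGRADED":
                continue
            intervals = [
                (gap["begin-segno"], gap["end-segno"])
                for gap in timeline["lost-segments"]
            ]
            found.append((instance["instance"], timeline["tli"], intervals))
    return found


# Trimmed from the sample output above.
sample = """
[{"instance": "replica",
  "timelines": [
    {"tli": 5, "status": "OK", "lost-segments": []},
    {"tli": 2, "status": "DEGRADED", "lost-segments": [
      {"begin-segno": "000000000000000D", "end-segno": "000000000000000E"},
      {"begin-segno": "0000000000000010", "end-segno": "0000000000000012"}
    ]}
  ]}]
"""
for instance, tli, intervals in degraded_timelines(sample):
    print(instance, tli, intervals)
```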

Configuring Retention Policy

With pg_probackup, you can set retention policies for backups and the WAL archive. All policies can be combined in any way.

Backup Retention Policy

By default, all backup copies created with pg_probackup are stored in the specified backup catalog. To save disk space, you can configure retention policy and periodically clean up redundant backup copies accordingly.

To configure retention policy, set one or more of the following variables in the pg_probackup.conf file via set-config:

--retention-redundancy=redundancy

Specifies the number of full backup copies to keep in the backup catalog.

--retention-window=window

Defines the earliest point in time for which pg_probackup can complete the recovery. This option is set in the number of days from the current moment. For example, if retention-window=7, pg_probackup must delete all backup copies that are older than seven days, with all the corresponding WAL files.

If both --retention-redundancy and --retention-window options are set, pg_probackup keeps backup copies that satisfy at least one condition. For example, if you set --retention-redundancy=2 and --retention-window=7, pg_probackup purges the backup catalog to keep only two full backup copies and all backups that are newer than seven days:

pg_probackup set-config -B backup_dir --instance instance_name --retention-redundancy=2 --retention-window=7
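
The way the two settings combine can be illustrated with a simplified model in Python. This is a sketch under stated assumptions, not the pg_probackup implementation: a backup counts toward redundancy if fewer than the configured number of FULL backups are newer than it, a value of 0 is treated as "not set", and the dependencies between incremental backups and their parents are ignored. The function name, backup IDs, and dates are invented for the example.

```python
from datetime import datetime, timedelta


def kept_by_policy(backups, now, redundancy=0, window=0):
    """Return IDs of backups that satisfy at least one retention condition.

    backups: iterable of (backup_id, mode, recovery_time) tuples.
    Simplified model; 0 is assumed to mean that a setting is not in use.
    """
    ordered = sorted(backups, key=lambda b: b[2], reverse=True)  # newest first
    if redundancy == 0 and window == 0:
        return [backup_id for backup_id, _, _ in ordered]
    kept = []
    newer_full = 0  # FULL backups seen so far that are newer than the current one
    for backup_id, mode, recovery_time in ordered:
        in_window = window > 0 and recovery_time >= now - timedelta(days=window)
        in_redundancy = redundancy > 0 and newer_full < redundancy
        if in_window or in_redundancy:
            kept.append(backup_id)
        if mode == "FULL":
            newer_full += 1
    return kept


# Invented catalog for illustration.
catalog = [
    ("F_OLD", "FULL", datetime(2019, 3, 20)),
    ("I_OLD", "PAGE", datetime(2019, 3, 25)),
    ("F_NEW", "FULL", datetime(2019, 4, 5)),
    ("I_NEW", "DELTA", datetime(2019, 4, 8)),
]
print(kept_by_policy(catalog, datetime(2019, 4, 10, 12), redundancy=1, window=3))
```

In this example, I_NEW is kept because it falls inside the three-day window, and F_NEW is kept because it is the most recent full backup, even though it is older than the window.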

To clean up the backup catalog in accordance with retention policy, run:

pg_probackup delete -B backup_dir --instance instance_name --delete-expired

pg_probackup deletes all backup copies that do not conform to the defined retention policy.

If you would like to also remove the WAL files that are no longer required for any of the backups, add the --delete-wal flag:

pg_probackup delete -B backup_dir --instance instance_name --delete-expired --delete-wal

Note

Alternatively, you can use the --delete-expired, --merge-expired, --delete-wal flags and the --retention-window and --retention-redundancy options together with the backup command to remove and merge the outdated backup copies once the new backup is created.

You can set or override the current retention policy by specifying --retention-redundancy and --retention-window options directly when running delete or backup commands:

pg_probackup delete -B backup_dir --instance instance_name --delete-expired --retention-window=7 --retention-redundancy=2

Incremental backups require that their parent full backup and all the preceding incremental backups are available. If any of these backups expire, they still cannot be removed while at least one incremental backup in the chain satisfies the retention policy. To avoid keeping expired backups that are still required to restore an active incremental one, you can merge them with this backup using the --merge-expired flag when running backup or delete commands.

Suppose you have backed up the node instance in the backup_dir directory, with the --retention-window option set to 7, and you have the following backups available on April 10, 2019:

BACKUP INSTANCE 'node'
===================================================================================================================================
 Instance  Version  ID      Recovery time           Mode   WAL     TLI  Time    Data   WAL  Zratio  Start LSN   Stop LSN    Status
===================================================================================================================================
 node      10       P7XDHR  2019-04-10 05:27:15+03  FULL   STREAM  1/0   11s   200MB  16MB     1.0  0/18000059  0/18000197  OK
 node      10       P7XDQV  2019-04-08 05:32:59+03  PAGE   STREAM  1/0   11s    19MB  16MB     1.0  0/15000060  0/15000198  OK
 node      10       P7XDJA  2019-04-03 05:28:36+03  DELTA  STREAM  1/0   21s    32MB  16MB     1.0  0/13000028  0/13000198  OK
 -------------------------------------------------------retention window--------------------------------------------------------
 node      10       P7XDHU  2019-04-02 05:27:59+03  PAGE   STREAM  1/0   31s    33MB  16MB     1.0  0/11000028  0/110001D0  OK
 node      10       P7XDHB  2019-04-01 05:27:15+03  FULL   STREAM  1/0   11s   200MB  16MB     1.0  0/F000028   0/F000198   OK
 node      10       P7XDFT  2019-03-29 05:26:25+03  FULL   STREAM  1/0   11s   200MB  16MB     1.0  0/D000028   0/D000198   OK

Even though the P7XDHB and P7XDHU backups are outside the retention window, they cannot be removed because that would invalidate the succeeding incremental backups P7XDJA and P7XDQV, which are still required. Therefore, if you run the delete command with the --delete-expired flag, only the P7XDFT full backup is removed.
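The reasoning above can be sketched as a small model. This is an illustration only, not pg_probackup code: the parent links are an assumption inferred from the text (a single chain P7XDHB, P7XDHU, P7XDJA, P7XDQV), the comparison is done on calendar dates for simplicity, and the function name removable_backups is hypothetical.

```python
from datetime import date, timedelta

# (backup_id, recovery date, mode, parent_id)
# Parent links are an assumption inferred from the example text.
BACKUPS = [
    ("P7XDHR", date(2019, 4, 10), "FULL", None),
    ("P7XDQV", date(2019, 4, 8), "PAGE", "P7XDJA"),
    ("P7XDJA", date(2019, 4, 3), "DELTA", "P7XDHU"),
    ("P7XDHU", date(2019, 4, 2), "PAGE", "P7XDHB"),
    ("P7XDHB", date(2019, 4, 1), "FULL", None),
    ("P7XDFT", date(2019, 3, 29), "FULL", None),
]


def removable_backups(backups, today, window_days):
    """Return IDs of expired backups that no in-window backup depends on."""
    cutoff = today - timedelta(days=window_days)
    parent_of = {bid: parent for bid, _, _, parent in backups}

    # Backups older than the cutoff are outside the retention window.
    expired = {bid for bid, day, _, _ in backups if day < cutoff}

    # Every ancestor of a backup inside the window must be kept.
    needed = set()
    for bid, day, _, _ in backups:
        if day >= cutoff:
            parent = parent_of[bid]
            while parent is not None:
                needed.add(parent)
                parent = parent_of[parent]

    return sorted(expired - needed)


print(removable_backups(BACKUPS, date(2019, 4, 10), 7))  # ['P7XDFT']
```

In this model, P7XDHB and P7XDHU are expired but still referenced by P7XDJA and P7XDQV, so only P7XDFT is reported as removable.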

With the --merge-expired option, the P7XDJA backup is merged with the underlying P7XDHU and P7XDHB backups and becomes a full one, so there is no need to keep these expired backups anymore:

pg_probackup delete -B backup_dir --instance node --delete-expired --merge-expired
pg_probackup show -B backup_dir
BACKUP INSTANCE 'node'
==================================================================================================================================
 Instance  Version  ID      Recovery time           Mode  WAL     TLI  Time    Data   WAL  Zratio  Start LSN   Stop LSN    Status
==================================================================================================================================
 node      10       P7XDHR  2019-04-10 05:27:15+03  FULL  STREAM  1/0   11s   200MB  16MB     1.0  0/18000059  0/18000197  OK
 node      10       P7XDQV  2019-04-08 05:32:59+03  PAGE  STREAM  1/0   11s    19MB  16MB     1.0  0/15000060  0/15000198  OK
 node      10       P7XDJA  2019-04-03 05:28:36+03  FULL  STREAM  1/0   21s    32MB  16MB     1.0  0/13000028  0/13000198  OK

The Time field for the merged backup displays the time required for the merge.

Backup Pinning

If you need to keep certain backups longer than the established retention policy allows, you can pin them for an arbitrary period of time. For example:

pg_probackup set-backup -B backup_dir --instance instance_name -i backup_id --ttl=30d

This command sets the expiration time of the specified backup to 30 days starting from the time indicated in its recovery-time attribute.
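As a rough illustration of this arithmetic, the following sketch adds a ttl value to a recovery time. The unit names (ms, s, min, h, d, with seconds as the default) come from the Pinning Options section; the parsing rules and the function names parse_ttl and expire_time are assumptions made for this example and may differ from what pg_probackup actually accepts. Unpinning with a zero ttl is not modeled.

```python
import re
from datetime import datetime, timedelta

# Maps the documented ttl units to timedelta keyword arguments.
_UNIT_TO_KWARG = {
    "ms": "milliseconds",
    "s": "seconds",
    "min": "minutes",
    "h": "hours",
    "d": "days",
}


def parse_ttl(ttl):
    """Convert a string such as '30d' or '90' into a timedelta (assumed syntax)."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|min|h|d)?", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported ttl value: {ttl!r}")
    value, unit = match.groups()
    # Seconds are used when no unit is given.
    return timedelta(**{_UNIT_TO_KWARG[unit or "s"]: int(value)})


def expire_time(recovery_time, ttl):
    """Expiration time counted from the backup's recovery-time attribute."""
    return recovery_time + parse_ttl(ttl)


print(expire_time(datetime(2017, 5, 16, 12, 57, 31), "30d"))
```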

You can also explicitly set the expiration time for a backup using the --expire-time option. For example:

pg_probackup set-backup -B backup_dir --instance instance_name -i backup_id --expire-time='2020-01-01 00:00:00+03'

Alternatively, you can use the --ttl and --expire-time options with the backup command to pin the newly created backup:

pg_probackup backup -B backup_dir --instance instance_name -b FULL --ttl=30d
pg_probackup backup -B backup_dir --instance instance_name -b FULL --expire-time='2020-01-01 00:00:00+03'

To check if the backup is pinned, run the show command:

pg_probackup show -B backup_dir --instance instance_name -i backup_id

If the backup is pinned, the expire-time attribute displays its expiration time:

...
recovery-time = '2017-05-16 12:57:31'
expire-time = '2020-01-01 00:00:00+03'
data-bytes = 22288792
...

Only pinned backups have the expire-time attribute in the backup metadata.

Note

A pinned incremental backup implicitly pins all its parent backups.

You can unpin the backup by setting the --ttl option to zero using the set-backup command. For example:

pg_probackup set-backup -B backup_dir --instance instance_name -i backup_id --ttl=0

WAL Archive Retention Policy

By default, pg_probackup purges only redundant WAL segments that cannot be applied to any of the backups in the backup catalog. To save disk space, you can configure WAL archive retention policy, which lets you keep WAL of limited depth, measured in backups per timeline.

Suppose you have backed up the node instance in the backup_dir directory and configured continuous WAL archiving:

pg_probackup show -B backup_dir --instance node
BACKUP INSTANCE 'node'
====================================================================================================================================
 Instance  Version  ID      Recovery Time           Mode   WAL Mode  TLI  Time   Data   WAL  Zratio  Start LSN   Stop LSN    Status
====================================================================================================================================
 node      11       PZ9442  2019-10-12 10:43:21+03  DELTA  STREAM    1/0   10s  121kB  16MB    1.00  0/46000028  0/46000160  OK
 node      11       PZ943L  2019-10-12 10:43:04+03  FULL   STREAM    1/0   10s  180MB  32MB    1.00  0/44000028  0/44000160  OK
 node      11       PZ7YR5  2019-10-11 19:49:56+03  DELTA  STREAM    1/1   10s  112kB  32MB    1.00  0/41000028  0/41000160  OK
 node      11       PZ7YMP  2019-10-11 19:47:16+03  DELTA  STREAM    1/1   10s  376kB  32MB    1.00  0/3E000028  0/3F0000B8  OK
 node      11       PZ7YK2  2019-10-11 19:45:45+03  FULL   STREAM    1/0   11s  180MB  16MB    1.00  0/3C000028  0/3C000198  OK
 node      11       PZ7YFO  2019-10-11 19:43:04+03  FULL   STREAM    1/0   10s   30MB  16MB    1.00  0/2000028   0/200ADD8   OK

You can check the state of the WAL archive by running the show command with the --archive flag:

pg_probackup show -B backup_dir --instance node --archive
ARCHIVE INSTANCE 'node'
===============================================================================================================
 TLI  Parent TLI  Switchpoint  Min Segno         Max Segno         N segments  Size  Zratio  N backups  Status
===============================================================================================================
 1    0           0/0          0000000000000001  0000000000000047  71          36MB  31.00   6          OK

A WAL purge without --wal-depth cannot achieve much: only one segment is removed.

pg_probackup delete -B backup_dir --instance node --delete-wal
ARCHIVE INSTANCE 'node'
===============================================================================================================
 TLI  Parent TLI  Switchpoint  Min Segno         Max Segno         N segments  Size  Zratio  N backups  Status
===============================================================================================================
 1    0           0/0          0000000000000002  0000000000000047  70          34MB  32.00   6          OK

If you would like, for example, to keep only those WAL segments that can be applied to the last valid backup, use the --wal-depth option:

pg_probackup delete -B backup_dir --instance node --delete-wal --wal-depth=1
ARCHIVE INSTANCE 'node'
================================================================================================================
 TLI  Parent TLI  Switchpoint  Min Segno         Max Segno         N segments  Size   Zratio  N backups  Status
================================================================================================================
 1    0           0/0          0000000000000046  0000000000000047  2           143kB  228.00  6          OK
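One way to read the listings above is that the archive is trimmed up to the WAL segment containing the Start LSN of the oldest backup that must remain recoverable. The sketch below is a simplified model of that reading, not the actual pg_probackup algorithm: it assumes a single timeline, the default 16MB WAL segment size, and that every listed backup is valid. The function names are hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


def lsn_to_int(lsn):
    """Convert an LSN written as 'high/low' in hexadecimal into an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def first_segment_to_keep(start_lsns_newest_first, wal_depth,
                          seg_size=WAL_SEGMENT_SIZE):
    """Segment number holding the Start LSN of the wal_depth-th newest backup."""
    anchor = start_lsns_newest_first[wal_depth - 1]
    return lsn_to_int(anchor) // seg_size


# Start LSN values from the show listing above, newest backup first.
start_lsns = ["0/46000028", "0/44000028", "0/41000028",
              "0/3E000028", "0/3C000028", "0/2000028"]

print(f"{first_segment_to_keep(start_lsns, 1):016X}")  # 0000000000000046
```

Under the same assumptions, passing the total number of backups as the depth yields segment 0000000000000002, which matches the Min Segno shown after the purge without --wal-depth.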

Alternatively, you can use the --wal-depth option with the backup command:

pg_probackup backup -B backup_dir --instance node -b DELTA --wal-depth=1 --delete-wal
ARCHIVE INSTANCE 'node'
===============================================================================================================
 TLI  Parent TLI  Switchpoint  Min Segno         Max Segno         N segments  Size  Zratio  N backups  Status
===============================================================================================================
 1    0           0/0          0000000000000048  0000000000000049  1           72kB  228.00  7          OK

Merging Backups

As you take more and more incremental backups, the total size of the backup catalog can substantially grow. To save disk space, you can merge incremental backups to their parent full backup by running the merge command, specifying the backup ID of the most recent incremental backup you would like to merge:

pg_probackup merge -B backup_dir --instance instance_name -i backup_id

This command merges the specified incremental backup to its parent full backup, together with all incremental backups between them. Once the merge is complete, the incremental backups are removed as redundant. Thus, the merge operation is virtually equivalent to retaking a full backup and removing all the outdated backups, but it saves considerable time, especially for large data volumes, as well as I/O and network traffic if you are using pg_probackup in the remote mode.
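To make the scope of a merge concrete, the following sketch lists the backups that take part in it by walking from the specified incremental backup down to its full parent. It is a model for illustration only: the backup IDs reuse the retention example earlier in this document, the parent links are assumed, and merge_chain is a hypothetical helper, not a pg_probackup function.

```python
# Assumed parent links and modes, reusing IDs from the retention example.
PARENT_OF = {"P7XDQV": "P7XDJA", "P7XDJA": "P7XDHU",
             "P7XDHU": "P7XDHB", "P7XDHB": None}
MODE_OF = {"P7XDQV": "PAGE", "P7XDJA": "DELTA",
           "P7XDHU": "PAGE", "P7XDHB": "FULL"}


def merge_chain(backup_id, parent_of=PARENT_OF, mode_of=MODE_OF):
    """Return the backups involved in a merge, from backup_id to its FULL parent."""
    chain = [backup_id]
    # Follow parent links until the full backup at the base of the chain.
    while mode_of[chain[-1]] != "FULL":
        chain.append(parent_of[chain[-1]])
    return chain


print(merge_chain("P7XDJA"))  # ['P7XDJA', 'P7XDHU', 'P7XDHB']
```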

Before the merge, pg_probackup validates all the affected backups to ensure that they are valid. You can check the current backup status by running the show command with the backup ID:

pg_probackup show -B backup_dir --instance instance_name -i backup_id

If the merge is still in progress, the backup status is displayed as MERGING. The merge is idempotent, so you can restart the merge if it was interrupted.

Deleting Backups

To delete a backup that is no longer required, run the following command:

pg_probackup delete -B backup_dir --instance instance_name -i backup_id

This command will delete the backup with the specified backup_id, together with all the incremental backups that descend from backup_id, if any. This way you can delete some recent incremental backups, retaining the underlying full backup and some of the incremental backups that follow it.

To delete obsolete WAL files that are not necessary to restore any of the remaining backups, use the --delete-wal flag:

pg_probackup delete -B backup_dir --instance instance_name --delete-wal

To delete backups that are expired according to the current retention policy, use the --delete-expired flag:

pg_probackup delete -B backup_dir --instance instance_name --delete-expired

Expired backups cannot be removed while at least one incremental backup that satisfies the retention policy is based on them. If you would like to minimize the number of backups still required to keep incremental backups valid, specify the --merge-expired flag when running this command:

pg_probackup delete -B backup_dir --instance instance_name --delete-expired --merge-expired

In this case, pg_probackup searches for the oldest incremental backup that satisfies the retention policy and merges this backup with the underlying full and incremental backups that have already expired, thus making it a full backup. Once the merge is complete, the remaining expired backups are deleted.

Before merging or deleting backups, you can run the delete command with the --dry-run flag, which displays the status of all the available backups according to the current retention policy, without performing any irreversible actions.

Command-Line Reference

Commands

This section describes pg_probackup commands. Optional parameters are enclosed in square brackets. For detailed parameter descriptions, see the section Options.

version

pg_probackup version

Prints pg_probackup version.

help

pg_probackup help [command]

Displays the synopsis of pg_probackup commands. If one of the pg_probackup commands is specified, shows detailed information about the options that can be used with this command.

init

pg_probackup init -B backup_dir [--help]

Initializes the backup catalog in backup_dir that will store backup copies, WAL archive, and meta information for the backed up database clusters. If the specified backup_dir already exists, it must be empty. Otherwise, pg_probackup displays a corresponding error message.

For details, see the section Initializing the Backup Catalog.

add-instance

pg_probackup add-instance -B backup_dir -D data_dir --instance instance_name [--help]

Initializes a new backup instance inside the backup catalog backup_dir and generates the pg_probackup.conf configuration file that controls pg_probackup settings for the cluster with the specified data_dir data directory.

For details, see the section Adding a New Backup Instance.

del-instance

pg_probackup del-instance -B backup_dir --instance instance_name [--help]

Deletes all backups and WAL files associated with the specified instance.

set-config

pg_probackup set-config -B backup_dir --instance instance_name
[--help] [--pgdata=pgdata-path]
[--retention-redundancy=redundancy][--retention-window=window][--wal-depth=wal_depth]
[--compress-algorithm=compression_algorithm] [--compress-level=compression_level]
[-d dbname] [-h host] [-p port] [-U username]
[--archive-timeout=timeout] [--external-dirs=external_directory_path]
[--restore-command=cmdline]
[remote_options] [remote_wal_archive_options] [logging_options]

Adds the specified connection, compression, retention, logging, and external directory settings into the pg_probackup.conf configuration file, or modifies the previously defined values.

For all available settings, see the Options section.

It is not recommended to edit pg_probackup.conf manually.

set-backup

pg_probackup set-backup -B backup_dir --instance instance_name -i backup_id
{--ttl=ttl | --expire-time=time} [--help]

Sets the provided backup-specific settings into the backup.control configuration file, or modifies the previously defined values.

For all available settings, see the section Pinning Options.

show-config

pg_probackup show-config -B backup_dir --instance instance_name [--format=plain|json]

Displays the contents of the pg_probackup.conf configuration file located in the backup_dir/backups/instance_name directory. You can specify the --format=json option to get the result in the JSON format. By default, configuration settings are shown as plain text.

To edit pg_probackup.conf, use the set-config command.

show

pg_probackup show -B backup_dir
[--help] [--instance instance_name [-i backup_id | --archive]] [--format=plain|json]

Shows the contents of the backup catalog. If instance_name and backup_id are specified, shows detailed information about this backup. If the --archive option is specified, shows the contents of WAL archive of the backup catalog.

By default, the contents of the backup catalog are shown as plain text. You can specify the --format=json option to get the result in the JSON format.

For details on usage, see the sections Managing the Backup Catalog and Viewing WAL Archive Information.

backup

pg_probackup backup -B backup_dir -b backup_mode --instance instance_name
[--help] [-j num_threads] [--progress]
[-C] [--stream [-S slot_name] [--temp-slot]] [--backup-pg-log]
[--no-validate] [--skip-block-validation]
[-w --no-password] [-W --password]
[--archive-timeout=timeout] [--external-dirs=external_directory_path]
[connection_options] [compression_options] [remote_options]
[retention_options] [pinning_options] [logging_options]

Creates a backup copy of the Postgres Pro instance.

-b mode
--backup-mode=mode

Specifies the backup mode to use. Possible values are:

  • FULL — creates a full backup that contains all the data files of the cluster to be restored.

  • DELTA — reads all data files in the data directory and creates an incremental backup for pages that have changed since the previous backup.

  • PAGE — creates an incremental PAGE backup based on the WAL files that have been generated since the previous full or incremental backup was taken.

  • PTRACK — creates an incremental PTRACK backup tracking page changes on the fly.

-C
--smooth-checkpoint

Spreads out the checkpoint over a period of time. By default, pg_probackup tries to complete the checkpoint as soon as possible.

--stream

Makes a STREAM backup, which includes all the necessary WAL files by streaming them from the database server via replication protocol.

--temp-slot

Creates a temporary physical replication slot for streaming WAL from the backed up Postgres Pro instance. It ensures that all the required WAL segments remain available if WAL is rotated while the backup is in progress. This flag can only be used together with the --stream flag. The default slot name is pg_probackup_slot, which can be changed using the --slot/-S option.

-S slot_name
--slot=slot_name

Specifies the replication slot for WAL streaming. This option can only be used together with the --stream flag.

--backup-pg-log

Includes the log directory into the backup. This directory usually contains log messages. By default, log directory is excluded.

-E external_directory_path
--external-dirs=external_directory_path

Includes the specified directory into the backup. This option is useful to back up scripts, SQL dump files, and configuration files located outside of the data directory. If you would like to back up several external directories, separate their paths by a colon on Unix and a semicolon on Windows.

--archive-timeout=wait_time

Sets the timeout for WAL segment archiving and streaming, in seconds. By default, pg_probackup waits 300 seconds.

--skip-block-validation

Disables block-level checksum verification to speed up the backup process.

--no-validate

Skips automatic validation after the backup is taken. You can use this flag if you validate backups regularly and would like to save time when running backup operations.

Additionally, connection options, retention options, pinning options, remote mode options, compression options, logging options, and common options can be used.

For details on usage, see the section Creating a Backup.

restore

pg_probackup restore -B backup_dir --instance instance_name
[--help] [-D data_dir] [-i backup_id]
[-j num_threads] [--progress]
[-T OLDDIR=NEWDIR] [--external-mapping=OLDDIR=NEWDIR] [--skip-external-dirs]
[-R | --restore-as-replica] [--no-validate] [--skip-block-validation] [--force]
[--restore-command=cmdline]
[recovery_target_options] [logging_options] [remote_options]
[partial_restore_options] [remote_wal_archive_options]

Restores the Postgres Pro instance from a backup copy located in the backup_dir backup catalog. If you specify a recovery target option, pg_probackup finds the closest backup and restores it to the specified recovery target. If neither the backup ID nor recovery target options are provided, pg_probackup uses the most recent backup to perform the recovery.

-R
--restore-as-replica

Writes a minimal recovery.conf in the output directory to facilitate setting up a standby server. The password is not included. If the replication connection requires a password, you must specify the password manually.

-T OLDDIR=NEWDIR
--tablespace-mapping=OLDDIR=NEWDIR

Relocates the tablespace from the OLDDIR to the NEWDIR directory at the time of recovery. Both OLDDIR and NEWDIR must be absolute paths. If the path contains the equals sign (=), escape it with a backslash. This option can be specified multiple times for multiple tablespaces.

--external-mapping=OLDDIR=NEWDIR

Relocates an external directory included into the backup from the OLDDIR to the NEWDIR directory at the time of recovery. Both OLDDIR and NEWDIR must be absolute paths. If the path contains the equals sign (=), escape it with a backslash. This option can be specified multiple times for multiple directories.

--skip-external-dirs

Skips external directories included into the backup with the --external-dirs option. The contents of these directories will not be restored.

--skip-block-validation

Disables block-level checksum verification to speed up validation. During automatic validation before the restore only file-level checksums will be verified.

--no-validate

Skips backup validation. You can use this flag if you validate backups regularly and would like to save time when running restore operations.

--restore-command=cmdline

Sets the restore_command parameter to the specified command. For example: --restore-command='cp /mnt/server/archivedir/%f "%p"'

--force

Ignores an invalid status of the backup. You can use this flag if you need to restore the Postgres Pro cluster from a corrupt or an invalid backup. Use with caution.

Additionally, recovery target options, remote mode options, remote WAL archive options, logging options, partial restore options, and common options can be used.

For details on usage, see the section Restoring a Cluster.

checkdb

pg_probackup checkdb
[-B backup_dir] [--instance instance_name] [-D data_dir]
[--help] [-j num_threads] [--progress]
[--skip-block-validation] [--amcheck] [--heapallindexed]
[connection_options] [logging_options]

Verifies the Postgres Pro database cluster correctness by detecting physical and logical corruption.

--amcheck

Performs logical verification of indexes for the specified Postgres Pro instance if no corruption was found while checking data files. You must have the amcheck extension or the amcheck_next extension installed in the database to check its indexes. For databases without amcheck, index verification will be skipped.

--skip-block-validation

Skips validation of data files. You can use this flag only together with the --amcheck flag, so that only logical verification of indexes is performed.

--heapallindexed

Checks that all heap tuples that should be indexed are actually indexed. You can use this flag only together with the --amcheck flag.

This check is only possible if you are using the amcheck extension of version 2.0 or higher, or the amcheck_next extension of any version.

Additionally, connection options and logging options can be used.

For details on usage, see the section Verifying a Cluster.

validate

pg_probackup validate -B backup_dir
[--help] [--instance instance_name] [-i backup_id]
[-j num_threads] [--progress]
[--skip-block-validation]
[recovery_target_options] [logging_options]

Verifies that all the files required to restore the cluster are present and are not corrupt. If instance_name is not specified, pg_probackup validates all backups available in the backup catalog. If you specify the instance_name without any additional options, pg_probackup validates all the backups available for this backup instance. If you specify the instance_name with a recovery target option and/or a backup_id, pg_probackup checks whether it is possible to restore the cluster using these options.

For details, see the section Validating a Backup.

merge

pg_probackup merge -B backup_dir --instance instance_name -i backup_id
[--help] [-j num_threads] [--progress]
[logging_options]

Merges the specified incremental backup to its parent full backup, together with all incremental backups between them, if any. As a result, the full backup takes in all the merged data, and the incremental backups are removed as redundant.

For details, see the section Merging Backups.

delete

pg_probackup delete -B backup_dir --instance instance_name
[--help] [-j num_threads] [--progress]
[--retention-redundancy=redundancy][--retention-window=window][--wal-depth=wal_depth]
[--delete-wal] {-i backup_id | --delete-expired [--merge-expired] | --merge-expired}
[--dry-run]
[logging_options]

Deletes the backup with the specified backup_id or launches the retention purge of backups and archived WAL that do not satisfy the current retention policies.

For details, see the sections Deleting Backups, Retention Options and Configuring Retention Policy.

archive-push

pg_probackup archive-push -B backup_dir --instance instance_name
--wal-file-path=wal_file_path --wal-file-name=wal_file_name
[--help] [--compress] [--compress-algorithm=compression_algorithm]
[--compress-level=compression_level] [--overwrite]
[remote_options] [logging_options]

Copies WAL files into the corresponding subdirectory of the backup catalog and validates the backup instance by instance_name and system-identifier. If parameters of the backup instance and the cluster do not match, this command fails with the following error message: Refuse to push WAL segment segment_name into archive. Instance parameters mismatch. For each WAL file moved to the backup catalog, you will see the following message in the Postgres Pro log file: pg_probackup archive-push completed successfully.

If the files to be copied already exist in the backup catalog, pg_probackup computes and compares their checksums. If the checksums match, archive-push skips the corresponding file and returns a successful execution code. Otherwise, archive-push fails with an error. If you would like to replace WAL files in the case of checksum mismatch, run the archive-push command with the --overwrite flag.

Each file is first copied to a temporary file with the .part suffix. After the copy is done, an atomic rename is performed. This algorithm ensures that a failed archive-push will not stall continuous archiving and that concurrent archiving from multiple sources into a single WAL archive has no risk of archive corruption. WAL segments copied to the archive are synced to disk.
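The behavior described in the last two paragraphs (checksum comparison for existing files, copy to a .part file, sync, atomic rename) can be sketched as follows. This is not the pg_probackup implementation: SHA-256 is only a stand-in because the checksum algorithm is not specified here, and the names file_checksum and push_segment are hypothetical.

```python
import hashlib
import os
import shutil


def file_checksum(path):
    """Checksum of a file; SHA-256 is a stand-in for the real algorithm."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def push_segment(src, archive_dir, overwrite=False):
    """Copy src into archive_dir via a .part file and an atomic rename."""
    dst = os.path.join(archive_dir, os.path.basename(src))

    if os.path.exists(dst):
        if file_checksum(src) == file_checksum(dst):
            return "skipped"  # identical file is already archived
        if not overwrite:
            raise RuntimeError(f"{dst} exists with a different checksum")

    part = dst + ".part"
    with open(src, "rb") as fin, open(part, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data reaches the disk

    # The segment appears under its final name only once it is complete.
    os.replace(part, dst)
    return "copied"
```

Because the final name appears only after the rename, a reader of the archive never sees a partially written segment, and an interrupted copy leaves behind only a .part file.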

You can use archive-push in the archive_command Postgres Pro parameter to set up continuous WAL archiving.

For details, see sections Archiving Options and Compression Options.

archive-get

pg_probackup archive-get -B backup_dir --instance instance_name --wal-file-path=wal_file_path --wal-file-name=wal_file_name
[--help] [remote_options] [logging_options]

Copies WAL files from the corresponding subdirectory of the backup catalog to the cluster's write-ahead log location. This command is automatically set by pg_probackup as part of the restore_command in recovery.conf when restoring backups using a WAL archive. You do not need to set it manually.

Options

This section describes command-line options for pg_probackup commands. If the option value can be derived from an environment variable, this variable is specified below the command-line option, in uppercase. Some values can be taken from the pg_probackup.conf configuration file located in the backup catalog.

For details, see the section Configuring pg_probackup.

If an option is specified using more than one method, command-line input has the highest priority, while the pg_probackup.conf settings have the lowest priority.
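The priority order can be pictured with a short sketch. It assumes that environment variables rank between the command line and pg_probackup.conf, which the sentence above implies but does not state outright; resolve_option is a hypothetical helper used only for illustration.

```python
def resolve_option(cli_value=None, env_value=None, conf_value=None):
    """Pick the first value that is set, in decreasing order of priority."""
    # Assumed order: command line, then environment, then pg_probackup.conf.
    for value in (cli_value, env_value, conf_value):
        if value is not None:
            return value
    return None


# The data directory given on the command line wins over PGDATA and the
# configuration file (all paths here are made up for the example).
print(resolve_option(cli_value="/cli/data", env_value="/env/data",
                     conf_value="/conf/data"))
```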

Common Options

The list of general options.

-B directory
--backup-path=directory
BACKUP_PATH

Specifies the absolute path to the backup catalog. The backup catalog is a directory where all backup files and meta information are stored. Since this option is required for most of the pg_probackup commands, it is recommended to specify it once in the BACKUP_PATH environment variable. In this case, you do not need to use this option each time on the command line.

-D directory
--pgdata=directory
PGDATA

Specifies the absolute path to the data directory of the database cluster. This option is mandatory only for the add-instance command. Other commands can take its value from the PGDATA environment variable, or from the pg_probackup.conf configuration file.

-i backup_id
--backup-id=backup_id

Specifies the unique identifier of the backup.

-j num_threads
--threads=num_threads

Sets the number of parallel threads for backup, restore, merge, validate, and checkdb processes.

--progress

Shows the progress of operations.

--help

Shows detailed information about the options that can be used with this command.

Recovery Target Options

If continuous WAL archiving is configured, you can use one of these options together with restore or validate commands to specify the moment up to which the database cluster must be restored or validated.

--recovery-target=immediate|latest

Defines when to stop the recovery:

  • The immediate value stops the recovery after reaching the consistent state of the specified backup, or the latest available backup if the -i/--backup-id option is omitted. This is the default behavior for STREAM backups.

  • The latest value continues the recovery until all WAL segments available in the archive are applied. This is the default behavior for ARCHIVE backups.

--recovery-target-timeline=timeline

Specifies a particular timeline to be used for recovery. By default, the timeline of the specified backup is used.

--recovery-target-lsn=lsn

Specifies the LSN of the write-ahead log location up to which recovery will proceed. Can be used only when restoring a database cluster of major version 10 or higher.

--recovery-target-name=recovery_target_name

Specifies a named restore point up to which to restore the cluster.

--recovery-target-time=time

Specifies the timestamp up to which recovery will proceed.

--recovery-target-xid=xid

Specifies the transaction ID up to which recovery will proceed.

--recovery-target-inclusive=boolean

Specifies whether to stop just after the specified recovery target (true), or just before the recovery target (false). This option can only be used together with --recovery-target-name, --recovery-target-time, --recovery-target-lsn or --recovery-target-xid options. The default depends on the recovery_target_inclusive parameter.

--recovery-target-action=pause|promote|shutdown

Specifies the action the server should take when the recovery target is reached (the recovery_target_action parameter).

Default: pause

Retention Options

You can use these options together with backup and delete commands.

For details on configuring retention policy, see the section Configuring Retention Policy.

--retention-redundancy=redundancy

Specifies the number of full backup copies to keep in the backup catalog. Must be a non-negative integer. The zero value disables this setting.

Default: 0

--retention-window=window

Number of days of recoverability. Must be a non-negative integer. The zero value disables this setting.

Default: 0

--wal-depth=wal_depth

Number of latest valid backups on every timeline that must retain the ability to perform PITR. Must be a non-negative integer. The zero value disables this setting.

Default: 0

--delete-wal

Deletes WAL files that are no longer required to restore the cluster from any of the existing backups.

--delete-expired

Deletes backups that do not conform to the retention policy defined in the pg_probackup.conf configuration file.

--merge-expired

Merges the oldest incremental backup that satisfies the requirements of retention policy with its parent backups that have already expired.

--dry-run

Displays the current status of all the available backups, without deleting or merging expired backups, if any.

Pinning Options

You can use these options together with backup and set-backup commands.

For details on backup pinning, see the section Backup Pinning.

--ttl=ttl

Specifies the amount of time the backup should be pinned. Must be a non-negative integer. The zero value unpins the already pinned backup. Supported units: ms, s, min, h, d (s by default).

Example: --ttl=30d

--expire-time=time

Specifies the timestamp up to which the backup will stay pinned. Must be an ISO-8601 compliant timestamp.

Example: --expire-time='2020-01-01 00:00:00+03'
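Because seconds are the default --ttl unit, a pin duration can be given either with a unit suffix or as a plain number of seconds. The sketch below converts 30 days to seconds and assembles the matching set-backup command. The backup ID, catalog path, and instance name are placeholders, and the command is only printed.

```shell
# Placeholder values (assumptions); replace them with your own.
backup_dir=/mnt/backups
instance=pg-11
backup_id=PZ7YK2

# 30 days expressed in seconds, the default --ttl unit.
ttl_days=30
ttl_seconds=$((ttl_days * 24 * 60 * 60))

# Same pin duration as --ttl=30d.
set -- pg_probackup set-backup -B "$backup_dir" --instance "$instance" \
    -i "$backup_id" --ttl="$ttl_seconds"

# Print the assembled command; execute it with "$@" once it looks right.
pin_cmd="$*"
printf '%s\n' "$pin_cmd"
```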

Logging Options

You can use these options with any command.

--log-level-console=log_level

Controls which message levels are sent to the console log. Valid values are verbose, log, info, warning, error, and off. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The off level disables console logging.

Default: info

Note

All console log messages are sent to stderr, so the output of the show and show-config commands does not mix with log messages.

--log-level-file=log_level

Controls which message levels are sent to a log file. Valid values are verbose, log, info, warning, error, and off. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The off level disables file logging.

Default: off

--log-filename=log_filename

Defines the filenames of the created log files. The filenames are treated as a strftime pattern, so you can use %-escapes to specify time-varying filenames.

Default: pg_probackup.log

For example, if you specify the pg_probackup-%u.log pattern, pg_probackup generates a separate log file for each day of the week, with %u replaced by the corresponding decimal number: pg_probackup-1.log for Monday, pg_probackup-2.log for Tuesday, and so on.

This option takes effect if file logging is enabled by the --log-level-file option.
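To check which file name a pattern resolves to on a given day, the date utility can expand the same %-escapes. This is only a rough preview, assuming pg_probackup expands the escape the same way as date does. The pattern is the one from the example above.

```shell
# %u is the day of the week as a number: 1 for Monday through 7 for Sunday.
pattern='pg_probackup-%u.log'

# Expand the pattern for the current date.
todays_log=$(date +"$pattern")
printf '%s\n' "$todays_log"
```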

--error-log-filename=error_log_filename

Defines the filenames of log files for error messages only. The filenames are treated as a strftime pattern, so you can use %-escapes to specify time-varying filenames.

Default: none

For example, if you specify the error-pg_probackup-%u.log pattern, pg_probackup generates a separate log file for each day of the week, with %u replaced by the corresponding decimal number: error-pg_probackup-1.log for Monday, error-pg_probackup-2.log for Tuesday, and so on.

This option is useful for troubleshooting and monitoring.

--log-directory=log_directory

Defines the directory in which log files will be created. You must specify the absolute path. This directory is created lazily, when the first log message is written.

Default: $BACKUP_PATH/log/

--log-rotation-size=log_rotation_size

Maximum size of an individual log file. If this value is reached, the log file is rotated once a pg_probackup command is launched, except help and version commands. The zero value disables size-based rotation. Supported units: kB, MB, GB, TB (kB by default).

Default: 0

--log-rotation-age=log_rotation_age

Maximum lifetime of an individual log file. If this value is reached, the log file is rotated once a pg_probackup command is launched, except help and version commands. The time of the last log file creation is stored in $BACKUP_PATH/log/log_rotation. The zero value disables time-based rotation. Supported units: ms, s, min, h, d (min by default).

Default: 0

Connection Options

You can use these options together with backup and checkdb commands.

All libpq environment variables are supported.

-d dbname
--pgdatabase=dbname
PGDATABASE

Specifies the name of the database to connect to. The connection is used only for managing the backup process, so you can connect to any existing database. If this option is not provided on the command line, in the PGDATABASE environment variable, or in the pg_probackup.conf configuration file, pg_probackup tries to take this value from the PGUSER environment variable, or from the current user name if the PGUSER variable is not set.

-h host
--pghost=host
PGHOST

Specifies the host name of the system on which the server is running. If the value begins with a slash, it is used as a directory for the Unix domain socket.

Default: localhost

-p port
--pgport=port
PGPORT

Specifies the TCP port or the local Unix domain socket file extension on which the server is listening for connections.

Default: 5432

-U username
--pguser=username
PGUSER

User name to connect as.

-w
--no-password

Disables a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file or PGPASSWORD environment variable, the connection attempt will fail. This flag can be useful in batch jobs and scripts where no user is present to enter a password.

-W
--password

Forces a password prompt.
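Since the libpq environment variables are supported, connection settings can be exported once instead of being repeated on every command line. The sketch below uses placeholder host, port, database, and role names. It prints a backup command that relies on the exported variables rather than running it.

```shell
# Placeholder connection settings (assumptions); replace them with your own.
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=backupdb
export PGUSER=probackup

# No -h, -p, -d, or -U here: the values come from the environment.
set -- pg_probackup backup -B /mnt/backups --instance pg-11 -b FULL --stream

# Print the assembled command; execute it with "$@" once it looks right.
backup_cmd="$*"
printf '%s\n' "$backup_cmd"
printf 'would connect as %s to %s on %s:%s\n' \
    "$PGUSER" "$PGDATABASE" "$PGHOST" "$PGPORT"
```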

Compression Options

You can use these options together with backup and archive-push commands.

--compress-algorithm=compression_algorithm

Defines the algorithm to use for compressing data files. Possible values are zlib, pglz, and none. If set to zlib or pglz, this option enables compression. By default, compression is disabled. For the archive-push command, the pglz compression algorithm is not supported.

Default: none

--compress-level=compression_level

Defines compression level (0 through 9, 0 being no compression and 9 being best compression). This option can be used together with the --compress-algorithm option.

Default: 1

--compress

Alias for --compress-algorithm=zlib and --compress-level=1.
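The following sketch shows one way the compression options might be combined for a backup. The level of 5 is an arbitrary choice for illustration, as are the catalog path and instance name. The command is printed rather than executed.

```shell
# Placeholder values (assumptions); replace them with your own.
backup_dir=/mnt/backups
instance=pg-11

# zlib at level 5; --compress alone would mean zlib at level 1.
set -- pg_probackup backup -B "$backup_dir" --instance "$instance" \
    -b DELTA --stream \
    --compress-algorithm=zlib --compress-level=5

# Print the assembled command; execute it with "$@" once it looks right.
compress_cmd="$*"
printf '%s\n' "$compress_cmd"
```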

Archiving Options

These options can be used with the archive-push command in the archive_command setting and the archive-get command in the restore_command setting.

Additionally, remote mode options and logging options can be used.

--wal-file-path=wal_file_path

Provides the path to the WAL file in archive_command and restore_command. Use the %p variable as the value for this option for correct processing.

--wal-file-name=wal_file_name

Provides the name of the WAL file in archive_command and restore_command. Use the %f variable as the value for this option for correct processing.

--overwrite

Overwrites archived WAL file. Use this flag together with the archive-push command if the specified subdirectory of the backup catalog already contains this WAL file and it needs to be replaced with its newer copy. Otherwise, archive-push reports that a WAL segment already exists, and aborts the operation. If the file to replace has not changed, archive-push skips this file regardless of the --overwrite flag.
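As a sketch of how %p and %f fit into the server settings, the script below prints candidate archive_command and restore_command lines. The catalog path and instance name are placeholders, and the bare pg_probackup name assumes the binary is on the server's PATH. Where restore_command belongs depends on the server version.

```shell
# Placeholder values (assumptions); replace them with your own.
backup_dir=/mnt/backups
instance=pg-11

# %p and %f are expanded by the server, not by the shell, so they must
# reach the configuration file literally.
archive_command="pg_probackup archive-push -B $backup_dir --instance $instance --wal-file-path=%p --wal-file-name=%f"
restore_command="pg_probackup archive-get -B $backup_dir --instance $instance --wal-file-path=%p --wal-file-name=%f"

# Print the settings in configuration-file form.
printf "archive_command = '%s'\n" "$archive_command"
printf "restore_command = '%s'\n" "$restore_command"
```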

Remote Mode Options

This section describes the options related to running pg_probackup operations remotely via SSH. These options can be used with add-instance, set-config, backup, restore, archive-push, and archive-get commands.

For details on configuring and using the remote mode, see the section called “Configuring the Remote Mode” and the section called “Using pg_probackup in the Remote Mode”.

--remote-proto=proto

Specifies the protocol to use for remote operations. Currently only the SSH protocol is supported. Possible values are:

  • ssh enables the remote mode via SSH. This is the default value.

  • none explicitly disables the remote mode.

You can omit this option if the --remote-host option is specified.

--remote-host=destination

Specifies the remote host IP address or hostname to connect to.

--remote-port=port

Specifies the remote host port to connect to.

Default: 22

--remote-user=username

Specifies remote host user for SSH connection. If you omit this option, the current user initiating the SSH connection is used.

--remote-path=path

Specifies pg_probackup installation directory on the remote system.

--ssh-options=ssh_options

Provides a string of SSH command-line options. For example, the following options can be used to set keep-alive for SSH connections opened by pg_probackup: --ssh-options='-o ServerAliveCountMax=5 -o ServerAliveInterval=60'. For the full list of possible options, see ssh_config manual page.

Remote WAL Archive Options

This section describes the options that supply the arguments for remote mode options of the archive-get command, which is used in restore_command when restoring ARCHIVE backups or performing PITR.

--archive-host=destination

Provides the argument for the --remote-host option in the archive-get command.

--archive-port=port

Provides the argument for the --remote-port option in the archive-get command.

Default: 22

--archive-user=username

Provides the argument for the --remote-user option in the archive-get command. If you omit this option, the user that has started the Postgres Pro cluster is used.

Default: Postgres Pro user

Partial Restore Options

This section describes the options for partial cluster restore. These options can be used with the restore command.

--db-exclude=dbname

Specifies the name of the database to exclude from restore. All other databases in the cluster will be restored as usual, including template0 and template1. This option can be specified multiple times for multiple databases.

--db-include=dbname

Specifies the name of the database to restore from a backup. All other databases in the cluster will not be restored, with the exception of template0 and template1. This option can be specified multiple times for multiple databases.
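A partial restore that keeps only selected databases could be assembled as sketched below. The database names sales and inventory are hypothetical, as are the catalog path and instance name. The command is printed rather than executed.

```shell
# Placeholder values (assumptions); replace them with your own.
backup_dir=/mnt/backups
instance=pg-11

# Repeat --db-include once per database to restore.
set -- pg_probackup restore -B "$backup_dir" --instance "$instance" \
    --db-include=sales --db-include=inventory

# Print the assembled command; execute it with "$@" once it looks right.
partial_cmd="$*"
printf '%s\n' "$partial_cmd"
```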

Replica Options

This section describes the options related to taking a backup from standby.

Note

Starting from pg_probackup 2.0.24, backups can be taken from standby without connecting to the master server, so these options are no longer required. In earlier versions, pg_probackup had to connect to the master to determine recovery time, that is, the earliest moment for which you can restore a consistent state of the database cluster.

--master-db=dbname

Deprecated. Specifies the name of the database on the master server to connect to. The connection is used only for managing the backup process, so you can connect to any existing database. Can be set in the pg_probackup.conf using the set-config command.

Default: postgres, the default Postgres Pro database

--master-host=host

Deprecated. Specifies the host name of the system on which the master server is running.

--master-port=port

Deprecated. Specifies the TCP port or the local Unix domain socket file extension on which the master server is listening for connections.

Default: 5432, the Postgres Pro default port

--master-user=username

Deprecated. User name to connect as.

Default: postgres, the Postgres Pro default user name

--replica-timeout=timeout

Deprecated. Wait time for WAL segment streaming via replication, in seconds. By default, pg_probackup waits 300 seconds. You can also define this parameter in the pg_probackup.conf configuration file using the set-config command.

Default: 300 sec

How-To

All examples below assume the remote mode of operations via SSH. If you are planning to run backup and restore operations locally, skip the Set up passwordless SSH connection step and omit all --remote-* options.

Examples are based on Ubuntu 18.04, Postgres Pro 11, and pg_probackup 2.2.0.

  • backup_host — host with backup catalog.

  • backupman — user on backup_host running all pg_probackup operations.

  • /mnt/backups — directory on backup_host where backup catalog is stored.

  • postgres_host — host with Postgres Pro cluster.

  • postgres — user on postgres_host that has started the Postgres Pro cluster.

  • /var/lib/postgresql/11/main — Postgres Pro data directory on postgres_host.

  • backupdb — database used for connection to Postgres Pro cluster.

Minimal Setup

This scenario illustrates setting up standalone FULL and DELTA backups.

  1. Set up passwordless SSH connection from backup_host to postgres_host:

    [backupman@backup_host]$ ssh-copy-id postgres@postgres_host
    
  2. Configure your Postgres Pro cluster.

    For security purposes, it is recommended to use a separate database for backup operations.

    postgres=#
    CREATE DATABASE backupdb;
    

    Connect to the backupdb database, create the probackup role, and grant the following permissions to this role:

    backupdb=#
    BEGIN;
    CREATE ROLE probackup WITH LOGIN REPLICATION;
    GRANT USAGE ON SCHEMA pg_catalog TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.current_setting(text) TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_is_in_recovery() TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_start_backup(text, boolean, boolean) TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_stop_backup(boolean, boolean) TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_create_restore_point(text) TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_last_wal_replay_lsn() TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.txid_current() TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.txid_current_snapshot() TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.txid_snapshot_xmax(txid_snapshot) TO probackup;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_control_checkpoint() TO probackup;
    COMMIT;
    
  3. Initialize the backup catalog:

    [backupman@backup_host]$ pg_probackup-11 init -B /mnt/backups
    INFO: Backup catalog '/mnt/backups' successfully inited
    
  4. Add instance pg-11 to the backup catalog:

    [backupman@backup_host]$ pg_probackup-11 add-instance -B /mnt/backups --instance 'pg-11' --remote-host=postgres_host --remote-user=postgres -D /var/lib/postgresql/11/main
    INFO: Instance 'pg-11' successfully inited
    
  5. Take a FULL backup:

    [backupman@backup_host]$ pg_probackup-11 backup -B /mnt/backups --instance 'pg-11' -b FULL --stream --remote-host=postgres_host --remote-user=postgres -U probackup -d backupdb
    INFO: Backup start, pg_probackup version: 2.2.0, instance: pg-11, backup ID: PZ7YK2, backup mode: FULL, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1
    INFO: Start transferring data files
    INFO: Data files are transferred
    INFO: wait for pg_stop_backup()
    INFO: pg_stop backup() successfully executed
    INFO: Validating backup PZ7YK2
    INFO: Backup PZ7YK2 data files are valid
    INFO: Backup PZ7YK2 resident size: 196MB
    INFO: Backup PZ7YK2 completed
    
  6. Let's take a look at the backup catalog:

    [backupman@backup_host]$ pg_probackup-11 show -B /mnt/backups --instance 'pg-11'
    
    BACKUP INSTANCE 'pg-11'
    ====================================================================================================================================
     Instance  Version  ID      Recovery Time           Mode   WAL Mode  TLI  Time   Data   WAL  Zratio  Start LSN   Stop LSN    Status
    ====================================================================================================================================
     pg-11     11       PZ7YK2  2019-10-11 19:45:45+03  FULL   STREAM    1/0   11s  180MB  16MB    1.00  0/3C000028  0/3C000198  OK
    
  7. Take an incremental backup in the DELTA mode:

    [backupman@backup_host]$ pg_probackup-11 backup -B /mnt/backups --instance 'pg-11' -b delta --stream --remote-host=postgres_host --remote-user=postgres -U probackup -d backupdb
    INFO: Backup start, pg_probackup version: 2.2.0, instance: pg-11, backup ID: PZ7YMP, backup mode: DELTA, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1
    INFO: Parent backup: PZ7YK2
    INFO: Start transferring data files
    INFO: Data files are transferred
    INFO: wait for pg_stop_backup()
    INFO: pg_stop backup() successfully executed
    INFO: Validating backup PZ7YMP
    INFO: Backup PZ7YMP data files are valid
    INFO: Backup PZ7YMP resident size: 32MB
    INFO: Backup PZ7YMP completed
    
  8. Let's add some parameters to the pg_probackup configuration file, so that you can omit them from the command line:

    [backupman@backup_host]$ pg_probackup-11 set-config -B /mnt/backups --instance 'pg-11' --remote-host=postgres_host --remote-user=postgres -U probackup -d backupdb
    
  9. Take another incremental backup in the DELTA mode, omitting some of the previous parameters:

    [backupman@backup_host]$ pg_probackup-11 backup -B /mnt/backups --instance 'pg-11' -b delta --stream
    INFO: Backup start, pg_probackup version: 2.2.0, instance: pg-11, backup ID: PZ7YR5, backup mode: DELTA, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1
    INFO: Parent backup: PZ7YMP
    INFO: Start transferring data files
    INFO: Data files are transferred
    INFO: wait for pg_stop_backup()
    INFO: pg_stop backup() successfully executed
    INFO: Validating backup PZ7YR5
    INFO: Backup PZ7YR5 data files are valid
    INFO: Backup PZ7YR5 resident size: 32MB
    INFO: Backup PZ7YR5 completed
    
  10. Let's take a look at the instance configuration:

    [backupman@backup_host]$ pg_probackup-11 show-config -B /mnt/backups --instance 'pg-11'
    
    # Backup instance information
    pgdata = /var/lib/postgresql/11/main
    system-identifier = 6746586934060931492
    xlog-seg-size = 16777216
    # Connection parameters
    pgdatabase = backupdb
    pghost = postgres_host
    pguser = probackup
    # Replica parameters
    replica-timeout = 5min
    # Archive parameters
    archive-timeout = 5min
    # Logging parameters
    log-level-console = INFO
    log-level-file = OFF
    log-filename = pg_probackup.log
    log-rotation-size = 0
    log-rotation-age = 0
    # Retention parameters
    retention-redundancy = 0
    retention-window = 0
    wal-depth = 0
    # Compression parameters
    compress-algorithm = none
    compress-level = 1
    # Remote access parameters
    remote-proto = ssh
    remote-host = postgres_host
    

    Note that we are getting the default values for other options that were not overwritten by the set-config command.

  11. Let's take a look at the backup catalog:

    [backupman@backup_host]$ pg_probackup-11 show -B /mnt/backups --instance 'pg-11'
    
    BACKUP INSTANCE 'pg-11'
    ====================================================================================================================================
     Instance  Version  ID      Recovery Time           Mode   WAL Mode  TLI  Time   Data   WAL  Zratio  Start LSN   Stop LSN    Status
    ====================================================================================================================================
     pg-11     11       PZ7YR5  2019-10-11 19:49:56+03  DELTA  STREAM    1/1   10s  112kB  32MB    1.00  0/41000028  0/41000160  OK
     pg-11     11       PZ7YMP  2019-10-11 19:47:16+03  DELTA  STREAM    1/1   10s  376kB  32MB    1.00  0/3E000028  0/3F0000B8  OK
     pg-11     11       PZ7YK2  2019-10-11 19:45:45+03  FULL   STREAM    1/0   11s  180MB  16MB    1.00  0/3C000028  0/3C000198  OK
    

Versioning

pg_probackup follows semantic versioning.

Authors

Postgres Professional, Moscow, Russia.

Credits

pg_probackup utility is based on pg_arman, which was originally written by NTT and then developed and maintained by Michael Paquier.