Checking Data Integrity

Page Validation

If data checksums are enabled in the database cluster, pg_probackup3 uses them to verify the correctness of data files during backup. While reading each page, pg_probackup3 checks whether the calculated checksum matches the checksum stored in the page header. This guarantees that neither the Postgres Pro instance nor the backup itself contains corrupt pages.

Note that pg_probackup3 reads database files directly from the filesystem, so under heavy write load during backup it can report false-positive checksum mismatches caused by partial writes. For this reason, if a page checksum mismatch occurs, the page is re-read and the checksum comparison is repeated.

A page is considered corrupt if checksum comparison has failed more than 300 times. In this case, the backup is aborted.
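
The re-read rule above can be pictured as a simple retry loop. The following is only a conceptual sketch, not pg_probackup3 source code: the function name page_is_valid and the variable PAGE_MAX_FAILURES are hypothetical, and the command passed to the function stands in for "re-read the page and compare checksums".

```shell
# Conceptual sketch only; pg_probackup3 does not expose such a function.
PAGE_MAX_FAILURES=300

# Usage: page_is_valid CHECK_COMMAND [ARGS...]
# CHECK_COMMAND is any command that exits 0 when the checksum matches.
page_is_valid() {
    failures=0
    while ! "$@"; do
        failures=$((failures + 1))
        # More than 300 failed comparisons: treat the page as corrupt.
        if [ "$failures" -gt "$PAGE_MAX_FAILURES" ]; then
            return 1
        fi
    done
    return 0
}
```

A check that fails a few times because of a partial write and then succeeds still passes. Only a page that keeps failing past the limit is reported as corrupt, which is the point at which the backup would be aborted.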

Even if data checksums are not enabled, pg_probackup3 always performs sanity checks for page headers.
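
To find out which of the two cases applies to your cluster, you can query the read-only data_checksums server parameter. The helper below is a minimal sketch: the function name data_checksums_enabled is hypothetical, and it assumes that psql can connect using the standard libpq environment variables (PGHOST, PGPORT, PGUSER, and so on).

```shell
# Succeeds (exit status 0) when the server reports data_checksums = on.
# -X skips psqlrc, -A and -t print the bare value, -c runs one command.
data_checksums_enabled() {
    [ "$(psql -X -A -t -c 'SHOW data_checksums')" = "on" ]
}

# Usage:
#   if data_checksums_enabled; then
#       echo "page checksums will be verified during backup"
#   else
#       echo "only page header sanity checks will be performed"
#   fi
```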

Validating a Backup

pg_probackup3 calculates checksums for each file in a backup during the backup process. Checking these checksums against the backup data files is called backup validation. By default, validation runs immediately after a backup is taken and right before a restore, to detect possible backup corruption.

Note

The backup validation includes checking checksums for CFS files.

To skip backup validation, specify the --no-validate flag when running the backup and restore commands.
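
One possible use of this flag is to take a backup quickly and validate it later, when the system is less busy. The sketch below wraps both steps in hypothetical helper functions (backup_without_validation and validate_later). The -b FULL option for selecting the backup mode is an assumption here, so check the backup command reference for the options available in your version.

```shell
# Usage: backup_without_validation BACKUP_DIR INSTANCE_NAME
# Takes a backup and skips the validation that normally follows it.
# Assumption: -b FULL selects the backup mode.
backup_without_validation() {
    pg_probackup3 backup -B "$1" --instance="$2" -b FULL --no-validate
}

# Usage: validate_later BACKUP_DIR INSTANCE_NAME
# Runs validation on demand, without a backup ID or recovery target.
validate_later() {
    pg_probackup3 validate -B "$1" --instance="$2"
}
```

For example, backup_without_validation backup_dir instance_name could run during peak hours, with validate_later backup_dir instance_name scheduled for a quieter period.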

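You can also run validation on demand using the validate command, optionally with recovery target options. For example, to check that you can restore the database cluster from a backup copy up to transaction ID 4242, run this command: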

pg_probackup3 validate -B backup_dir --instance=instance_name --recovery-target-xid=4242

If validation completes successfully, pg_probackup3 displays the corresponding message. If validation fails, you will receive an error message with the exact time, transaction ID, and LSN up to which the recovery is possible.

If you specify backup_id via the -i/--backup-id option, only the backup copy with the specified backup ID is validated. If backup_id is specified together with recovery target options, the validate command checks whether the specified backup can be restored to the specified recovery target.

For example, to check that you can restore the database cluster from a backup copy with the SBOL6P backup ID up to the specified timestamp, run this command:

pg_probackup3 validate -B backup_dir --instance=instance_name -i SBOL6P --recovery-target-time="2024-04-10 18:18:26+03"

If you specify the backup_id of an incremental backup, all its parent backups, starting from the FULL backup, will be validated.

If you omit all the parameters, all backups are validated.
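
When validation is run from a script or a scheduled job, it is convenient to react to its result automatically. The following sketch assumes that pg_probackup3 exits with a non-zero status when validation fails. The function name validate_and_report is hypothetical.

```shell
# Usage: validate_and_report VALIDATE_OPTIONS...
# Passes all arguments to "pg_probackup3 validate" and reports the outcome.
# Assumption: a failed validation produces a non-zero exit status.
validate_and_report() {
    if pg_probackup3 validate "$@"; then
        echo "validation OK"
    else
        status=$?
        echo "validation FAILED (exit status $status)" >&2
        return "$status"
    fi
}

# Usage:
#   validate_and_report -B backup_dir --instance=instance_name
#   validate_and_report -B backup_dir --instance=instance_name -i SBOL6P
```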