G.5. pgpro_validate — check data integrity of a Postgres Pro instance

G.5.1. Overview

The pgpro_validate utility provides tools for checking the integrity of the Postgres Pro instance files and objects, performing the following types of operations:

  • database block checksum verification

  • base directory verification

  • file system integrity checks (global, pg_xact, pg_wal, etc.)

  • index integrity checks

  • referential integrity checks of system catalogs (pg_depend, pg_shdepend, etc.)

  • table verification

  • constraint verification (NOT NULL, FOREIGN KEY, UNIQUE)

  • file access permission verification

  • tablespace file verification

pgpro_validate operates in read-only mode and does not modify the database cluster. The utility can run locally on the host with the PGDATA directory or in remote mode via SSH, where the local instance launches pgpro_validate on a remote host and streams its output.

G.5.2. Installation

pgpro_validate is provided with Postgres Pro Standard as a separate pre-built package pgpro-validate-std-15 (for the detailed installation instructions, refer to Chapter 16).

G.5.3. Considerations and Limitations

Before you start working with pgpro_validate, note the operation specifics described below.

G.5.3.1. General

  • For the following commands that require a database connection, the Postgres Pro server must be running: amcheck, constraints, database_files, index, integrity, molotilka, and tablespace. These commands also require the database user to have superuser privileges or appropriate roles to read system catalogs and user tables.

    Commands that only access the file system — checksum, permissions, postgres_files — can run while the server is stopped, provided that files are not modified during the check.

  • The user running pgpro_validate must have read permissions for all files in PGDATA.

  • For the amcheck command, the amcheck or amcheck_next extension must be installed in each target database. Databases without the extension are skipped with a warning.

  • The molotilka command requires the pageinspect extension. On the first startup, pgpro_validate attempts to install it automatically (CREATE EXTENSION IF NOT EXISTS pageinspect), so the database user must have appropriate privileges or the extension must be preinstalled.

G.5.3.2. CFS (Compressed File System)

  • pgpro_validate supports database directory-level checks for CFS files using the following commands: amcheck, constraints, database_files, index, integrity, molotilka, and tablespace.

  • The database_files command takes into account the *.cfm files and does not report them as orphaned.

  • The permissions and postgres_files commands only check the file system metadata.

  • The checksum dir and checksum file commands do not work with CFS.

G.5.3.3. Remote Mode

  • Remote SSH connections do not support interactive password entry. Key-based authentication must be set up in advance.

  • The database password is passed to the remote host via the standard input of an SSH session and exported on the remote side as the PGPASSWORD environment variable. The password does not appear in command-line arguments or ps output.

  • Major versions of the local and remote pgpro_validate instances must match. Minor version mismatches produce a warning but the check continues.

  • In plain output format, remote process output is streamed. In JSON format, output is buffered and validated as JSON, and a synthetic {"error": ..., "exit_code": ...} object is returned on error.

G.5.4. Commands

G.5.4.1. No Command

pgpro_validate [common_options] [remote_options]

Runs all applicable checks sequentially.

Checks requiring a database connection run only when a connection string is provided, and those requiring PGDATA access run only when the --pgdata option is set. Otherwise, the checks are skipped with an informational message.

Example:

pgpro_validate \
    --pgdata=/var/lib/postgresql/15/data \
    --username=postgres \
    --dbname=postgres

G.5.4.2. amcheck

pgpro_validate [common_options] [remote_options] amcheck [--all] [--check-db=database_name]
[--checkunique] [--exclude-schema=schema_name] [--heapallindexed]

Performs logical B-tree index verification using the amcheck extension.

This command requires only a database connection. The --pgdata option is not used.

--check-db=database_name

A comma-separated list of databases in which to check indexes. If this option is omitted, all databases are checked.

Unlike index and constraints, amcheck uses the --check-db option rather than --db.

--exclude-schema=schema_name

A comma-separated list of schemas to exclude from the check.

--heapallindexed

Enable additional check to ensure that all row versions visible according to the current snapshot are present in the index.

--checkunique

Additionally check unique constraint enforcement.

-a
--all

Enable all available checks (--heapallindexed and --checkunique).

Example:

pgpro_validate amcheck --all \
    --dbname=postgres \
    --username=postgres

G.5.4.3. checksum dir

pgpro_validate [common_options] [remote_options] checksum [--percent=percent] dir --pgdata=datadir [dirpath]

Verifies block checksums of all relation files in the specified directory.

This command does not require a database connection but requires the --pgdata option.

dirpath is an optional path to the directory to check. If this argument is omitted, the PGDATA/base directory is checked.

The --percent option specifies the percentage of randomly selected blocks to check within each file. Possible values — from 0 to 100. The default value is 100 (all blocks).

Example:

pgpro_validate checksum --percent=20 dir \
    --pgdata=/var/lib/postgresql/15/data

G.5.4.4. checksum file

pgpro_validate [common_options] [remote_options] checksum [--percent=percent] file --pgdata=datadir
[--bstart=number] [--bend=number] path

Validates block checksums of a single relation file.

This command does not require a database connection but requires the --pgdata option, even if path points to a file outside PGDATA.

path is a path to the file to check. Must be a regular file (not a directory or symbolic link).

--bstart=number

The starting block number (inclusive).

--bend=number

The ending block number (inclusive). If this option is not specified, all blocks to the end of the file are checked.

--percent=percent

The percentage of randomly selected blocks from the specified range. Possible values — from 0 to 100. The default value is 100.

Example:

pgpro_validate checksum file \
    --pgdata=/var/lib/postgresql/15/data \
    --bstart=0 --bend=1024 \
    /var/lib/postgresql/15/data/base/16384/2619

G.5.4.5. constraints

pgpro_validate [common_options] [remote_options] constraints [--db=database_name] [-t=table_name]

Checks integrity constraints (NOT NULL, FOREIGN KEY, UNIQUE) for user tables.

--db=database_name

A comma-separated list of databases in which to check constraints. If this option is omitted, all databases are checked.

-t=table_name
--table=table_name

A comma-separated list of tables to check. If this option is omitted, all user tables are checked.

Example:

pgpro_validate constraints \
    --db=mydb \
    --table=public.orders,public.customers \
    --username=postgres

G.5.4.6. database_files

pgpro_validate [common_options] [remote_options] database_files --pgdata=datadir

Compares files in the base and tablespace directories (pg_tblspc) with entries in the pg_class system catalog. It detects orphaned files that have no corresponding relations, as well as relations for which the expected files are missing.

This command requires both access to the PGDATA directory and a database connection.

Example:

pgpro_validate database_files \
    --pgdata=/var/lib/postgresql/15/data \
    --username=postgres

G.5.4.7. index

pgpro_validate [common_options] [remote_options] index [--db=database_name]

Checks index format version and metadata consistency.

--db=database_name

A comma-separated list of databases. If this option is omitted, all databases are checked.

Example:

pgpro_validate index \
    --db=mydb \
    --username=postgres

G.5.4.8. integrity

pgpro_validate [common_options] [remote_options] integrity [--select-from-relation[=true|false]]

Checks referential integrity of system catalogs: consistency of pg_depend, pg_shdepend, pg_class, pg_namespace, pg_authid, etc.

--select-from-relation[=true|false]

Enable an additional check that executes the SELECT command for each found relation to verify readability.

Example:

pgpro_validate integrity \
    --dbname=postgres \
    --username=postgres \
    --select-from-relation

G.5.4.9. molotilka

pgpro_validate [common_options] [remote_options] molotilka --pgdata=datadir [--percent=percent] [-t=table_name]

Performs a full page scan of tables and detects corrupted pages.

--percent=percent

The percentage of randomly selected pages to scan. Possible values — from 0 to 100. The default value is 100.

-t=table_name
--table=table_name

A comma-separated list of tables to scan. If this option is omitted, all user tables are scanned.

Example:

pgpro_validate molotilka --percent=10 \
    --pgdata=/var/lib/postgresql/15/data \
    --dbname=mydb \
    --username=postgres

G.5.4.10. permissions

pgpro_validate [common_options] [remote_options] permissions --pgdata=datadir

Checks the permissions, owner, and group of all files and directories in PGDATA against the expected Postgres Pro values.

This command does not require a database connection but requires the --pgdata option.

permissions runs in parallel. Use the --jobs option to set the number of threads.

Example:

pgpro_validate permissions --jobs=4 \
    --pgdata=/var/lib/postgresql/15/data

G.5.4.11. postgres_files

pgpro_validate [common_options] [remote_options] postgres_files --pgdata=datadir

Checks for the presence of the fixed set of files and directories that Postgres Pro expects to see in PGDATA, for example, PG_VERSION, postgresql.conf, pg_xact, pg_wal.

This command does not require a database connection but requires the --pgdata option.

Example:

pgpro_validate postgres_files \
    --pgdata=/var/lib/postgresql/15/data

G.5.4.12. tablespace

pgpro_validate [common_options] [remote_options] tablespace --pgdata=datadir

Checks tablespace directories: correctness of symbolic links in PGDATA/pg_tblspc, existence of target directories, presence of subdirectories matching the server version, and consistency with pg_tablespace.

This command requires both access to the PGDATA directory and a database connection.

Example:

pgpro_validate tablespace \
    --pgdata=/var/lib/postgresql/15/data \
    --dbname=postgres

G.5.5. Common Options

The following options apply to all commands, except where explicitly noted otherwise.

-d=database_name
--dbname=database_name

The database name to connect to. The default value is postgres.

-h=host
--hostname=host

The host name or address of the Postgres Pro server.

-p=port
--port=port

The port of the Postgres Pro server. The default value is 5432.

-U=username
--username=username

The database user name. The default value is postgres.

-W=password
--password=password

The database user password. If this option is not specified, the PGPASSWORD environment variable or a .pgpass file is used.

-D=datadir
--pgdata=datadir

The path to the PGDATA data directory.

--pgdata does not apply to the amcheck command.

--tmp=path

The path to the temporary directory. /tmp is used by default.

-j=num_threads
--jobs=num_threads

The number of parallel threads for the check. The value must be a positive integer. It defaults to the number of available CPU cores (runtime.NumCPU()). If the specified value exceeds the number of cores, the command fails with the error exceed number of CPU cores.

--format={json|plain}

The output format. Plain format is used by default. If the option is set to json, the standard log is disabled, and the result is written as a single JSON document to the standard output.

--report-size=integer

The maximum number of error messages to output for a single check.

--log-level={error|warn|info|debug}

The logging level.

-?
--help

Print the command-line help.

-v
--version

Print the pgpro_validate version.

G.5.6. Remote Mode Options

pgpro_validate supports running checks on a remote server via SSH. The local instance launches pgpro_validate on the remote host with the passed parameters.

--remote-host=hostname

The address of the remote host. Specifying this option activates remote mode.

--remote-path=path

The path to the pgpro_validate binary on the remote host. This option is required in remote mode.

--remote-port=port

The SSH port. The default value is 22.

-u=username
--remote-user=username

The SSH user.

--ssh-key=path

The path to the private SSH key file.

--ssh-config=path

The path to the SSH configuration file.

--ssh-options=options

Additional SSH options, separated by a semicolon. For example: StrictHostKeyChecking=no;UserKnownHostsFile=/dev/null.

--remote-version

Print the pgpro_validate version on the remote host and the local utility version, then exit. If the --remote-host option is not set, the command exits with an error.

G.5.6.1. Example

Example of a full remote server check:

pgpro_validate \
    --remote-host=db-server \
    --remote-path=/usr/bin/pgpro_validate \
    --remote-user=dba \
    --ssh-key=/home/dba/.ssh/id_db_server \
    --pgdata=/var/lib/postgresql/15/data \
    --username=postgres \
    --dbname=mydb