G.2. pgpro_datactl — manage Postgres Pro Enterprise data files

G.2.1. Overview

The pgpro_datactl utility provides tools for managing Postgres Pro Enterprise data files, including a module for CFS (Compressed File System) operations. This module supports the following operations:

  • Retrieving metadata from compressed files, including their compression algorithm and location.

  • Unpacking CFS files for further analysis.

  • Repacking compressed files and changing their compression level.

  • Preventing potential failures caused by corrupted files.

G.2.2. Installation

pgpro_datactl is provided with Postgres Pro Enterprise as a separate pre-built package, pgpro-datactl-ent-15. For detailed installation instructions, see Chapter 17.

G.2.3. Commands

pgpro_datactl supports the following commands:

G.2.3.1. Important Notes

Before running these commands, note the following:

  • The repack and unpack operations require the Postgres Pro Enterprise cluster to be stopped. This is necessary to prevent data corruption and ensure file consistency.

  • When using the --in-place option with the repack command, note that:

    • The tablespace will be overwritten directly during the operation.

    • All the existing data in the target tablespace will be replaced.
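The stopped-cluster requirement above can be checked before a repack or unpack run. The following is a minimal sketch, assuming the cluster keeps the standard PostgreSQL postmaster.pid lock file in its data directory while the server is running. The function name cluster_looks_stopped is hypothetical and not part of pgpro_datactl. A stale lock file can remain after a crash, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when DATADIR contains no postmaster.pid,
# i.e. no server appears to be running on that data directory.
# Returns 2 if DATADIR is not a directory.
cluster_looks_stopped() {
    datadir=$1
    [ -d "$datadir" ] || return 2
    [ ! -e "$datadir/postmaster.pid" ]
}

# Possible use (paths are placeholders):
#   cluster_looks_stopped "$PGDATA" || { echo "stop the cluster first" >&2; exit 1; }
```

Running the check immediately before the command keeps the window between the check and the operation short, but it does not prevent the server from being started afterwards.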

G.2.3.2. estimate

pgpro_datactl estimate --source=source_path [--log-level=logging_level] [--help]

Estimates the compression ratios for different compression algorithms.

-s=source_path
--source=source_path

Specifies the path to an uncompressed Postgres Pro tablespace directory.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

Example of estimating compression for a tablespace:

pgpro_datactl estimate --source /data/tablespace/PG_17_202409081

Example of estimating compression with debug logging:

pgpro_datactl estimate --source /path/to/tablespace --log-level debug

Example output:

Source file or directory: /data/tablespace/PG_17_202409081
Compression ratios and timings:
  pglz  : ratio=1.85, time=12.34 block/ms
  zlib  : ratio=2.41, time=3.21 block/ms
  lz4   : ratio=1.92, time=45.67 block/ms
  zstd  : ratio=2.58, time=8.91 block/ms

Best ratio: "zstd" with 2.58
Best speed: "lz4" with 45.67 block/ms

In this output:

  • ratio is the compression ratio (original size / compressed size).

  • time is the compression speed in blocks per millisecond.

The best algorithm by compression ratio and by speed is displayed at the end.
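When the estimate output is consumed by a script, the per-algorithm lines can be parsed directly. The sketch below assumes the line layout matches the example output above (algorithm name first, then ratio=value). The function name best_ratio_alg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads estimate output on stdin and prints the
# algorithm with the highest compression ratio.
best_ratio_alg() {
    awk '
        /ratio=/ {
            # Assumed line shape: "  zstd  : ratio=2.58, time=8.91 block/ms"
            alg = $1
            split($0, parts, "ratio=")
            ratio = parts[2] + 0          # numeric prefix of "2.58, time=..."
            if (ratio > best) { best = ratio; best_alg = alg }
        }
        END { if (best_alg != "") print best_alg }
    '
}

# Possible use:
#   pgpro_datactl estimate --source /path/to/tablespace | best_ratio_alg
```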

G.2.3.3. ground

pgpro_datactl ground --source=source_path --block-num=block_number
[--calg=compression_algorithm] [--log-level=logging_level] [--help]

Replaces the specified corrupted block with zeros. Such blocks may be encountered during decompression.

-s=source_path
--source=source_path

Specifies the path to the required file.

--block-num=block_number

Specifies the number of the block that must be replaced.

-c=compression_algorithm
--calg=compression_algorithm

Specifies the current compression algorithm. If the value does not match the CFS tablespace compression algorithm, the operation fails.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

Example:

pgpro_datactl ground -s /path/to/archive.cfs --block-num 4 --calg pglz

In this example, ground replaces block number 4 with zeros, with pglz specified as the current compression algorithm.

G.2.3.4. info

pgpro_datactl info --source=source_path [--log-level=logging_level] [--help]

Analyzes the file and displays the following information:

  • The physical size, virtual size, and used space of the file, in bytes.

  • Whether the garbage collector (GC) is active.

  • Whether the *.cfm file is present and accessible.

-s=source_path
--source=source_path

Specifies the path to the required file.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

Example output:

Physical size: 10485760
Virtual size: 9437184
Used size: 7864320
GC active: Yes
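The size fields can be combined into a single utilization figure. The sketch below assumes the field labels match the example output above ("Physical size" and "Used size"). The function name used_percent is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads info output on stdin and prints the share of
# the physical size that is in use, as a whole percentage.
# Exits with status 1 if no positive physical size was found.
used_percent() {
    awk -F': *' '
        $1 == "Physical size" { phys = $2 + 0 }
        $1 == "Used size"     { used = $2 + 0 }
        END {
            if (phys > 0) printf "%d\n", used * 100 / phys
            else exit 1
        }
    '
}

# Possible use:
#   pgpro_datactl info --source /path/to/file | used_percent
```

For the example output above, 7864320 used bytes out of 10485760 physical bytes gives 75.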

G.2.3.5. probe

pgpro_datactl probe --source=source_path [--log-level=logging_level] [--help]

Analyzes the specified file and identifies the following:

  • Compression type: zlib or zstd compression algorithm.

  • Fragmentation level: Analyzes the corresponding *.cfm file (if available) and reports the percentage of unused storage space in the physical file.

  • Whether the file belongs to CFS: Checks for the presence of an associated *.cfm file, which indicates whether the file belongs to a CFS archive.

-s=source_path
--source=source_path

Specifies the path to the required file.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

Example output:

Probing path: /data/sample.dat
Has cfm file: Yes
pg_compression: zstd
Actual compression: zstd
Fragmentation: 5%

In this example, the file is compressed with the zstd algorithm, is part of a CFS archive, and has a fragmentation level of five percent.
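A script can use the reported fragmentation to flag files for closer inspection. The sketch below assumes the "Fragmentation: N%" line has the shape shown in the example output. The function name fragmentation_at_least and the threshold are hypothetical, and a fractional percentage is truncated to its integer part.

```shell
#!/bin/sh
# Hypothetical helper: reads probe output on stdin and succeeds when the
# reported fragmentation is at least THRESHOLD percent.
fragmentation_at_least() {
    threshold=$1
    frag=$(awk -F': *' '$1 == "Fragmentation" { sub(/%/, "", $2); print int($2) }')
    [ -n "$frag" ] && [ "$frag" -ge "$threshold" ]
}

# Possible use:
#   pgpro_datactl probe --source /path/to/file | fragmentation_at_least 20 \
#       && echo "fragmentation is 20% or higher"
```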

G.2.3.6. repack

pgpro_datactl repack --source=source_path --t-calg=target_compression_algorithm --target=target_path
[--calg=compression_algorithm] [--clevel=compression_level]
[--zero-on-error] [--log-level=logging_level] [--in-place] [--help]

Changes the compression algorithm and/or compression level of CFS files.

Returns the following information:

  • Compression algorithm ID

  • Compression level

  • Copy status: 1 (complete) or 0 (incomplete)

If interrupted, the process can be resumed from the point of failure by rerunning the command with the same compression options (both the --t-calg and --calg values). The operation state is kept in per-datafile .status files.

-s=source_path
--source=source_path

Specifies the path to a compressed file or directory.

Note

The path must be in the directory named PG_version_date that contains the pg_compression file.

--t-calg=target_compression_algorithm

Specifies the target compression algorithm. If omitted, the files will be decompressed (unpacked).

-t=target_path
--target=target_path

Specifies the output location for recompressed files.

-c=compression_algorithm
--calg=compression_algorithm

Specifies the current compression algorithm. If omitted, the repack command will take this value from the pg_compression file.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

--clevel=compression_level

Specifies the compression level for the chosen target compression algorithm. Possible values depend on the compression algorithm specified:

  • 0–9 for zlib

  • 1 for pglz

  • 0–12 for lz4

  • -131072–22 for zstd

The value of 0 sets the default compression level, which is 1, for all compression algorithms.

Default: 1

--zero-on-error

If this option is specified, corrupted blocks are replaced with zeros instead of terminating the process or producing error messages.

--in-place

If this option is specified while target_path is omitted, the recompressed files will be placed in the source directory.

Warning

This option removes all source files from the source directory, so use it with care.
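The --clevel ranges listed above can be validated before repack is started. This is a minimal sketch. The function name valid_clevel is hypothetical, and accepting 0 for every algorithm is an assumption based on the statement that 0 selects the default level.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when LEVEL is an integer within the
# documented --clevel range for ALG.
valid_clevel() {
    alg=$1
    level=$2
    # Reject anything that is not an optionally negative integer.
    case $level in
        ''|-|*[!0-9-]*|?*-*) return 1 ;;
    esac
    case $alg in
        zlib) min=0 max=9 ;;
        pglz) min=1 max=1 ;;
        lz4)  min=0 max=12 ;;
        zstd) min=-131072 max=22 ;;
        *)    return 1 ;;
    esac
    # Assumption: 0 means "use the default level" for every algorithm.
    [ "$level" -eq 0 ] && return 0
    [ "$level" -ge "$min" ] && [ "$level" -le "$max" ]
}

# Possible use (paths are placeholders):
#   valid_clevel zstd 19 && pgpro_datactl repack --source=/path/to/PG_version_date \
#       --t-calg=zstd --clevel=19 --target=/path/to/target
```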

G.2.3.7. stat

pgpro_datactl stat --source=source_path --cfs
[--log-level=logging_level] [--per-file] [--help]

Collects tablespace statistics and outputs the results as a CSV file.

This command does not modify any files (read-only operation). No locks are acquired.

-s=source_path
--source=source_path

Specifies the path to a compressed file or directory.

Note

The path must be in the directory named PG_version_date that contains the pg_compression file.

--cfs

Specifies the tablespace type.

Note

Currently, only the CFS type is supported.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

--per-file

Outputs detailed statistics (block size and number of blocks) for each file. The result is saved to a file with the cfs_per_file_stat_ prefix.

G.2.3.8. unpack

pgpro_datactl unpack --source=source_path --target=target_path
[--calg=compression_algorithm] [--zero-on-error] [--log-level=logging_level] [--help]

Unpacks CFS files.

-s=source_path
--source=source_path

Specifies the path to a compressed file or directory.

Note

The path must be in the directory named PG_version_date that contains the pg_compression file.

-t=target_path
--target=target_path

Specifies the path to a directory where the unpacked files will be placed.

If the source and target directories are the same, files are unpacked with a .dec extension.

-c=compression_algorithm
--calg=compression_algorithm

Specifies the compression algorithm used. If omitted, the unpack command will take this value from the pg_compression file.

--zero-on-error

If this option is specified, corrupted blocks are replaced with zeros instead of terminating the process or producing error messages.

--log-level=logging_level

Sets the logging level. Possible values: debug, info, warning, and error.

Example:

pgpro_datactl unpack -s /path/to/archive.cfs -t /path/to/destination -c zstd

In this example, unpack extracts archive.cfs compressed with the zstd algorithm into the directory /path/to/destination.

G.2.4. pgpro_datactl Error Codes

Table G.2 displays the error code ranges defined in pgpro_datactl.

According to the SQL standard, the first two characters of an error code denote the class of errors, while the last three characters indicate a specific condition within that class.

Table G.2. pgpro_datactl Error Code Ranges

Error Code | Condition
00XXX | Success
10XXX | Warning
20XXX | Command-line error
30XXX | Database error
40XXX | Source data error
50XXX | Target data error
60XXX | Filesystem/IO error
70XXX | Runtime error

Table G.3 lists all specific error codes and the actions that should be taken if these errors occur.

pgpro_datactl reports error codes in the PRODCTL-XXXXX format. Warnings return a nonzero exit code but allow the utility to continue. Errors stop the current operation.

Table G.3. pgpro_datactl Error Codes

Error Code | Message | Cause | Action
PRODCTL-00000 | Success | The operation completed successfully. | No action required.
PRODCTL-10001 | File not found | An optional input path is missing. | Check the path. If intentional, the warning can be ignored.
PRODCTL-10002 | Failed to read file | A partial read, permission error, or transient IO issue occurred. | Retry the operation. Check user permissions and disk status.
PRODCTL-10003 | Invalid parameter | A noncritical argument is out of range or malformed. | Enter a valid value, or a default value will be used to proceed.
PRODCTL-10004 | Failed to allocate memory | The system was low on memory during a noncritical step. | Reduce concurrency or data size.
PRODCTL-10005 | Failed to create cwd | The current or working directory could not be established. | Check the path, permissions, and available disk space.
PRODCTL-10006 | File contains invalid content | An optional file failed validation. | Replace or skip the file.
PRODCTL-10007 | Failed to remove file | The file is locked, or permission was denied. | Close any handles to the file and adjust permissions.
PRODCTL-10008 | Parameter required but not set | An optional subtask required a value but used the default instead. | Provide the missing option to avoid the fallback behavior.
PRODCTL-20001 | Required parameter is missing | A mandatory command-line option is absent. | Supply the required option and rerun the command.
PRODCTL-20002 | Invalid parameter | A command-line option has an invalid value. | Enter a valid value according to the documentation.
PRODCTL-30001 | Database connection failed | A network, authentication, or DSN error occurred. | Check authentication, network connectivity, and SSL configuration.
PRODCTL-30002 | Database query failed | An SQL error, timeout, or permission issue occurred. | Check the SQL and logs. Fix schema or permission issues.
PRODCTL-40001 | Failed to read source file | An IO error occurred on input. | Make sure the path exists and the user has access to it. Retry.
PRODCTL-40002 | Source file not found | A required input file is missing. | Provide the correct file or fix the path.
PRODCTL-40003 | Failed to construct source path | The root path, environment variable, or template is invalid. | Fix the configuration and path template.
PRODCTL-40004 | File contains invalid content | Required input failed validation, or a checksum mismatch occurred. | Replace the file or fix its format.
PRODCTL-50001 | Target file write failed | A permission, disk quota, or disk error occurred. | Free up disk space, fix permissions, and retry.
PRODCTL-50002 | Target file not found | The expected output path or its parent directory is missing. | Create the necessary directories, verify the target path.
PRODCTL-60001 | File remove failed | Permission was denied, or the file is in use. | Release any locks on the file, adjust permissions.
PRODCTL-60002 | Open file failed | The file is missing, or access was denied. | Make sure the file exists, check permissions.
PRODCTL-60003 | Open stream failed | A pipe, socket, or file handle was not available. | Validate the endpoint, check permissions.
PRODCTL-60004 | File write failed | A short write or general IO error occurred. | Retry. Check the disk and filesystem health.
PRODCTL-60005 | File fsync failed | Flushing data to persistent storage failed. | Inspect the disk, check mount options, review kernel logs (dmesg).
PRODCTL-60006 | File create failed | The parent directory is missing, or permissions are incorrect. | Make sure the parent directory exists, fix umask or permissions.
PRODCTL-60007 | File rename failed | The source file does not exist, or permissions are insufficient. | Make sure the source file exists, verify permissions. Retry.
PRODCTL-60008 | Failed to map file | A memory-mapping error occurred, or the address space is low. | Switch to buffered IO or reduce the mapped data size.
PRODCTL-60009 | Failed to stat file | The path is invalid, or permissions are insufficient. | Check the path and user permissions.
PRODCTL-70001 | Memory allocation failed | The system ran out of memory on a critical path. | Reduce parallelism or increase system memory limits.
PRODCTL-70002 | Failed to create directory | The parent directory is missing, or permissions are insufficient. | Create a parent directory, check permissions.
PRODCTL-70003 | Failed to open directory | The path is not a directory, or access was denied. | Make sure the path is a directory with proper access.
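The code ranges in Table G.2 can be applied mechanically when processing codes collected from the utility. The following sketch maps a PRODCTL-XXXXX code to its class by its first two digits. The function name prodctl_class is hypothetical. How the codes are obtained (for example, from captured messages) is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: prints the Table G.2 class for a PRODCTL-XXXXX code.
# Returns 1 for anything that is not a five-digit code in a known range.
prodctl_class() {
    code=${1#PRODCTL-}
    case $code in
        00[0-9][0-9][0-9]) echo "Success" ;;
        10[0-9][0-9][0-9]) echo "Warning" ;;
        20[0-9][0-9][0-9]) echo "Command-line error" ;;
        30[0-9][0-9][0-9]) echo "Database error" ;;
        40[0-9][0-9][0-9]) echo "Source data error" ;;
        50[0-9][0-9][0-9]) echo "Target data error" ;;
        60[0-9][0-9][0-9]) echo "Filesystem/IO error" ;;
        70[0-9][0-9][0-9]) echo "Runtime error" ;;
        *) echo "Unknown"; return 1 ;;
    esac
}

# Possible use:
#   prodctl_class PRODCTL-60005
```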