F.45. pg_transfer

The pg_transfer extension enables quick transfer of tables between Postgres Pro Enterprise instances.

It may sometimes be required to load large volumes of data into a database, for example, when consolidating data from regional servers into a central one. If your data needs to be uploaded to a server under heavy load, you can use a temporary database on another server to accumulate the data and then transfer it all at once when the main server load is minimal.

If you use pg_dump and pg_restore applications to transfer data, the load on the receiving server is usually higher than that on the sending server. Since the data is loaded using INSERT or COPY commands, it creates a significant impact on the disk subsystem. Besides, you will have to re-build indexes in the target database once all the data is loaded, which will also contribute to the server load.

To achieve a much higher load speed for read-only data, you can use the pg_transfer extension while doing dump/restore, which allows to copy data files directly, without using COPY/INSERT commands. pg_transfer also provides auxiliary functions for pg_dump and pg_restore to transfer tables together with pre-built indexes and collected statistics, so you can avoid extra load on the target server incurred by statistics re-collection.

Note

Postgres Pro Enterprise configuration and the architectures of source and target servers must provide binary-compatible file formats. When restoring data, pg_transfer checks that source and target servers have the same alignment, page and segment sizes, etc.

F.45.1. Installation

pg_transfer is included into Postgres Pro Enterprise. To enable pg_transfer, create its extension in your database using the following SQL command:

CREATE EXTENSION pg_transfer;

You must create pg_transfer extension in both source and target databases.

F.45.2. Usage

To transfer data using the pg_transfer module, do the following:

  1. Prepare source and target systems for data transfer

    Before the actual data transfer, you first need to transfer data schema to the target server:

    1. In the source database, mark the table to be transferred as read-only:

      ALTER TABLE table_name SET CONSTANT;
      

    2. Run the VACUUM (ANALYZE) command to remove dead tuples and refresh statistics:

      VACUUM (ANALYZE) table_name;
      

    3. Take a logical dump of the data schema on the source server and restore it on the target server:

      pg_dump -Fc -t table_name --schema-only -f transfer_dir/archive.out old_database
      pg_restore -d new_database --schema-only transfer_dir/archive.out
      

  2. Prepare TOAST identifiers for the transfer

    This step is only required if the tables to be transferred have toasted values.

    1. Determine TOAST identifiers (reltoastid) in the new database:

      psql new_database -c "SELECT reltoastrelid FROM pg_class WHERE relname='table_name';"
      

    2. Using the received reltoastid identifiers, prepare the table for the transfer and force data flush to disk:

      psql -d old_database -c "SELECT pg_transfer_freeze('table_name'::regclass::oid, reltoastrelid::oid);"
      

  3. Transfer the data to the target system

    1. Copy the data into a separate directory using pg_dump utility:

      pg_dump -Fc -t table_name --transfer-dir transfer_dir/ -f transfer_dir/archive.out old_database
      

    2. Restore the data in the target database:

      pg_restore -d new_database --data-only --transfer-dir transfer_dir/ --copy-mode-transfer transfer_dir/archive.out
      

      When source and target databases are located in the same file system, the --copy-mode-transfer option must be specified at least once (for either pg_dump or pg_restore command) to get an independent copy of data. When restoring data on the primary server, --generate-wal option must be specified for pg_restore for changes to be replicated to a standby server.

F.45.3. Compatibility

The pg_transfer extension is only supported on Linux systems.

F.45.4. Authors

Postgres Professional, Moscow, Russia