pg_repack

pg_repack — utility and Postgres Pro Enterprise extension to reorganize tables

Synopsis

pg_repack [option...] [dbname]

Description

pg_repack is a Postgres Pro Enterprise extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

pg_repack is a fork of the previous https://github.com/reorg/pg_reorg project.

You can choose one of the following methods to reorganize data:

  • Online CLUSTER (ordered by cluster index)

  • Ordering by specified columns

  • Online VACUUM FULL (packing rows only)

  • Rebuild or relocate only the indexes of a table

Note

Only superusers can use the utility.

Note

Target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column.

Installation

On Linux systems, pg_repack is provided together with Postgres Pro Enterprise as a separate pre-built package pg-repack-ent-13 and requires the postgrespro-ent-13-server package to be installed with all the dependencies. For the list of available packages and detailed installation instructions, see Chapter 17. On Windows systems, pg_repack is automatically installed as part of Postgres Pro.

Once you have pg_repack installed, load the pg_repack extension in the database you want to process, as follows:

$ psql -c "CREATE EXTENSION pg_repack" -d your_database

You can later remove pg_repack from a Postgres Pro installation using DROP EXTENSION pg_repack.

If you are upgrading from a previous version of pg_repack, just drop the old version from the database as explained above and install the new version.

Options

Reorganization Options

-a
--all

Attempt to repack all the databases of the cluster. Databases where the pg_repack extension is not installed will be skipped.

-t table
--table=table

Reorganize the specified table(s) only. Multiple tables may be reorganized by writing multiple -t switches. By default, all eligible tables in the target databases are reorganized.

-I table
--parent-table=table

Reorganize both the specified table(s) and its inheritors. Multiple table hierarchies may be reorganized by writing multiple -I switches.

-c schema
--schema=schema

Repack the tables in the specified schema(s) only. Multiple schemas may be repacked by writing multiple -c switches. Can be used in conjunction with --tablespace to move tables to a different tablespace.

-o column[, ...]
--order-by=column[, ...]

Perform an online CLUSTER ordered by the specified columns.

-n
--no-order

Perform an online VACUUM FULL. Since version 1.2 this is the default for non-clustered tables.

-N
--dry-run

Show what would be repacked and exit.

-j num_jobs
--jobs=num_jobs

Create the specified number of extra connections to Postgres Pro, and use these extra connections to parallelize the rebuild of indexes on each table. Parallel index builds are only supported for full-table repacks, not with --index or --only-indexes options. If your server has extra cores and disk I/O available, this can be a useful way to speed up pg_repack.

-s tablespace
--tablespace=tablespace

Move the repacked tables to the specified tablespace: essentially an online version of ALTER TABLE ... SET TABLESPACE. The tables' indexes are left in the original tablespace unless --moveidx is specified too.

-S
--moveidx

Also move the indexes of the repacked tables to the tablespace specified by the --tablespace option.

-i index
--index=index

Repack the specified index(es) only. Multiple indexes may be repacked by writing multiple -i switches. May be used in conjunction with --tablespace to move the index(es) to a different tablespace.

-x
--only-indexes

Repack only the indexes of the specified table(s), which must be specified with the --table option.

-T secs
--wait-timeout=secs

pg_repack needs to take an exclusive lock at the end of the reorganization. This setting controls how many seconds pg_repack will wait to acquire this lock. If the lock cannot be taken after this duration and --no-kill-backend option is not specified, pg_repack will forcibly cancel the conflicting queries. If you are using Postgres Pro or PostgreSQL version 8.4 or newer, pg_repack will fall back to using pg_terminate_backend() to disconnect any remaining backends after this timeout has passed twice. The default is 60 seconds.

-D
--no-kill-backend

Skip repacking a table if the lock cannot be taken for duration specified in --wait-timeout, instead of cancelling conflicting queries. The default is false.

-Z
--no-analyze

Disable ANALYZE after a full-table reorganization. If not specified, ANALYZE is executed after the reorganization.

-k
--no-superuser-check

Skip the superuser checks in the client. This setting is useful for using pg_repack on platforms that support running it as non-superusers.

-C
--exclude-extension

Skip tables that belong to the specified extension(s). Some extensions may heavily depend on such tables at planning time etc.

--switch-threshold

Switch tables when that many tuples are left in the log table. This setting can be used to avoid the inability to catch up with write-heavy tables.

Connection Options

Options to connect to servers. You cannot use --all and --dbname or --table or --parent-table together.

-a
--all

Reorganize all databases.

[-d] dbname
[--dbname=]dbname

Specifies the name of the database to be reorganized. If this is not specified and -a (or --all) is not used, the database name is read from the environment variable PGDATABASE. If that is not set, the user name specified for the connection is used.

-h host
--host=host

Specifies the host name of the machine on which the server is running. If the value begins with a slash, it is used as the directory for the Unix domain socket.

-p port
--port=port

Specifies the TCP port or local Unix domain socket file extension on which the server is listening for connections.

-U username
--username=username

User name to connect as.

-w
--no-password

Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.

-W
--password

Force pg_repack to prompt for a password before connecting to a database.

This option is never essential, since pg_repack will automatically prompt for a password if the server demands password authentication. However, pg_repack will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.

Generic Options

-e
--echo

Echo the commands that pg_repack generates and sends to the server.

-E level
--elevel=level

Choose the output message level from DEBUG, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. The default is INFO.

--help

Show help about pg_repack command line arguments, and exit.

-V
--version

Print the pg_repack version and exit.

Environment

PGDATABASE
PGHOST
PGPORT
PGUSER

Default connection parameters

This utility, like most other Postgres Pro utilities, also uses the environment variables supported by libpq (see Section 36.14).

Examples

Perform an online CLUSTER of all the clustered tables in the database test, and perform an online VACUUM FULL of all the non-clustered tables:

$ pg_repack test

Perform an online VACUUM FULL on the tables foo and bar in the database test (an eventual cluster index is ignored):

$ pg_repack --no-order --table foo --table bar test

Move all indexes of table foo to tablespace tbs:

$ pg_repack -d test --table foo --only-indexes --tablespace tbs

Move the specified index to tablespace tbs:

$ pg_repack -d test --index idx --tablespace tbs

Diagnostics

Error messages are reported when pg_repack fails. The following list shows the cause of errors.

You need to cleanup by hand after fatal errors. To cleanup, just remove pg_repack from the database and install it again.

For Postgres Pro or PostgreSQL 9.1 and newer execute:

DROP EXTENSION pg_repack CASCADE

in the database where the error occurred, followed by

CREATE EXTENSION pg_repack

For previous versions load the script $SHAREDIR/contrib/uninstall_pg_repack.sql into the database where the error occurred and then load $SHAREDIR/contrib/pg_repack.sql again.


INFO: database "db" skipped:
pg_repack VER is not installed in the database:
pg_repack is not installed in the database when the --all option is specified.

Create the pg_repack extension in the database.


ERROR: pg_repack VER is not installed in the database:
pg_repack is not installed in the database specified by --dbname

Create the pg_repack extension in the database.


ERROR: program 'pg_repack V1' does not match database library 'pg_repack V2':
There is a mismatch between the pg_repack binary and the database library
(.so or .dll).

The mismatch could be due to the wrong binary in the PATH or the wrong database being addressed. Check the program directory and the database; if they are what expected you may need to repeat pg_repack installation.


ERROR: extension 'pg_repack V1' required, found 'pg_repack V2':
The SQL extension found in the database does not match the version required by the pg_repack program.

You should drop the extension from the database and reload it as described in the section called “Installation”.


ERROR: relation "table" must have a primary key or not-null unique keys:
The target table doesn't have a PRIMARY KEY or any UNIQUE constraints defined.

Define a PRIMARY KEY or a UNIQUE constraint on the table.


ERROR: query failed: ERROR: column "col" does not exist:
The target table doesn't have columns specified by --order-by option.

Specify existing columns.


WARNING: the table "tbl" already has a trigger called z_repack_trigger:
The trigger was probably installed during a previous attempt
to run pg_repack on the table which was interrupted
and for some reason failed to clean up the temporary objects.

You can remove all the temporary objects by dropping and re-creating the extension: see the section called “Installation” for the details.


WARNING: trigger "trg" conflicting on table "tbl":
The target table has a trigger whose name follows z_repack_trigger
in alphabetical order.

The z_repack_trigger should be the last BEFORE trigger to fire. Please rename your trigger so that it sorts alphabetically before pg_repack one; you can use:

ALTER TRIGGER zzz_my_trigger
 ON sometable RENAME TO yyy_my_trigger;


ERROR: Another pg_repack command may be running on the table. Please try again later.

There is a chance of deadlock when two concurrent pg_repack commands are run on the same table. So, try to run the command after some time.


WARNING: Cannot create index "schema"."index_xxxxx", already exists
DETAIL: An invalid index may have been left behind by a previous
pg_repack on the table which was interrupted. Please use DROP INDEX
"schema"."index_xxxxx" to remove this index and try again.

A temporary index apparently created by pg_repack has been left behind, and we do not want to risk dropping this index ourselves. If the index was in fact created by an old pg_repack job which didn't get cleaned up, you should just use DROP INDEX and try the repack command again.

Restrictions

pg_repack comes with the following restrictions.

Temporary tables

pg_repack cannot reorganize temporary tables.

GiST indexes

pg_repack cannot cluster tables by GiST indexes.

DDL commands

You will not be able to perform DDL commands of the target table(s) except VACUUM or ANALYZE while pg_repack is working. pg_repack will hold an ACCESS SHARE lock on the target table during a full-table repack, to enforce this restriction.

If you are using version 1.1.8 or earlier, you must not attempt to perform any DDL commands on the target table(s) while pg_repack is running. In many cases pg_repack would fail and rollback correctly, but there were some cases in these earlier versions which could result in data corruption.