2.4. Analyzing and Vacuuming #

Shardman databases require periodic maintenance, known as vacuuming.

2.4.1. Autovacuum Daemon #

For many installations, it is sufficient to let vacuuming be performed by the autovacuum daemon. As in PostgreSQL installations, the autovacuum daemon will issue ANALYZE commands automatically whenever the content of a table has changed sufficiently. The ANALYZE command transfers statistics for foreign partitions from remote nodes.

Rebalance process can move or copy data between cluster nodes. After this operation, all transferred objects are analyzed automatically. As usual, local statistics are gathered, and remote statistics are fetched from foreign servers.

2.4.2. Analyzing the Whole Database #

When running for the entire database, the ANALYZE command calculates statistics for local partitions of sharded tables on the current node and for global tables. It also transfers statistics for foreign partitions available on remote nodes. However, note that remote nodes may lack the statistics.

Still, running ANALYZE on each node may not give you full cluster-wide statistics, as they may have not been calculated on some nodes. Also, global tables are analyzed multiple times (for each node).

Instead, use the shardman.global_analyze() function to perform cluster-wide analysis. First, this function executes ANALYZE on each node for all local partitions of sharded tables, then sends these statistics to other nodes. Next, it selects one node per a global table and runs ANALYZE of this table on the selected node. Gathered statistics are broadcast to all other nodes in the cluster.

2.4.3. Analyzing Global Tables #

When ANALYZE is run on a global table, only statistics for the corresponding local table on the current node are updated.

2.4.4. Analyzing Sharded Tables #

When ANALYZE is run on a sharded table, statistics on its local partitions are updated, statistics for foreign partitions are transferred from remote nodes, if remote nodes have it. If statistics are missing on remote nodes, they are not calculated. If you run ANALYZE for the local table, only its statistics are updated.

When ANALYZE is run on a foreign table directly and a remote node does not have any statistics for the corresponding local table, this local table is analyzed remotely. Then statistics are transferred from the remote node.

2.4.5. Vacuuming #

The VACUUM command running for the whole database performs vacuuming on the current node.

You can broadcast this command to perform cluster-wide vacuuming on all nodes. Set the shardman.broadcast_ddl configuration parameter to on, run VACUUM, and set shardman.broadcast_ddl to off.

When VACUUM is run on a sharded or global table, the statement is broadcast automatically. On all nodes, the command vacuums all table partitions of the sharded table and all local tables of the global table.