7.1. Table Types #

In a distributed database managed by Shardman the following special table types are used: sharded tables and global tables.

7.1.1. Sharded Tables #

Sharded tables are just usual PostgreSQL partitioned tables where a few partitions, making up a shard, are regular local tables and the other partitions are foreign tables available from remote servers via postgres_fdw. Sharded tables are registered in the shardman.sharded_tables dictionary. Use the CREATE TABLE statement with the distributed_by parameter to create a sharded table. Several sharded tables can be created as colocated. This means that they have the same number of partitions and that their partitions corresponding to the same sharding key should reside together. During a rebalance, Shardman management utilities ensure that corresponding partitions of colocated tables are moved to the same node. (Such a rebalance happens, for example, when a new node is added to the cluster). Colocation is necessary to ensure that joins of several tables are propagated to the node where the actual data resides. To define one sharded table colocated with another one, first, create one table and then use the colocate_with parameter of the CREATE TABLE statement while creating the second table. Chains of colocated tables are not supported, all related tables should be marked as colocated to one of the tables instead. Note that colocate_with property is symmetric and transitive.

7.1.1.1. Partitions #

A sharded table consists of several partitions. Some of them are regular tables, and others are foreign tables. By default, the number of partitions is determined by the shardman.num_parts parameter, but it can be overwritten by the num_parts CREATE TABLE parameter. Most of DDL operations are restricted on partitions of a sharded table. You should modify the parent table instead.

The number of partitions in a sharded table is defined when it is created and cannot be changed afterwards. When new nodes are added to the cluster, some partitions are moved from existing nodes to the new ones to balance the load. So, to allow scaling of clusters, the initial number of partitions should be high enough, but not too high since an extremely large number of partitions significantly slows down query planning. For example, if you expect the number of nodes in your cluster to grow by 4 times at a maximum, create sharded tables with the number of partitions equal to 4 * N, where N is the number of nodes. A cluster becomes unable to scale when the number of cluster nodes reaches the number of partitions in the sharded table with the minimal number of them.

7.1.1.2. Subpartitions #

Partitions of a sharded table can be partitioned by range. In this case, each partition of a sharded table is a partitioned table consisting only of regular or only of foreign subpartitions. All subpartitions of a partition are located on the same node. Use the partition_by CREATE TABLE parameter to specify a column that should be used as a subpartition key column and the partition_bounds parameter to set bounds of the second-level table partitions. New subpartitions can be added or removed from a table as necessary. So you can omit the partition_bounds parameter during table creation and create partitions later using the shardman.create_subpart() function. Other subpartition management functions allow you to drop, detach or attach subpartitions of a sharded table. Subpartition management is cluster-wide.

7.1.2. Global Tables #

Global tables are available to all nodes of a cluster. Now a global table is a set of regular tables synchronized by triggers. The main use case for a global table is to store a relatively rarely updated set of data that is used by all cluster nodes. When a sharded table is joined to a global table, joins between sharded table partitions and the global table can be performed on nodes where individual partitions reside. The implementation of trigger-based replication requires a non-deferrable primary key on a global table to be defined. Currently when a global table is modified, an after-statement trigger fires and propagates changes to other nodes of the cluster via foreign tables. When new nodes are added to a cluster, global table data is transferred to the new nodes via logical replication. When some nodes are removed from a cluster, global tables get locked for writes for a brief time. Use the global CREATE TABLE parameter to create a global table. Global tables are registered in the shardman.global_tables dictionary. Partitioned global tables are not supported.

7.1.3. Distributed DDL #

Shardman extension allows creating several kinds of global objects. These are sharded and global tables, roles and tablespaces. The list of operations allowed on global objects is limited particularly to protect consistency of a global schema. For the same reason, most operations on global objects are cluster-wide. The list of cluster-wide operations includes:

  • CREATE for sharded and global tables, global roles and tablespaces or indexes on sharded or global tables.

  • DROP for sharded and global tables, global roles and tablespaces or indexes on sharded or global tables.

  • ALTER TABLE for sharded and global tables.

  • ALTER TABLESPACE for global tablespaces.

  • ALTER ROLE for global roles.

  • RENAME for sharded and global tables or indexes on them.

  • SET CONSTRAINTS ALL inside a transaction block.

These configuration settings control execution of the distributed DDL: shardman.broadcast_ddl and shardman.sync_schema. The first one can be used for a cluster-wide broadcast of all regular DDL operations (for example, creating schemas or functions). The second one controls broadcasting of statements related to global objects and should never be turned off without consulting the Postgres Pro Shardman support team.

pdf