53.1. Table Types #
In a distributed database managed by Postgres Pro Shardman the following special table types are used: unmanaged regular tables that are not managed on the cluster level, and cluster-wide managed tables, which can be sharded or internal.
53.1.1. Sharded Tables #
Sharded tables are distributed regular PostgreSQL partitioned tables where a few partitions, making up a shard, are regular unmanaged tables and the other partitions are foreign tables available from remote servers. Sharded tables are registered in the shardman.sharded_tables dictionary. Use the CREATE TABLE statement with the distributed_by parameter to create a sharded table. Several sharded tables can be created as colocated. This means that they have the same number of partitions and that their partitions corresponding to the same sharding key should reside together. During a rebalance, Postgres Pro Shardman management utilities ensure that corresponding partitions of colocated tables are moved to the same node. (Such a rebalance happens, for example, when a new node is added to the cluster). Colocation is necessary to ensure that joins of several tables are propagated to the node where the actual data resides. To define one sharded table colocated with another one, first, create one table and then use the colocate_with parameter of the CREATE TABLE statement while creating the second table. Chains of colocated tables are not supported, all related tables should be marked as colocated to one of the tables instead. Note that colocate_with property is symmetric and transitive.
Note
Mind some limitations for distributed systems. Temporary tables cannot participate in distributed transactions. Regular Postgres Pro Shardman can be joined with sharded tables on local nodes, yet they cannot have common keys or indexes with them.
53.1.1.1. Partitions #
A sharded table consists of several partitions. Some of them are regular tables, and others are foreign tables. By default, the number of partitions is determined by the shardman.num_parts parameter, but it can be overwritten by the num_parts CREATE TABLE parameter. Most of DDL operations are restricted on partitions of a sharded table. You should modify the parent table instead.
The number of partitions in a sharded table is defined when it is created and cannot be changed afterwards. When new nodes are added to the cluster, some partitions are moved from existing nodes to the new ones to balance the load. So, to allow scaling of clusters, the initial number of partitions should be high enough, but not too high since an extremely large number of partitions significantly slows down query planning. For example, if you expect the number of nodes in your cluster to grow by 4 times at a maximum, create sharded tables with the number of partitions equal to 4 * N, where N is the number of nodes. A cluster becomes unable to scale when the number of cluster nodes reaches the number of partitions in the sharded table with the minimal number of them.
53.1.1.2. Replicated Tables #
Replicated tables are available to all nodes of a cluster. A replicated table is a set of regular tables synchronized by triggers. The main use case for a replicated table is to store a relatively rarely updated set of data that is used by all cluster nodes. When a sharded table is joined to a replicated table, joins between sharded table partitions and the replicated table can be performed on nodes where individual partitions reside. The implementation of trigger-based replication requires a non-deferrable primary key on a replicated table to be defined. Currently when a replicated table is modified, an after-statement trigger fires and propagates changes to other nodes of the cluster via foreign tables. When new nodes are added to a cluster, replicated table data is transferred to the new nodes via logical replication. When some nodes are removed from a cluster, replicated tables get locked for writes for a brief time. Use the distributed_by or replicated CREATE TABLE parameters to create a replicated table. Replicated tables created with replicated are registered in the shardman.global_tables dictionary. Partitioned replicated tables are not supported.
53.1.1.3. Distributed DDL #
Postgres Pro Shardman extension allows creating several kinds of managed objects. These are sharded and replicated tables, roles, sequences, and tablespaces. The list of operations allowed on managed objects is limited particularly to protect consistency of a managed schema, type or function. Note that the roles and tablspaces are managed on a higher level. For the same reason, most operations on managed objects are cluster-wide. The list of cluster-wide operations includes:
CREATEfor sharded and replicated tables, managed roles and tablespaces, or indexes on sharded or replicated tables.DROPfor sharded and replicated tables, managed roles and tablespaces, or indexes on sharded or replicated tables.ALTER TABLEfor sharded and replicated tables.ALTER TABLESPACEfor replicated tablespaces.ALTER ROLEfor managed roles.RENAMEfor sharded and replicated tables or indexes on them.SET CONSTRAINTS ALLinside a transaction block.
These configuration settings control execution of the distributed DDL: shardman.broadcast_ddl and shardman.sync_schema. The first one can be used for a cluster-wide broadcast of all regular DDL operations (for example, creating schemas or functions). The second one controls broadcasting of statements related to managed objects and should never be turned off without consulting the Postgres Pro Shardman support team.
Distributed tables have some specifics. Such table is available on all cluster nodes. As partitioned table, it has a number of partitions, with this number being a constant that cannot be changed without a restart. Each shard usually stores several partitions of this shard, while other partitions are stored elsewhere. Mind that these partitions are interconnected.
53.1.1.4. Colocated Tables #
Postgres Pro Shardman colocated table is a partitioned table that consists of a number of partitions that are colocated with the partitions of a distributed table for the quick access. In this case, partitioning is usually made with partition, sharding, or foreign key for efficiency purposes. These are recommended solutions for larger JOIN.