C.1. General Questions #
- C.1.1. What is Shardman?
- C.1.2. What does Shardman consist of?
- C.1.3. When to use Shardman?
- C.1.4. When is Shardman not appropriate?
- C.1.5. How many nodes does it take to deploy Shardman?
- C.1.6. Does Shardman support fault tolerance?
- C.1.7. How is sharding structured?
- C.1.8. Is it possible to change the number of partitions?
- C.1.9. Does Shardman support resharding?
- C.1.10. Is it possible to convert an unsharded (local) table to a sharded one?
- C.1.11. Does Shardman support adding and removing shards?
- C.1.12. What is the status of data balancing?
- C.1.13. How is a Shardman cluster accessed?
- C.1.14. How is balancing between cluster nodes implemented?
- C.1.15. Is mass data loading supported in Shardman?
- C.1.2. What does Shardman consist of?
C.1.1. What is Shardman? #
Shardman is a PostgreSQL-based distributed database management system (DBMS) that implements sharding. Sharding is a database design principle where rows of a table are held separately in different databases that are potentially managed by different DBMS instances. The main purpose of Shardman is to make querying sharded distributed databases efficient and ease the complexity of managing them.
C.1.2. What does Shardman consist of? #
Shardman is composed of several software components:
PostgreSQL 14 DBMS with a set of patches.
Shardman extension.
Management tools and services, including built-in stolon manager to provide high availability.
C.1.3. When to use Shardman? #
The working volume of data does not fit in the RAM of one server, but several shards can fit (or at least reading is parallelized).
Number of sessions is too large for one instance of PostgreSQL.
Intensive writing to WAL takes place.
Complex logic consuming too much CPU, and one server is not enough.
C.1.4. When is Shardman not appropriate? #
If the memory, session, CPU load can be pulled by a single PostgreSQL server, this will be both faster and simpler. (This applies to testing too!)
C.1.5. How many nodes does it take to deploy Shardman? #
A minimum of three nodes are required to deploy Shardman. One node is required for an etcd cluster (single-node etcd cluster), and a minimum of two nodes is required for the RDBMS cluster. It is possible to reduce the minimum deployment to two nodes by placing etcd on one of the RDBMS cluster nodes. The minimal deployment is described in section Get Started with Shardman.
C.1.6. Does Shardman support fault tolerance? #
Yes, Shardman is fault-tolerant at the level of each shard. Each shard is a fault-tolerant cluster.
C.1.7. How is sharding structured? #
In Shardman, tables are divided into partitions, and the partitions are distributed between shards.
C.1.8. Is it possible to change the number of partitions? #
No, the number of partitions of sharded tables is set when creating them and remains unchanged. If you expect that the amount of data you have will grow significantly, you should create the necessary number of partitions (by default - 20) in advance.
C.1.9. Does Shardman support resharding? #
No, Shardman currently does not support automatic change of a sharding key. In order to change the sharding key, you need to create new tables with a new sharding key and migrate data from old tables to new ones.
C.1.10. Is it possible to convert an unsharded (local) table to a sharded one? #
No, Shardman currently does not support this feature.
C.1.11. Does Shardman support adding and removing shards? #
Minimally a Shardman cluster can consist of a single node without fault tolerance, but such a configuration makes little sense. You can add or remove shards, Shardman will automatically (by default, this is adjustable) redistribute data between nodes. Replicas can be added to Shardman, then shards will be fault-tolerant.
C.1.12. What is the status of data balancing? #
When adding new shards, data will be redistributed between all shards, including new ones.
C.1.13. How is a Shardman cluster accessed? #
Shardman can be accessed through any node in the cluster, all nodes in the cluster are equal. Use the shardmanctl getconnstr command to get the cluster connection string.
C.1.14. How is balancing between cluster nodes implemented? #
There is no built-in balancing solution at the moment. But you can organize balancing at the application level, for example, see JDBC driver options (loadBalanceHosts
). For libpq, this functionality will be implemented in PostgreSQL 16 release.