1.2. When to use #
Shardman provides horizontal scalability with a view and consistency of a single database. Applications can use every node to access the distributed database and operate mostly the same way as with a single PostgreSQL instance. Still internally it is a distributed system that imposes certain rules on designing schema and writing queries. The main direction of adoption is to localize the data and the computations.
The following properties of a database or workload should be marks to consider a distributed system:
The working set of data does not fit in RAM of one server. Sharded systems can have much bigger total size of RAM.
Maintenance operations such as vacuum take too long. Shardman utilizes partitioned tables under the hood. Maintenance operations can be parallelized by nodes and partitions of tables.
Number of read sessions is too large for one instance of PostgreSQL. Shardman allows to distribute read sessions across the cluster and handle internal connections very efficiently with multiplexing transport.
Intensive write operations. Sharded systems can have much bigger total number of disk IOPS.
CPU intensive queries. Shardman allows to distribute calculations by nodes and reduce execution time for complex queries.
When Shardman is not appropriate:
Vertical scaling is economically and technically possible.
Data model and workload require a lot of cross-shard transactions.
Complex analytics, in particular joins of sharded tables when conditions don't include the sharding key.
Multi-DC/Multi-region deployments.