1.2. When to use

Shardman provides horizontal scalability while presenting the view and consistency of a single database. Applications can connect to any node to access the distributed database and work with it mostly the same way as with a single PostgreSQL instance. Internally, however, it is a distributed system that imposes certain rules on schema design and query writing. The main adaptation is to localize the data and the computations.

The following properties of a database or workload are signs that a distributed system is worth considering:

  • The working set of data does not fit in the RAM of one server. Sharded systems can have a much larger total amount of RAM.

  • Maintenance operations such as vacuum take too long. Shardman uses partitioned tables under the hood, so maintenance operations can be parallelized across nodes and table partitions.

  • The number of read sessions is too large for one PostgreSQL instance. Shardman lets you distribute read sessions across the cluster and handles internal connections very efficiently with a multiplexing transport.

  • Write operations are intensive. Sharded systems can have a much larger total number of disk IOPS.

  • Queries are CPU-intensive. Shardman lets you distribute calculations across nodes and reduce the execution time of complex queries.

When Shardman is not appropriate:

  • Vertical scaling is economically and technically possible.

  • The data model and workload require a lot of cross-shard transactions.

  • Complex analytics, in particular joins of sharded tables whose conditions do not include the sharding key.

  • Multi-DC/Multi-region deployments.
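
To make the point about the sharding key more concrete, here is a minimal Python sketch of hash-based routing. It is a toy model, not Shardman's actual routing code: the shard count, the `crc32`-based hash, and the names `shard_for` and `shards_touched` are all assumptions made for illustration. It only shows why a predicate on the sharding key can be served by one shard, while a predicate without it has to consult every shard.

```python
from zlib import crc32

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def shard_for(key: int) -> int:
    """Map a sharding-key value to a shard (toy hash routing)."""
    return crc32(str(key).encode()) % NUM_SHARDS


def shards_touched(sharding_key_values=None) -> set[int]:
    """Return the shards a query has to visit.

    If the predicate supplies sharding-key values, only the shards that
    own those values are needed. If it does not, every shard must be
    consulted, because matching rows could live anywhere.
    """
    if sharding_key_values is None:
        return set(range(NUM_SHARDS))
    return {shard_for(v) for v in sharding_key_values}


# Lookup by sharding key: a single shard is involved.
local = shards_touched([42])

# Predicate on a non-key column: all shards are involved.
scattered = shards_touched()
```

In this model, a workload whose queries and transactions mostly resemble `local` benefits from sharding, while one dominated by queries like `scattered` pays the cross-shard cost on nearly every request.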
