25.1. Architecture #

With built-in high-availability capabilities, Postgres Pro Shardman allows creating a cluster with leader nodes (there can be several leader nodes due to the distributed nature of the system) and several follower nodes. The leader is the primary server of the BiHA cluster, while followers are its replicas.

The shardmanctl utility is used to initialize the Postgres Pro Shardman cluster and create the leader, add followers, convert existing cluster nodes into the leader or the follower on every shard as well as check the cluster node status. The leader is available for read and write transactions, while followers replicate data from the leader in the synchronous mode.

Along with leader and folower, you can set your Postgres Pro Shardman nodes as referee nodes. They are nodes participating in the elections in two modes:

  • The referee mode. In this mode, the node only takes part in elections of the leader and does not participate in data replication, and no replication slots are created on the leader and follower nodes for the referee.

  • The referee_with_wal mode. In this case, the node participates both in the leader elections, in the same way as in the referee mode, and data replication and receives the entire WAL from the leader node.

Note that the postgres DB can also be present on a referee node.

Physical streaming replication implemented in BiHA ensures high availability by providing protection against server failures and data storage system failures. During physical replication, WAL files of the leader node are sent synchronously to the follower node and applied there. In case of synchronous replication, with each commit a user waits for the confirmation from the followers that the transaction is committed. The followers in the BiHA cluster can be used to:

  • Create in-memory tables open for write transactions.

  • Prepare a follower node backup.

  • Restore bad blocks of data on the leader node by receiving them from the follower node.

  • Check corrupt records in WAL files.

Physical streaming replication implemented in BiHA provides protection against several types of failures:

  • Leader node failure. In this case, a follower node is promoted and becomes the new leader of the cluster. It can be fixed manually with shardmactl shard -s shard-2 switch. To switch to a specific shard, use shardmactl shard -s shard-2 switch --new-primary follower2. It can also be done automatically by means of elections.

  • Network failure between the leader node and follower nodes. To fix it, run shardmactl shard -s shard-2 replicas reinit -n follower2.

  • Follower node failure. This failure causes the transaction on the leader node to stop. This happens because the leader stops receiving transaction confirmations from the follower and the transaction fails to end. The fix is the same as for the network failure, to reinitialize the follower.

25.1.1. Postgres Pro Shardman Configuration with BiHA #

The cluster topology is specified in sdmspec, see example. If it is done beforehand, the shardmnctl init command will initialize a cluster with the specified topology. If not, it can be done manually:

shardmanctl init -f sdmspec.json --(creates an empty cluster)
shardanctl nodes add -n master1,master2
shardmactl shard -s shard-1 add -n follower1
shardmactl shard -s shard-2 add -n follower2
shardmactl shard -s shard-1 add -w ref1
shardmactl shard -s shard-2 add -w ref2

Warning

Under no circumstances you should edit shared_preload_libraries as it can result in the entire cluster crash, because it has four crucial elements that must not be disabled: Shardman, postgres_fdw, BiHA, and pgpro_bindump.

postgresql.conf and pg_hba.biha.conf are also included in the sdmspec.

To register a Postgres Pro Shardman cluster in the etcd store, see Registering a Postgres Pro Shardman Cluster.

When initialized by your cluster software, a biha_db database is automatically created. It is merely an operational DB you should not interfere with.

Replication slots with names set in the biha_node_id format are created. These slots are managed automatically without the need to modify or delete them manually.

postgresql.biha.conf and pg_biha/biha.conf files are necessary configuration files for BiHA, they must not be edited manually.

25.1.2. Elections #

Elections are a process conducted by shardmand to automatically determine a new leader node when the current leader is down. Elections are held based on the cluster quorum, which is the minimum number of nodes that participate in the leader election.

Note that you cannot configure quorum, as it is automatically done by shardmanctl. The quorum is calculated as:

number of instatnces on one shard / 2 + 1 = number of nodes required for quorum