F.7. biha — built-in high-availability cluster #

biha is a Postgres Pro extension managed by the bihactl utility. Together with a set of core patches, SQL interface, and the biha-background-worker process, which coordinates the cluster nodes, biha turns Postgres Pro into a BiHA cluster — a cluster with physical replication and built-in failover, high availability, and automatic node failure recovery.

As compared to existing cluster solutions, i.e. a standard PostgreSQL primary-standby cluster and a cluster configured with multimaster, the BiHA cluster offers the following benefits:

  • Physical replication.

  • Built-in failover.

  • Dedicated leader node available for read and write transactions and read-only follower nodes.

  • Synchronous and asynchronous node replication.

  • Built-in capabilities of automatic node failure detection, response, and subsequent cluster reconfiguration, i.e. selection of the new cluster leader and making the old leader the read-only follower.

  • No additional external cluster software required.

F.7.1. Architecture #

With built-in high-availability capabilities, Postgres Pro allows creating a cluster with one dedicated leader node and several follower nodes. The bihactl utility is used to initialize the cluster and create the leader, add followers, convert existing cluster nodes into the leader or the follower in the BiHA cluster as well as check the cluster node status. The leader is available for read and write transactions, while followers are read-only and replicate data from the leader in the synchronous or asynchronous mode.

Physical streaming replication implemented in biha ensures high availability by providing protection against server failures and data storage system failures. During physical replication, WAL files of the leader node are sent, synchronously of asynchronously, to the follower node and applied there. In case of synchronous replication, with each commit a user waits for the confirmation from the follower that the transaction is committed. The follower in the biha cluster can be used to:

  • Perform read transactions in the database.

  • Prepare reports.

  • Create in-memory tables open for write transactions.

  • Prepare a follower node backup.

  • Restore bad blocks of data on the leader node by receiving them from the follower node.

  • Check corrupt records in WAL files.

Physical streaming replication implemented in biha provides protection against several types of failures:

  • Leader node failure. In this case, a follower node is promoted and becomes the new leader of the cluster. The promotion can be done both manually using the biha.set_leader function or automatically by means of elections.

  • Follower node failure. If a follower node uses asynchronous replication, the failure by no means affects the leader node. If a follower node uses synchronous replication, this failure causes the transaction on the leader node to stop. This happens because the leader stops receiving transaction confirmations from the follower and the transaction fails to end. For details on how to set up synchronous replication in the BiHA cluster, see Section F.7.3.7.

  • Network failure between the leader node and follower nodes. In this case, the leader node cannot send and follower nodes cannot receive any data. Note that you cannot allow write transactions on follower nodes if users are connected to the leader node. Any changes made on follower nodes will not be restored on the leader node. To avoid this, configure your network with redundant channels. It is best to provide each follower with its own communication channel to avoid single point of failure issues.

In case of an emergency, such as operating system or hardware failure, you can reinstall Postgres Pro and remove the biha extension from shared_preload_libraries to go back to work as soon as possible.

F.7.1.1. Variants of Cluster Configuration #

There are several variants of the cluster configuration.

  • Three and more nodes where one node is the leader and the rest are the followers.

    Below are possible scenarios for the cases of the leader failure or network connection interruption:

    • When the current leader is down, the new leader is elected automatically. To become the leader, a follower must have the highest number of votes. The number of votes must be higher or equal to the value configured in biha.nquorum.

    • In case of network connection interruptions inside the BiHA cluster, the cluster may split into several groups of nodes. In this case, the new leader node is elected in all groups, where the number of nodes is higher or equal to the biha.nquorum value. After the connection is restored, the new leader will be chosen between the old one and the newly elected one depending on the term value. The node with the highest term becomes the new leader. It is recommended to set the biha.minnodes value equal to the biha.nquorum value.

  • Two-node cluster consisting of the leader and the follower.

    Note

    Using two-node clusters is not recommended as such configurations can cause split-brain issues. To avoid such issues, you can add a referee node.

    Below are possible scenarios for the leader or network failures:

    • When the leader is down, the follower node becomes the new leader automatically if the biha.nquorum configuration parameter is set to 1.

    • When network interruptions occur between the leader and the follower, and both the biha.nquorum and biha.minnodes configuration parameters are set to 1, the cluster may split into two leaders available for reads and writes. The referee node helps avoiding such issues.

  • Single-node configuration consisting of the leader only. A possible variant that can be used to wait until follower nodes are configured. Logically, the node cannot be replaced once down, since there are no follower nodes that can become the leader node.

  • Three-node cluster consisting of the leader, the follower, and the referee. The referee is a node used for voting in elections of the new leader, but it cannot become the leader. In case of faults, the cluster with the referee behaves the same way as the three-node cluster (the leader and two followers). To learn more about the referee, see Section F.7.1.3.

Note

  • You can set the leader manually with the biha.set_leader function.

  • The recommended value of biha.nquorum is higher or equal to the half of the cluster nodes.

  • When you add or remove nodes from your cluster, always revise the biha.nquorum value considering the highest number of nodes, but not less than set in nquorum.

F.7.1.2. Elections #

Elections are a process conducted by the follower nodes to determine a new leader node when the current leader is down. As a result of the elections, the follower node with the most records in the WAL becomes the cluster leader. To be elected, a node must have the biha.can_be_leader and biha.can_vote parameters set to true.

Elections are held based on the cluster quorum, which is the minimum number of nodes that participate in the leader election. The quorum value is set in the biha.nquorum parameter when initializing the cluster with the bihactl init command. Nodes with the biha.can_vote parameter set to false are excluded from voting and are ignored by nquorum.

For the elections to begin, the followers must miss the maximum number of heartbeats from the leader set by the biha.set_heartbeat_max_lost function. At this point one of the followers proposes itself as a leader CANDIDATE, and elections begin. If the leader does not receive the set number of heartbeats from the follower in this case, the follower state changes to UNKNOWN for the leader. In a synchronous cluster, you can use the biha.node_priority parameter to prioritize the nodes. If your cluster has only two nodes and you want to avoid potential split-brain issues in case of elections, you can set up a referee node that participates in the elections in the same way as followers. To learn more, see Section F.7.1.3.

For example, if you have a cluster with three nodes where nquorum=2 and one follower node is down, the cluster leader will continue to operate. If the leader is down in such a cluster, two remaining followers start elections. After the new leader node is elected, the node generation specified in the term is incremented for all cluster nodes. More specifically, the new leader and the remaining followers have term=2, while for the old leader the value is left as term=1. Therefore, when the old leader is back in the cluster, it goes through demotion, i.e. turns into a follower.

After the new leader is set, followers of the cluster start receiving WAL files from this new cluster leader. Note that once the new leader is elected, the old leader is demoted and is not available for write transactions to avoid split-brain issues. You can promote the old leader manually using the biha.set_leader function. Both the cluster quorum and the term concepts are implemented in biha based on the Raft consensus algorithm.

F.7.1.3. The Referee Node in the BiHA Cluster #

The biha extension allows you to set up the referee node that participates in the leader elections and helps to avoid potential split-brain issues if your cluster has only two nodes, i.e. the leader and one follower. In this case, use the referee node and set both biha.nquorum and biha.minnodes configuration parameters to 2.

When you create the referee node, only the biha_db database and system tables are copied to the referee node from the leader. The postgres database and user data are not copied.

The biha extension provides two referee operation modes:

  • The referee mode. In this mode, the node only takes part in elections of the leader and does not participate in data replication, and no replication slots are created on the leader and follower nodes for the referee.

  • The referee_with_wal mode. In this case, the node participates both in the leader elections, in the same way as in the referee mode, and data replication and receives the entire WAL from the leader node. If the referee node has the most WAL records in the cluster when the elections begin, i.e. has the greatest LSN, the follower node tries to get missing WAL files from the referee. This process is also important for the referee node to avoid entering the NODE_ERROR state, which may be the case if WALs diverge. For the referee_with_wal, apply lag is NULL and apply ptr cannot be monitored, as the referee does not apply user data.

Regardless of the mode set for the referee, it sends and receives heartbeats over the control channel, including using SSL, participates in the elections in the same way as follower nodes, supports cluster monitoring functions, and must be taken into account when setting the biha.minnodes configuration parameter. Note that the referee is the final state of the node and it cannot be switched to the leader node using the biha.set_leader function, nor can it become the follower node. If for some reason the follower does not see the leader but the referee does, the referee does not allow the follower to become the leader. If the leader node with greater term connects to the referee node, the referee demotes the leader with lower term and makes it the follower.

F.7.2. Setting Up a BiHA Cluster #

The BiHA cluster is set up by means of the bihactl utility. There are several scenarios of using the bihactl utility:

Before you start setting up the BiHA cluster, carefully read Section F.7.2.1.

F.7.2.1. Prerequisites and Considerations #

Before you begin to set up the BiHA cluster, read the following information and perform the required actions if needed:

  • Ensure network connectivity between all nodes of your future BiHA cluster.

    If network isolation is required, when both the control channel and WAL transmission operate in one network, while client sessions with the database operate in another network, configure the BiHA cluster as follows:

    • Use host names resolving to IP addresses of the network for the control channel and WAL.

    • Add the IP address for client connections to the listen_addresses configuration parameter.

  • To avoid any biha-background-worker issues related to system time settings on cluster nodes, configure time synchronization on all nodes.

  • It is not recommended to execute the bihactl commands in the PGDATA directory. The bihactl utility may create the biha_init.log and biha_add.log files in the directory where it is executed. However, the target PGDATA directory must be empty for proper execution of the bihactl commands.

  • The password for the biha_replication_user role in the password file must be the same on all nodes of the BiHA cluster. It is required for connection between the leader node and follower nodes. You can specify the password using one of the following approaches:

    • The secure and recommended way is adding a separate line for each node:

      echo 'hostname:port:biha_db:biha_replication_user:password' >> ~/.pgpass
      
      echo 'hostname:port:replication:biha_replication_user:password' >> ~/.pgpass
      
    • The simple way is adding a single line for all nodes:

      echo '*:*:*:biha_replication_user:password' >> ~/.pgpass
      
  • During operation, biha creates the following service files in the database directory:

    • standby.signal — a file used to start nodes in standby mode. It is required to make biha read-only at the start of Postgres Pro. This file is deleted from the leader node when its state changes to LEADER_RW.

    • biha.state and biha.conf — files in the pg_biha directory required to save the internal state and configuration of biha.

  • During operation, biha uses its own mechanism to modify the Postgres Pro configuration dynamically. Some Postgres Pro parameters are managed by biha and cannot be modified using ALTER SYSTEM, as they are essential for biha operation. These parameters are the following:

    These parameters are stored in the pg_biha/biha.conf file, as well as in the shared memory of the biha process. When these parameters are modified, biha sends the SIGHUP signal for other processes to be informed about the changes. If you modify any other parameters during this change and do not send a signal to reread the configuration, the parameters that you have changed may be unexpectedly reread.

    Postgres Pro behaves as described above only when biha is loaded and configured, i.e., when the extension is present in the shared_preload_libraries variable and the required biha.* parameters are configured. Otherwise, Postgres Pro operates normally.

  • When the BiHA cluster is initialized, biha modifies the Postgres Pro configuration in postgresql.conf and pg_hba.conf. The changes are first included in biha service files postgresql.biha.conf and pg_hba.biha.conf and then processed by the server after the following include directives are specified in postgresql.conf and pg_hba.conf, respectively:

    include 'postgresql.biha.conf'
    include "pg_hba.biha.conf"
    

  • In some operating systems, user session management may be handled by systemd. In this case, if your server is started using pg_ctl and managed remotely, be aware that all background processes initiated within an SSH session will be terminated by the systemd daemon when the session ends. To avoid such behavior, you can do one of the following:

    • Use the postgrespro-ent-17 systemd unit file to start the DBMS server on the cluster node.

    • Modify the configuration of the user session management service called systemd-logind in the /etc/systemd/logind.conf file, specifically, set the KillUserProcesses parameter to no.

F.7.2.2. Setting Up a BiHA Cluster from Scratch #

To set up a BiHA cluster from scratch, perform the following procedures.

Prerequisites

  1. On all nodes of your future cluster, install the postgrespro-ent-17-contrib package. Do not create a database instance.

  2. Ensure that you execute the bihactl command as the same user that will start the Postgres Pro Enterprise server.

    For example, if you start the server as user postgres, the bihactl command must also be run by user postgres.

  3. If you plan to use pg_probackup with biha, install the pg-probackup-ent-17 package.

Initializing the cluster

Use the bihactl init command to initialize the cluster and create the leader node.

  1. Execute the bihactl init command with the necessary options:

    bihactl init \
        --biha-node-id=1 \
        --host=node_1 \
        --port=node_port_number \
        --biha-port=biha_port_number \
        --nquorum=number_of_nodes \
        --pgdata=leader_PGDATA_directory
    

    The initdb utility is accessed, postgresql.conf and pg_hba.conf files are modified.

    When initializing the BiHA cluster, the magic string is generated. For more information on how to use the magic string, see Section F.7.2.6.

  2. Start the DBMS:

    pg_ctl start -D leader_PGDATA_directory -l leader_log_file
    
  3. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

Adding the follower node

  1. Ensure that the leader node is in the LEADER_RO or LEADER_RW state.

  2. Ensure that the password for the biha_replication_user role in the password file matches the password for the same role on the leader node.

  3. Execute the bihactl add command with the necessary options:

    bihactl add \
        --biha-node-id=2 \
        --host=node_2 \
        --port=node_port_number \
        --biha-port=biha_port_number \
        --use-leader "host=leader_host port=leader_port biha-port=leader_biha_port" \
        --pgdata=follower_PGDATA_directory
    

    A backup of the leader node is created by means of pg_basebackup or pg_probackup depending on the value set in the --backup-method parameter. Besides, postgresql.conf and pg_hba.conf files are modified.

    Note

    During this process, all files are copied from the leader to the new node. The larger the database size, the longer it takes to add the follower.

    You can also add the leader node connection data using the magic string. For more information on how to use the magic string, see Section F.7.2.6.

  4. Start the DBMS:

    pg_ctl start -D follower_PGDATA_directory -l follower_log_file
    
  5. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

F.7.2.3. Setting Up a BiHA Cluster from the Existing Cluster with Streaming Replication #

Convert your existing Postgres Pro Enterprise 17 cluster with streaming replication and a configured database instance into a BiHA cluster. After conversion, the primary node of the existing cluster becomes the leader node, and standby nodes become follower nodes.

Prerequisites

If your existing cluster is synchronous, i.e. its synchronous_standby_names parameter is not empty (for example, synchronous_standby_names=walreceiver), do the following before conversion into the BiHA cluster:

  1. Reset the synchronous_standby_names parameter:

    ALTER SYSTEM RESET synchronous_standby_names;
    
  2. From the postgresql.conf file and all the include directives, manually remove the synchronous_standby_names values.

Converting the existing primary node into the leader node

  1. Stop the existing primary node:

    pg_ctl stop -D primary_PGDATA_directory
    
  2. Execute the bihactl init command with the --convert option:

    bihactl init --convert \
        --biha-node-id=1 \
        --host=node_1 \
        --port=PostgresPro_port \
        --biha-port=biha_port_number \
        --nquorum=number_of_nodes \
        --pgdata=leader_PGDATA_directory
    

    When converting the cluster, the magic string is generated. For more information on how to use the magic string, see Section F.7.2.6.

  3. Start the DBMS:

    pg_ctl start -D leader_PGDATA_directory -l leader_log_file
    
  4. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

Converting the existing standby node into the follower node

  1. Ensure that the password for the biha_replication_user role in the password file matches the password for the same role on the leader node.

  2. Stop the existing standby node:

    pg_ctl stop -D standby_PGDATA_directory
    
  3. Execute the bihactl add command with the --convert-standby option:

    bihactl add --convert-standby \
      --biha-node-id=2 \
      --host=node_2 \
      --port=PostgresPro_port \
      --biha-port=5435 \
      --use-leader "host=leader_host port=leader_port biha-port=leader_biha_port" \
      --pgdata=follower_PGDATA_directory
    

    When converting an existing standby node into the follower node, biha creates the follower_PGDATA_directory/pg_biha/biha.conf and follower_PGDATA_directory/pg_biha/biha.state files required for the node to be connected to the cluster and modifies postgresql.conf and pg_hba.conf.

    You can also add the leader node connection data using the magic string. For more information on how to use the magic string, see Section F.7.2.6.

  4. Start the DBMS:

    pg_ctl start -D follower_PGDATA_directory -l follower_log_file
    
  5. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

F.7.2.4. Setting Up a BiHA Cluster from the Existing Database Server #

If your existing Postgres Pro Enterprise 17 server with a configured database has only one node, you can convert it into the leader and then add more nodes to your BiHA cluster using the bihactl add command.

Converting the existing node into the leader node

  1. Stop the existing node:

    pg_ctl stop -D server_PGDATA_directory
    
  2. Execute the bihactl init command with the --convert option:

    bihactl init --convert \
        --biha-node-id=1 \
        --host=node_1 \
        --port=PostgresPro_port \
        --biha-port=biha_port_number \
        --nquorum=number_of_nodes \
        --pgdata=leader_PGDATA_directory
    

    The postgresql.conf and pg_hba.conf files are modified.

    When converting the node, the magic string is generated. For more information on how to use the magic string, see Section F.7.2.6.

  3. Start the DBMS:

    pg_ctl start -D leader_PGDATA_directory -l leader_log_file
    
  4. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

Adding the follower node

  1. Ensure that the leader node is in the LEADER_RO or LEADER_RW state.

  2. Ensure that the password for the biha_replication_user role in the password file matches the password for the same role on the leader node.

  3. Execute the bihactl add command with the necessary options:

     bihactl add \
         --biha-node-id=2 \
         --host=node_2 \
         --port=node_port_number \
         --biha-port=biha_port_number \
         --use-leader "host=leader_host port=leader_port biha-port=leader_biha_port" \
         --pgdata=follower_PGDATA_directory
     

    A backup of the leader node is created by means of pg_basebackup or pg_probackup depending on the value set in the --backup-method parameter. Besides, postgresql.conf and pg_hba.conf files are modified.

    You can also add the leader node connection data using the magic string. For more information on how to use the magic string, see Section F.7.2.6.

  4. Start the DBMS:

    pg_ctl start -D follower_PGDATA_directory -l follower_log_file
    
  5. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

F.7.2.5. Setting Up the Referee Node in the BiHA Cluster #

The referee node participates in the elections and helps to manage split-brain issues.

Note

  • You can use only pg_basebackup when adding the referee node to your cluster.

  • Only the biha_db database and system tables are copied to the referee node. The postgres database and user data are not copied.

To set up a referee node:

  1. Execute the bihactl add command with the relevant value of the --mode option:

    bihactl add \
        --biha-node-id=3 \
        --host=node_3 \
        --port=node_port_number \
        --biha-port=biha_port_number \
        --use-leader "host=leader_host port=leader_port biha-port=leader_biha_port" \
        --pgdata=referee_PGDATA_directory \
        --mode=referee
    

    or

    bihactl add \
        --biha-node-id=3 \
        --host=node_3 \
        --port=node_port_number \
        --biha-port=biha_port_number \
        --use-leader "host=leader_host port=leader_port biha-port=leader_biha_port" \
        --pgdata=referee_PGDATA_directory \
        --mode=referee_with_wal
    

  2. Start the Postgres Pro instance where you have set up the referee:

    pg_ctl start -D referee_PGDATA_directory
    
  3. Check the node status in the biha.status_v view:

    SELECT * FROM biha.status_v;
    

F.7.2.6. Using the Magic String (Optional) #

The magic string is a special string generated automatically when you initiate a BiHA cluster. The magic string is used in the BiHA cluster setup scripts. It contains the data needed to connect follower nodes to the leader node.

You can use magic string to avoid entering the leader node connection data manually when adding follower nodes.

Here is an example of how to use the magic string:

  1. When initializing the cluster, redirect the bihactl output to a file:

    bihactl init \
       --biha-node-id=1 \
       --host=node_1 \
       --port=node_port_number \
       --biha-port=biha_port_number \
       --nquorum=number_of_nodes \
       --pgdata=leader_PGDATA_directory > /tmp/magic-file
    
  2. When adding a follower node, do the following:

    1. Set up an environment variable:

      export MAGIC_STRING="$(cat /tmp/magic-file)"
      
    2. Add --magic-string as a bihactl add option:

      bihactl add \
       --biha-node-id=2 \
       --host=node_2 \
       --port=node_port_number \
       --biha-port=biha_port_number \
       --magic-string=$MAGIC_STRING \
       --pgdata=follower_PGDATA_directory
      

    The follower node will now use the encoded data from the magic string to connect to the leader node.

F.7.3. Administration #

F.7.3.1. Changing Cluster Composition #

You can change the cluster composition by adding or removing nodes. To add a node, use the bihactl add command with the relevant options. To remove a node, use the biha.remove_node function. For more information on how to set up a high-availability cluster, see Section F.7.2.

F.7.3.2. Changing Configuration Parameters #

Some configuration parameters must have the same value on all cluster nodes and can only be set up by special functions. For example, use the biha.set_heartbeat_max_lost function to set the biha.heartbeat_max_lost parameter value:

SELECT biha.set_heartbeat_max_lost(7);

All available functions for setting up cluster configuration parameters are listed in Section F.7.4.2.4.

The parameters that can vary on different cluster nodes can be set with the ALTER SYSTEM command. For example, you can change the biha.can_vote parameter with ALTER SYSTEM:

ALTER SYSTEM SET biha.can_vote = true;
SELECT pg_reload_conf();

For more information about parameters and the ways they can be set up, see Section F.7.4.

F.7.3.3. Switchover #

In addition to the built-in failover capabilities, the high-availability cluster in Postgres Pro allows for the switchover. The difference between failover and switchover is that the former is performed automatically when the leader node fails and the latter is done manually by the system administrator. To switch over the leader node, use the biha.set_leader function. When you set the new leader, the following happens:

  • All attempts to perform elections are blocked and the timeout is set.

  • The current leader node becomes the follower node.

  • The newly selected node becomes the new leader.

  • If the switchover process does not end within the established timeout, the selected node becomes the follower and new elections are performed to choose the new cluster leader.

F.7.3.4. Roles #

When you initialize the high-availability cluster, the biha_db database is created as well as the biha extension is created in the biha scheme of the biha_db database. Besides, the following roles are created and used:

  • BIHA_CLUSTER_MANAGEMENT_ROLE allows adding and removing nodes of the BiHA cluster.

  • BIHA_REPLICATION_ROLE that is responsible for the data replication in the BiHA cluster. This role is used when running pg_rewind and pg_probackup.

  • biha_replication_user that automatically receives the right to connect using the replication protocol and becomes a member in the BIHA_REPLICATION_ROLE and BIHA_CLUSTER_MANAGEMENT_ROLE roles. The role is used by the bihactl utility as well as when the follower node is connected to the leader node. This role owns the biha_db database.

  • biha_callbacks_user is the default user for callback execution. This user can connect to a database but has no privileges. If required, you can grant privileges using the GRANT command.

  • The predefined pg_monitor role that is used to monitor the state of the built-in high-availability cluster.

The cluster initialization process also creates replication slots with the name set in the biha_node_id format. These slots are controlled automatically without the need to modify or delete them manually.

F.7.3.5. Restoring the Node from the NODE_ERROR State #

Errors occurred in biha or server instance processes listed below may cause the node failure, i.e. it will not be able to restart and will damage the WAL:

  • A rewind automatically performed by biha using pg_rewind if node timelines diverge.

  • The walreceiver process in case of timeline divergence. The follower node WAL may be partially rewritten by the WAL received from the leader node.

When the node goes into the NODE_ERROR state, the WAL recovery is paused and the walreceiver process is stopped. Reading from the nodes in the NODE_ERROR state is prohibited. Besides, the error details are saved to the biha.state file and checked upon the node restart, so the node will go into the same state when the biha-background-worker process is launched.

To restore the node from the NODE_ERROR state, take the following steps:

  1. Save the most recent files from the pg_wal directory, since some of the files unique to this node will be rewritten by pg_rewind.

  2. Run pg_rewind with the --biha option to save biha configuration files. If the rewind has been successful, information about the NODE_ERROR state is deleted from the biha.state file. Besides, when you specify the connection string in the --source-server option of pg_rewind, it also automatically saves this string for the primary_conninfo configuration parameter in the postgresql.auto.conf file. This is important for the node to continue restoring after the restart and reach the consistency point, which is the number of the last record in the source server WAL at the time of the rewind.

  3. (Optional) If the node was offline for a long time, to prevent the risk of data corruption and obsolete data reads, set the biha.flw_ro parameter of the restored node to off.

F.7.3.6. Monitoring the Rewind Results #

You can check the results of the rewind, i.e. the rewind state of cluster nodes, in the rewind_state field of the biha.state file. The field contains the enum values, which are interpreted as follows:

Table F.5. The Rewind State

ValueInterpretation
0 The rewind is not required.
1 The server is stopped, the biha.autorewind configuration parameter was enabled, the rewind will be performed after the server restart.
2 The rewind failed.
3 The rewind was performed successfully.
4 The biha.autorewind configuration parameter was not enabled, and the rewind must be performed manually as described in Section F.7.3.5.

F.7.3.7. Configuration of Synchronous and Asynchronous Replication #

Built-in high availability capabilities in Postgres Pro allow you to create a cluster with synchronous replication. To configure synchronous replication, add the --sync-standbys option when executing bihactl init. This will modify the synchronous_standby_names parameter by specifying the list of all nodes with the keyword ANY. Also, the synchronous_commit parameter is used with the default value on. Note that you cannot select specific nodes for synchronous replication, but only set the number of nodes that must run synchronously by specifying this number in the --sync-standbys option. Other nodes in the cluster will be replicated asynchronously, since streaming replication is asynchronous by default. For more information, see Section 26.2.8.

You can relax synchronous replication restrictions to allow the leader node to continue operation while some of the followers are temporary unavailable. To enable relaxed synchronous replication, specify the minimum number of synchronous follower nodes in the --sync-standbys-min option. If set, the option also changes the MIN field value of the synchronous_standby_names parameter accordingly. The synchronous_standby_gap parameter remains unchanged and keeps its default value — 0. You can set the --sync-standbys-min value when initializing the BiHA cluster by means of the bihactl init command, and modify it later with the biha.set_sync_standbys_min function.

There are a few points that need to be taken into account with regard to synchronous replication and the referee node. If the --mode option is set to referee, the referee does not participate in synchronous replication. When set to referee_with_wal, the node can synchronously receive data. This mode allows the cluster to continue to be available in the 2+1 configuration with --sync-standbys=1 if the follower node is down and the referee node starts confirming transactions for the leader node. The referee behavior depends on the synchronous_commit parameter value. Note that with this parameter set to remote_apply the referee does not confirm transactions.

F.7.3.8. Logging #

biha logs messages sent by its components, i.e. the control channel and the node controller. The control channel is used to exchange service information between the nodes and is marked as BCP in the log. The node controller is the biha core component, which is responsible for the node operation logic and is marked as NC in the log. You can determine the types of messages to be logged by setting the appropriate logging level. biha supports both standard Postgres Pro message severity levels and the extension logging levels.

Postgres Pro message severity levels are used by biha in the following cases:

  • A biha process ends with an error (ERROR and FATAL levels).

  • A biha component is not covered by any logging level of the extension.

  • A message should be logged when component logging is disabled. For example, LOG level messages sent by the control channel, which are displayed only when the component is successfully initialized.

biha logging levels are mapped to the Postgres Pro message severity levels. If you want messages of the required level to appear in the log, the value set for this level should correspond to the value in log_min_messages. The extension logging levels can be configured by editing the postgresql.biha.conf file directly or by setting biha configuration parameters, which are listed in Section F.7.4.1.2.

For example, to display messages related to the control channel, except for the debugging information, specify the following in the postgresql.biha.conf configuration file or set the relevant configuration parameters. In this case, it is assumed that log_min_messages is set to LOG:

biha.BcpTransportLog_log_level = LOG
biha.BcpTransportWarn_log_level = LOG
biha.BcpTransportDetails_log_level = LOG

F.7.3.9. Callbacks #

A callback is an SQL function that notifies users or external services about events in the BiHA cluster, for example, about election of a new leader or change of cluster configuration. As a user, you create an SQL function and register it as a callback. Under certain conditions, the biha extension calls this function.

Timely alerting helps external services to provide proper reaction to events in the BiHA cluster. For example, after receiving information about the leader change, the proxy server redirects traffic to the new leader.

The following callback types are available:

Table F.6. Callback Types

NameDescription
CANDIDATE_TO_LEADER

Called on the node elected as the new leader.

Signature:

my_callback(void) RETURNS void
LEADER_TO_FOLLOWER

Called on the old leader returned to the cluster after demotion.

Signature:

my_callback(void) RETURNS void
LEADER_CHANGED

Called on every node when the BiHA cluster leader changes.

Signature:

my_callback(id integer, host text, port integer) RETURNS void

In this signature, biha passes the new leader details: ID, host name, and node port number.

LEADER_STATE_IS_RW

Called on the leader and other nodes when the leader changes its state to LEADER_RW.

Signature:

my_callback(id integer, host text, port integer) RETURNS void

In this signature, biha passes to the callback the leader details: ID, host name, and node port number.

TERM_CHANGED

Called on the node when its term value increases, for example, when failover or switchover happens, nodes are added or removed, as well as when parameters are changed by the biha.set_* functions.

Signature:

my_callback(old_term integer, new_term integer) RETURNS void

In this signature, biha passes the old and the new term values.

OFFERED_TO_LEADER

Called on the node manually set to leader when it is about to become the leader.

Signature:

my_callback(void) RETURNS void
NODE_ADDED

Called on every node when a new node is added to the BiHA cluster.

Signature:

my_callback(id integer, host text, port integer, mode integer) RETURNS void

In this signature, biha passes to the callback the new node details, such as its host name, node port number, and operation mode: regular, referee, or referee_with_wal. For more information, see Section F.7.1.3.

NODE_REMOVED

Called on every node when a node is removed from the BiHA cluster.

Signature:

my_callback(id integer) RETURNS void

In this signature, biha passes to the callback the ID of the removed node.


F.7.3.9.1. Considerations and Limitations #
  • When an event happens, callbacks are executed in sequence. At this time, biha cannot change its state, for example, initiate elections.

  • If callback execution takes longer than the biha.callbacks_timeout value, biha stops callback execution and continues normal operation.

  • On clusters with asynchronous replication, the biha.register_callback function does not wait for all nodes to receive callbacks. This may lead to a situation where callbacks are present on the leader, but not present on the follower as it lags behind.

  • Normally, callbacks are not executed on the referee in the referee mode. However, if you have registered callbacks on the leader before adding the referee, callbacks may be executed on the referee and cannot be removed from it.

Note

Do not call biha functions inside of a callback as this can cause unexpected behavior.

F.7.3.9.2. Managing Callbacks #

To manage callbacks, you can perform the following actions:

  • register one or several callbacks for a single event

  • view the list of registered callbacks

  • unregister callbacks

Registering Callbacks

Write an SQL function and then register it as a callback using biha.register_callback.

In this example, you create several SQL functions in PL/pgSQL and register them as different callback types.

  1. Ensure that the leader node of your BiHA cluster is in the LEADER_RW state.

  2. On the leader node, use psql and connect to the biha_db database:

    postgres=# \c biha_db
    
  3. Create the following callback functions:

    -- log the node term change
    CREATE FUNCTION log_term_changed(old_term integer, new_term integer)
    RETURNS void AS $$
    BEGIN
        RAISE LOG 'Callback: Term changed from % to %', old_term, new_term;
    END;
    $$ LANGUAGE plpgsql;
    
    -- log the election of the new leader
    CREATE FUNCTION log_leader_changed(id integer, host text, port integer)
    RETURNS void AS $$
    BEGIN
        RAISE LOG 'Callback: New leader is % %:%', id, host, port;
    END;
    $$ LANGUAGE plpgsql;
    
    -- log that the leader was demoted
    CREATE FUNCTION log_leader_to_follower()
    RETURNS void AS $$
    BEGIN
        RAISE LOG 'Callback: demote';
    END;
    $$ LANGUAGE plpgsql;
    
  4. Register the created functions:

    SELECT biha.register_callback('TERM_CHANGED', 'log_term_changed', 'biha_db');
    
    SELECT biha.register_callback('LEADER_CHANGED', 'log_leader_changed', 'biha_db');
    
    SELECT biha.register_callback('LEADER_TO_FOLLOWER', 'log_leader_to_follower', 'biha_db');
    

    You can also specify the user on which behalf the callbacks are executed or determine the order of callbacks execution. For more information, see the description of the biha.register_callback function.

Viewing Callbacks

Registered callbacks are added to the biha.callbacks table located in the biha_db database.

To view all registered callbacks:

  1. On the leader node, use psql and connect to the biha_db database:

    postgres=# \c biha_db
    
  2. Display the content of the biha.callbacks table:

    SELECT * FROM biha.callbacks;
    
    1 | log_term_changed
    2 | log_leader_changed
    3 | log_leader_to_follower
    (3 rows)
    

Unregistering Callbacks

An unregistered callback is deleted from the biha.callbacks table.

  1. Ensure that the leader node of your BiHA cluster is in the LEADER_RW state.

  2. On the leader node, use psql and connect to the biha_db database:

    postgres=# \c biha_db
    
  3. Get an ID of the callback that you want to unregister, for example, log_leader_changed:

    SELECT id FROM biha.callbacks WHERE func = 'log_leader_changed';
    

    The callback ID is returned, for example, 2.

  4. Unregister the callback:

    SELECT biha.unregister_callback(2);
    

    The callback is now unregistered.

F.7.3.10. Recovering from a Backup #

If your database instance was restored from one of the nodes of the BiHA cluster to a separate node and/or using Point-in-Time Recovery (PITR), there must be no connection between the restored node and the operating BiHA cluster nodes. To prevent the connection, take the following steps on the restored node before you start it:

  1. Remove the include 'postgresql.biha.conf' include directive from the postgresql.conf configuration file.

  2. Ensure that biha is not present in the shared_preload_libraries of the postgresql.conf configuration file.

If you want to add the restored node to the cluster, take the following steps:

  1. On the restored node, manually configure streaming replication from the leader.

  2. Synchronize the restored node with the leader.

  3. Stop the restored node:

    pg_ctl stop -D restored_node_PGDATA_directory
    
  4. Add the restored node to the cluster using the bihactl add command with the --convert-standby option.

  5. Start the restored node:

    pg_ctl start -D restored_node_PGDATA_directory
    

F.7.3.11. Migration #

F.7.3.11.1. Migrating the BiHA Cluster to Version 17.2 #

To upgrade your BiHA cluster from Postgres Pro Enterprise version 16.X to version 17.2, perform the following steps:

  1. Completely remove biha.

  2. Stop all nodes of the cluster using the pg_ctl command.

  3. Upgrade the nodes using the instructions described in the pg_upgrade utility documentation.

  4. Convert the primary node into the leader.

  5. Add followers using one of the following ways:

  6. (Optional) If required, recreate the referee.

F.7.3.12. Removing biha #

You can either temporarily turn off, or completely remove the biha extension.

Turning off biha

To turn off the extension:

  1. Remove the include 'postgresql.biha.conf' include directive from the postgresql.conf configuration file.

  2. Ensure that biha is not present in the shared_preload_libraries of the postgresql.conf configuration file.

Removing biha completely

To remove the extension completely, do the following:

  1. Turn off biha.

  2. Execute the DROP EXTENSION command. It must be executed on the leader in the LEADER_RW state and from the biha_db database:

    biha_db=# DROP EXTENSION biha;
    

F.7.4. Reference #

F.7.4.1. Configuration Parameters #

The biha extension provides several configuration parameters described below that are specific to the BiHA cluster. In addition, biha uses the following Postgres Pro configuration parameters:

F.7.4.1.1. Cluster Configuration #

Important

When setting the cluster configuration parameters, you must ensure network reliability for the changes to affect all cluster nodes without any errors.

biha.autorewind (boolean) #

An optional parameter that controls the automatic rewind policy for that node, against which the pg_rewind must be executed. For example, for the old leader when synchronizing it with the new leader. The default value is false meaning that the automatic rewind is not performed. When the value is set to true, the automatic rewind is performed after the error that usually causes the NODE_ERROR state of the node. The automatic rewind is performed if it may complete successfully meaning that preliminary launching of pg_rewind with the --dry-run option was a success. If the automatic rewind fails, the node is transferred to the NODE_ERROR state. In this case, you can find the actual rewind state of the node in the biha.state file as described in Section F.7.3.6. Note that the rewind may cause the loss of some node WAL records.

biha.callbacks_timeout (integer) #

Sets time for execution of all callbacks for a single event, in milliseconds. The default value is 10000 (10 seconds). The minimum value is 1000 (1 second).

The value of the biha.callbacks_timeout parameter can be different for different nodes. Changing the value for the leader does not change it for the followers.

biha.can_be_leader (boolean) #

Determines the ability of a node to become the leader. The default value is true. If set to false, the node cannot stand as a candidate in elections of the new leader.

A node that cannot become the leader may have the largest WAL. In this case, if elections are required, those nodes that can stand as candidates attempt to receive missing data from that node. If they succeed, one of these nodes becomes the leader. If they fail to receive missing data, the automatic rewind process is launched on the node that cannot become the leader. If biha.autorewind is not enabled, the node state changes to NODE_ERROR.

biha.can_vote (boolean) #

Determines whether a node is allowed to vote. The default value is true. If set to false, the node cannot vote, as well as cannot stand as a candidate in elections of the new leader.

biha.flw_ro (boolean) #

Determines whether a follower is available for read operations. If set to off, reading from this follower is prohibited. The default value is on.

biha.heartbeat_max_lost (integer) #

Specifies the maximum number of heartbeats that can be missed before the node is considered offline. This parameter can be set with the biha.set_heartbeat_max_lost function. The default value is 10.

biha.heartbeat_send_period (integer) #

Specifies the heartbeat sending frequency, in milliseconds. This parameter can be set with the biha.set_heartbeat_send_period function. The default value is 1000.

biha.host (text) #

Specifies the host of the cluster node. This parameter is unique for each node of the cluster. For the first node it is set at cluster initialization, for other nodes it is set when adding them to the cluster. It is not recommended to modify this parameter.

biha.id (integer) #

Specifies the ID of the high-availability cluster node. This parameter is unique for each node of the cluster. For the first node it is set at cluster initialization, for other nodes it is set when adding them to the cluster. It is not recommended to modify this parameter.

biha.minnodes (integer) #

Specifies the minimum number of operational nodes for the leader node to be open for write transactions. This parameter can be set with the biha.set_minnodes function. If you do not specify the --minnodes option in bihactl init command, the biha.minnodes value equals the biha.nquorum value.

When setting up this value, consider the split-brain risk. It is recommended to use the following formula: (total number of nodes + 1)/2. For example, if your cluster has 3 nodes, the minnodes value must be 2.

Nodes with the biha.can_vote parameter set to false are ignored.

biha.no_wal_on_follower (integer) #

Specifies the timeout of the replication slot position advancement, in milliseconds. This parameter can be set with the biha.set_no_wal_on_follower function. The default value is 20000.

If the timeout is reached for the follower, the leader node notices that the follower does not receive WAL. If at the same time the biha.heartbeat_max_lost * biha.heartbeat_send_period value is exceeded, the leader considers the follower offline.

The parameter is also used when you set the leader manually as the selected node is promoted after it stops receiving WAL from the old leader.

biha.node_priority (integer) #

Sets the node priority in a synchronous cluster, in seconds. The value determines the timeout that must be reached before the node stands as a candidate in elections. The zero value indicates the highest priority. The default value is -1, meaning that the parameter is ignored.

A cluster is considered synchronous if all nodes of this cluster are synchronous, i.e.all cluster nodes are listed in the synchronous_standby_names parameter. For asynchronous clusters, the parameter is ignored.

biha.nquorum (integer) #

Specifies the number of nodes, which must vote for the new leader node if the current leader is down. This parameter can be set with the biha.set_nquorum function.

When setting up this value, consider the split-brain risk. It is recommended to use the following formula: (total number of nodes + 1)/2. For example, if your cluster has 3 nodes, the minnodes value must be 2.

Nodes with the biha.can_vote parameter set to false are ignored.

biha.port (integer) #

Specifies the port used to exchange service information between nodes. This parameter is required to establish a connection with the cluster. It is not recommended to modify this parameter.

biha.sync_standbys_min (integer) #

Specifies the minimum number of synchronous follower nodes that must be available for the leader node to continue operation. This parameter can be set with the biha.set_sync_standbys_min function. The value must be lower than --sync-standbys and cannot be negative. The default value is -1, meaning that the parameter is ignored. If the parameter is not specified, the BiHA cluster operates according to the default synchronous replication restrictions, i.e., the leader node is not available for write transactions until all followers catch up with its current state.

Note

Both the biha.sync_standbys_min parameter and the biha.set_sync_standbys_min function can operate only if you set the --sync-standbys-min option when initializing the BiHA cluster by means of the bihactl init command.

F.7.4.1.2. biha Logging Levels #
biha.BihaLog_log_level (enum) #

Specifies the logging level to provide general information about the operation of biha components. The default value is LOG.

biha.BcpTransportDebug_log_level (enum) #

Specifies the logging level to provide debugging information about the control channel operation. The default value is DEBUG4.

biha.BcpTransportDetails_log_level (enum) #

Specifies the logging level to provide detailed information about the control channel operation. The default value is DEBUG4.

biha.BcpTransportLog_log_level (enum) #

Specifies the logging level to provide general information about the control channel operation. The default value is DEBUG4.

biha.BcpTransportWarn_log_level (enum) #

Specifies the logging level to provide warnings of likely problems in the control channel. The default value is DEBUG4.

biha.NodeControllerDebug_log_level (enum) #

Specifies the logging level to provide debugging information about the node controller operation. The default value is DEBUG4.

biha.NodeControllerDetails_log_level (enum) #

Specifies the logging level to provide detailed information about the node controller operation. The default value is DEBUG4.

biha.NodeControllerLog_log_level (enum) #

Specifies the logging level to provide general information about the node controller operation. The default value is DEBUG4.

biha.NodeControllerWarn_log_level (enum) #

Specifies the logging level to provide warnings of likely problems in the node controller. The default value is DEBUG4.

F.7.4.2. Functions #

All functions listed below should be called from the biha_db database, for example:

psql biha_db -c "select biha.set_leader(2)"
F.7.4.2.1. Magic String Generation #
biha.get_magic_string () returns string #

Generates a magic string for the cluster node.

F.7.4.2.2. Cluster Node Removal #
biha.remove_node (id integer) returns boolean #

Removes the node from the cluster. Before removing, the node must be stopped. This function can only be called on the leader node.

F.7.4.2.3. Setting the Leader Manually #
biha.set_leader (id integer) returns boolean #

Sets the leader node manually. It is recommended to call this function on the node that you want to set as the new leader.

Note

Calling the biha.set_leader function on the current leader is not recommended. If you call the function on the current leader in the LEADER_RW state, in case of a successful switchover request, there may be not enough time for the request result to be sent to the client before the current leader reboots for demotion.

F.7.4.2.4. Cluster Configuration #
biha.config () returns setof record #

Returns the cluster configuration values: id, term, nquorum, minnodes, heartbeat_send_period, heartbeat_max_lost, no_wal_on_follower, sync_standbys_min, priority, can_be_leader, can_vote.

biha.set_heartbeat_max_lost (integer) returns boolean #

Sets the biha.heartbeat_max_lost value. This function can be called only from the leader node and changes the parameter on all nodes. You do not need to restart the cluster for the changes to take effect.

biha.set_heartbeat_send_period (integer) returns boolean #

Sets the biha.heartbeat_send_period value, in milliseconds. This function can be called only from the leader node and changes the parameter on all nodes. You do not need to restart the cluster for the changes to take effect.

biha.set_no_wal_on_follower (integer) returns boolean #

Sets the biha.no_wal_on_follower value, in milliseconds. This function can be called only from the leader node and changes the parameter on all nodes. You do not need to restart the cluster for the changes to take effect.

biha.set_minnodes (integer) returns boolean #

Sets the biha.minnodes value. This function can be called only from the leader node and changes the parameter on all nodes. You do not need to restart the cluster for the changes to take effect.

biha.set_nquorum (integer) returns boolean #

Sets the biha.nquorum value. This function can be called only from the leader node and changes the parameter on all nodes. You do not need to restart the cluster for the changes to take effect.

biha.set_nquorum_and_minnodes (integer, integer) returns boolean #

Sets the biha.nquorum and biha.minnodes values. This function can be called only from the leader node and changes the parameters on all nodes. You do not need to restart the cluster for the changes to take effect.

biha.set_sync_standbys_min (integer) returns boolean #

Sets the biha.sync_standbys_min value and changes the MIN field value of the synchronous_standby_names parameter accordingly if required. This function can be called only from the leader node and changes the parameter on all nodes. You do not need to restart the cluster for the changes to take effect.

Note

Both the biha.sync_standbys_min parameter and the biha.set_sync_standbys_min function can operate only if you set the --sync-standbys-min option when initializing the BiHA cluster by means of the bihactl init command.

F.7.4.2.5. Cluster Monitoring #
biha.nodes () returns setof record #

Defines the biha.nodes_v view, which is described in detail in the biha.nodes_v section.

biha.status () returns setof record #

Defines the biha.status_v view, which is described in detail in the biha.status_v section. If the node does not respond when trying to display the view, try the same request on any other node to learn the actual state of cluster nodes.

F.7.4.2.6. The NODE_ERROR State #
biha.error_details () returns setof record #

Returns the description of why the node transferred to the NODE_ERROR state. The returned record contains the type of the error, its details, the place it occurred specifying begin_lsn, end_lsn, and identifier of the current and the next timeline, as well as replay_lsn.

F.7.4.2.7. Callback Management #
biha.register_callback (event text, func text, database text, executor text, priority integer) returns integer #

Adds a new callback and returns its unique ID. This function can only be called from the leader node in the LEADER_RW state. The new callback will be replicated to the followers.

Note

On the biha side, there is no check that func exists in database. If the specified function does not exist, the callback fails.

See Registering Callbacks for the example of using the biha.register_callback function.

Table F.7. Variable Definitions

NameTypeDescription
eventtext Event of the BiHA cluster that triggers a callback. For more information about events and corresponding callback types, see Callback Types.
functext Name of the SQL function that biha executes when event happens. This function must be located in database, otherwise it cannot be executed.
databasetextDatabase where func is executed.
executortext

User that executes func. This parameter is optional. The default value is biha_callbacks_user.

priorityinteger The lower the value, the sooner the callback is executed. This parameter is optional. The default value is 0.

biha.unregister_callback(callback_id) #

Deletes a callback. This function can only be called on the leader node in the LEADER_RW state. See Unregistering Callbacks for the example of using the biha.unregister_callback function.

F.7.4.3. Views #

F.7.4.3.1. biha.nodes_v #

This view displays the connection status of nodes in the cluster. For the node that the view is queried on, the following columns contain NULL: state, since_conn_start, conn_count.

Table F.8. The biha.nodes_v view

Column NameDescription
id The node ID.
host The host of the node.
port The port of the node.
state The connection state of the node. This column may contain one of the following values: ACTIVE, CONNECTING, IDLE, or INIT.
since_conn_start The time since the node connection.
conn_count The number of times the node was connected since the start of the cluster.

F.7.4.3.2. biha.status_v #

This view displays the state of nodes in the cluster.

Table F.9. The biha.status_v View

Column NameDescription
id The node ID.
leader_id The leader node ID.
term The term of the node. This is used for the purposes of the leader election.
online Shows whether the node is online.
state

The state of the node. This column may contain one of the following values:

  • PRESTARTUP — the initial state of the node at the BiHA cluster launch. The node sends heartbeats and executes pg_rewind if scheduled. Otherwise, the node proceeds to STARTUP.

  • STARTUP — the node waits for the Postgres Pro startup process to reach the consistency point.

  • CSTATE_FORMING — the node receives and sends heartbeats to determine what state it must transfer to.

  • LEADER_RO — the node is the leader available for read-only operations.

  • LEADER_RW — the node is the leader available for read and write operations.

  • FOLLOWER — the node is a replica of the leader. If biha.can_be_leader and biha.can_vote are set to true, the follower can be elected as the new leader.

  • FOLLOWER_OFFERED — the node was manually nominated as the new leader by means of the biha.set_leader function.

  • CANDIDATE — the node stands as a candidate in elections of the new leader.

  • REFEREE — the node is the cluster referee. The state is the same both for the referee and referee_with_wal modes.

  • NODE_ERROR — the node is non-operational due to an error. When the node is in the NODE_ERROR state, reading from this node is prohibited. To get more information about the error, use the biha.error_details function. To restore the faulty node, see Section F.7.3.5.

  • UNKNOWN — the node is offline for the current node.

last_known_state The last known state of the node.
since_last_hb The time since the last received heartbeat message.