5.3. Time Synchronization

The algorithm that provides data consistency across the cluster nodes relies on the system clocks of the hosts. As a result, transaction commit latency depends on the clock drift between hosts, because the coordinator always waits for the most lagging host to catch up. It is therefore crucial that the time on all nodes of a Shardman cluster is synchronized: a lack of synchronization degrades Shardman performance by increasing query latency.

To ensure time synchronization on all cluster nodes, install the chrony daemon on each node when deploying a new cluster.

  sudo apt update
  sudo apt install -y chrony
  sudo systemctl enable --now chrony

Check that chrony is working properly.

  chronyc tracking

Example output (actual values will differ):

      Reference ID    : C0248F82 (Time100.Stupi.SE)
      Stratum         : 2
      Ref time (UTC)  : Tue Apr 18 11:50:44 2023
      System time     : 0.000019457 seconds slow of NTP time
      Last offset     : -0.000005579 seconds
      RMS offset      : 0.000089375 seconds
      Frequency       : 30.777 ppm fast
      Residual freq   : -0.000 ppm
      Skew            : 0.003 ppm
      Root delay      : 0.018349268 seconds
      Root dispersion : 0.000334640 seconds
      Update interval : 1039.1 seconds
      Leap status     : Normal
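The check above can also be scripted. The following is a minimal sketch that reads chronyc tracking output from standard input, extracts the "System time" offset, and compares it with a limit. The function names and the MAX_OFFSET_US default of 1000 microseconds are assumptions made for illustration, not values prescribed by chrony or Shardman.

```shell
#!/bin/sh
# Print the "System time" offset from `chronyc tracking` output (stdin),
# rounded to whole microseconds. chronyc reports the offset as an
# unsigned number followed by "slow" or "fast".
chrony_offset_us() {
    awk -F': ' '/System time/ {
        split($2, parts, " ")      # parts[1] is the offset in seconds
        printf "%.0f\n", parts[1] * 1000000
    }'
}

# Hypothetical limit in microseconds; tune it for your environment.
MAX_OFFSET_US=${MAX_OFFSET_US:-1000}

# Read `chronyc tracking` output from stdin and report the verdict.
# Returns 0 if within the limit, 1 if above it, 2 if nothing was parsed.
check_offset() {
    offset_us=$(chrony_offset_us)
    if [ -z "$offset_us" ]; then
        echo "UNKNOWN: no System time line found"
        return 2
    fi
    if [ "$offset_us" -gt "$MAX_OFFSET_US" ]; then
        echo "WARN: offset ${offset_us} us exceeds ${MAX_OFFSET_US} us"
        return 1
    fi
    echo "OK: offset ${offset_us} us"
}

# Usage on a live node:
#   chronyc tracking | check_offset
```

Running the check on every node, for example from a monitoring job, gives an OS-level view of the offset that does not depend on Shardman statistics.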

Note that clock drift should be managed with OS tools. Shardman diagnostic tools should not be treated as the sole or definitive means of measuring it.

To see whether any major drift already exists, use the shardman.pg_stat_csn view, which shows statistics on delays that occur during the import of CSN snapshots. Its values are calculated when any related action is performed, or when either the shardman.trim_csnxid_map() or the shardman.pg_oldest_csn_snapshot() function is called. These functions are called from the CSN trimmer routine worker, so disabling this worker stops these statistics from being collected.

The csn_max_shift field of the shardman.pg_stat_csn view shows the maximum registered snapshot CSN shift that caused a delay. This value reflects the clock drift between the nodes in the cluster. A steady increase of this value means that the system clock of at least one cluster node is out of sync. If this value exceeds 1000 (microseconds), it is recommended to check the time synchronization settings.

The same conclusion can be drawn if the csn_total_import_delay value increases while csn_max_shift remains unchanged. However, a one-time increase may be caused by an isolated failure unrelated to time synchronization.
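The csn_max_shift checks described above can be sketched as a few shell helpers. This is an illustration only: the helper names are hypothetical, and read_csn_max_shift assumes that psql is installed and can connect to the node through the standard PG* environment variables. Only the view name, the field name, and the 1000-microsecond guideline come from this section.

```shell
#!/bin/sh
# Read the current csn_max_shift value from the node psql connects to.
# Assumption: connection settings come from PGHOST, PGPORT, PGUSER, etc.
read_csn_max_shift() {
    psql -At -c "SELECT csn_max_shift FROM shardman.pg_stat_csn;"
}

# Apply the 1000-microsecond guideline to a single integer sample.
classify_csn_shift() {
    if [ "$1" -gt 1000 ]; then
        echo "check time synchronization"
    else
        echo "ok"
    fi
}

# Succeed only if the samples (oldest first) grew at every step,
# which corresponds to the steady increase described above.
is_steadily_increasing() {
    prev=""
    for sample in "$@"; do
        if [ -n "$prev" ] && [ "$sample" -le "$prev" ]; then
            return 1
        fi
        prev=$sample
    done
    [ "$#" -ge 2 ]      # a trend needs at least two samples
}

# Usage on a live node:
#   classify_csn_shift "$(read_csn_max_shift)"
```

Collecting several samples over time and passing them to is_steadily_increasing helps distinguish a persistent drift from a one-time spike.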

Also, if the difference between CSNXidMap_head_csn and shardman.oldest_csn exceeds the csn_snapshot_defer_time parameter value and remains so for a long time, the CSNSnapshotXidMap map is full. This can result in global transaction failures.

There are two main causes of this issue:

  • A transaction runs for more than csn_snapshot_defer_time seconds and holds up the entire cluster by blocking the VACUUM process. In this case, use the xid field of the shardman.oldest_csn view to determine the ID of this transaction and the rgid field to determine the cluster node where it is running.

  • The CSNSnapshotXidMap map lacks capacity. During normal operation, the system may have transactions that run longer than the csn_snapshot_defer_time value. To fix this, increase csn_snapshot_defer_time so that the duration of these transactions stays below it.
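For the first cause, locating the long-running transaction could look like the sketch below. The helper names are hypothetical, and psql is assumed to connect through the standard PG* environment variables. The rgid and xid fields of shardman.oldest_csn come from this section. The lookup in pg_stat_activity by backend_xid is standard PostgreSQL, but whether it is the most convenient way to find the backend on a Shardman node is an assumption.

```shell
#!/bin/sh
# Print the node (rgid) and transaction ID (xid) reported by the
# shardman.oldest_csn view, separated by a space.
show_oldest_csn_holder() {
    psql -At -F ' ' -c "SELECT rgid, xid FROM shardman.oldest_csn;"
}

# Build a query that finds the backend running the given transaction.
# Run the resulting query on the node identified by rgid.
# The xid is compared as text and must consist of digits only.
backend_query_for_xid() {
    case "$1" in
        ''|*[!0-9]*)
            echo "xid must be numeric" >&2
            return 1
            ;;
    esac
    printf "SELECT pid, xact_start, state, query FROM pg_stat_activity WHERE backend_xid::text = '%s';\n" "$1"
}

# Usage on the node identified by rgid (12345 is a placeholder xid):
#   psql -c "$(backend_query_for_xid 12345)"
```

The xact_start column in the result shows how long the transaction has been running, which can be compared with csn_snapshot_defer_time.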

If the shardman.silk_tracepoints configuration parameter is enabled, running the EXPLAIN command on a distributed query outputs additional rows that show, for each system component, how much time was spent on query execution and with what result it ended. The net (qry), net (1st tup), and net (last tup) metrics are calculated as the difference between timestamps taken on different servers. This difference includes both the time spent on message transfer and the clock drift (positive or negative) between these servers. Therefore, these metrics can also help to determine whether there is any clock drift.