Thread: PostgreSQL with Patroni not replicating to all nodes after adding 3rd node (another secondary)
From: Zb B
Hi,
I am new to Patroni and PostgreSQL. We have set up a cluster with etcd (3 nodes), Patroni (2 nodes), and PostgreSQL (2 nodes), with replication from the primary to the secondary in SYNC mode. It seemed to work fine. Then I added a third DB node without Patroni, just to replicate the data from the primary, as follows:
1) added another slot in patroni.yml:
slots:
bdc2b:
type: physical
2) used
pg_basebackup -v -R -h 10.17.5.211,10.17.5.83 -U replication --slot=bdc2b -D 14/data
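After these steps, the slots can be checked directly on the primary with a query against the standard pg_replication_slots system view (a diagnostic sketch; the slot names come from my setup above):

```sql
-- On the primary: list all replication slots and whether each is in use.
-- Both the Patroni-managed slot and the manually added bdc2b slot should appear,
-- with active = true while a standby is streaming from them.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;
```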
As a result, the primary DB was showing two replication slots, and the Patroni cluster looked healthy (the Leader and the Replica were both running) when executing:
patronictl -c /etc/patroni/patroni.yml list
But when I started my remote test application, which executes small insert transactions, I noticed that the records are replicated only to the 3rd node (the secondary without Patroni). They are not replicated to the secondary node (the Replica with Patroni).
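To compare what the two standbys are actually receiving, I can query the standard pg_stat_replication view on the primary (a diagnostic sketch; sent_lsn/replay_lsn are the PostgreSQL 10+ column names):

```sql
-- On the primary: which standbys are streaming, their sync state, and lag.
-- A replay_lsn falling behind sent_lsn would show a standby receiving WAL
-- but not applying it; a missing row means the standby is not connected at all.
SELECT application_name, client_addr, state, sync_state,
       sent_lsn, replay_lsn
FROM pg_stat_replication;
```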
Some debugging using
journalctl -f
shows that the replica is not healthy and after a while the replication slot becomes inactive. See the log below:
Jun 22 08:06:35 xyzd3riardb02 patroni[12495]: 2022-06-22 08:06:35,280 INFO: Got response from xyzd3riardb01 http://10.17.5.211:8008/patroni: {"state": "running", "postmaster_start_time": "2022-06-22 05:05:37.382607-04:00", "role": "master", "server_version": 140004, "xlog": {"location": 117558448}, "timeline": 4, "replication": [{"usename": "replication", "application_name": "test1b", "client_addr": "10.17.5.56", "state": "streaming", "sync_state": "async", "sync_priority": 0}, {"usename": "replication", "application_name": "xyzd3riardb02", "client_addr": "10.17.5.83", "state": "streaming", "sync_state": "sync", "sync_priority": 1}], "dcs_last_seen": 1655899566, "database_system_identifier": "7111967488904966919", "patroni": {"version": "2.1.4", "scope": "test1b"}}
Jun 22 08:06:35 xyzd3riardb02 patroni[12495]: 2022-06-22 08:06:35,375 WARNING: Master (xyzd3riardb01) is still alive
Jun 22 08:06:35 xyzd3riardb02 patroni[12495]: server signaled
Jun 22 08:06:35 xyzd3riardb02 patroni[12495]: 2022-06-22 08:06:35,400 INFO: following a different leader because i am not the healthiest node
Jun 22 08:07:05 xyzd3riardb02 patroni[12495]: 2022-06-22 08:07:05,279 INFO: Got response from xyzd3riardb01 http://10.17.5.211:8008/patroni: {"state": "running", "postmaster_start_time": "2022-06-22 05:05:37.382607-04:00", "role": "master", "server_version": 140004, "xlog": {"location": 117558448}, "timeline": 4, "replication": [{"usename": "replication", "application_name": "test1b", "client_addr": "10.17.5.56", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1655899596, "database_system_identifier": "7111967488904966919", "patroni": {"version": "2.1.4", "scope": "test1b"}}
Jun 22 08:07:05 xyzd3riardb02 patroni[12495]: 2022-06-22 08:07:05,374 WARNING: Master (xyzd3riardb01) is still alive
Jun 22 08:07:05 xyzd3riardb02 patroni[12495]: 2022-06-22 08:07:05,393 INFO: following a different leader because i am not the healthiest node
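To confirm what the log suggests, the slot state can be checked on the primary (a sketch using the standard pg_replication_slots view and the pg_wal_lsn_diff/pg_current_wal_lsn functions from PostgreSQL 10+):

```sql
-- On the primary: an inactive slot (active = false) for the Patroni replica
-- would confirm its walreceiver has disconnected; retained_wal_bytes shows
-- how much WAL the stalled slot is holding back from removal.
SELECT slot_name, active, active_pid,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;
```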
Yet the Patroni cluster still looks healthy when executing
patronictl -c /etc/patroni/patroni.yml list
even though records are not being replicated to the Replica.
What could be the reason? Where should I look for the problem?
Thanks,
Zbigniew