Patroni question - Mailing list pgsql-general
From | Zwettler Markus (OIZ) |
---|---|
Subject | Patroni question |
Date | |
Msg-id | d1ee012b1c9c4367ade1e8662e80a0dc@zuerich.ch |
List | pgsql-general |
We had a failover. I would read the Patroni logs below as follows:

2022-09-21 11:13:56,384 secondary did an HTTP GET request to primary. This failed with a read timeout.
2022-09-21 11:13:56,792 secondary promoted itself to primary.
2022-09-21 11:13:57,279 primary did an HTTP GET request to secondary. An exception happened, probably also due to a read timeout.
2022-09-21 11:13:57,983 primary demoted itself.

So the failover was caused by a network timeout between primary and secondary.

QUESTION 1: Do you agree?

I thought that the Patroni nodes do not communicate directly with each other, but only via the DCS.

QUESTION 2: Is this no longer correct?

===========================
patroni version: 2.1.3
===========================

Patroni Logfile of Host szhm49346 (IP 10.9.132.13) => Primary until Failover

...
...
2022-09-21 11:13:57,279 DEBUG: API thread: 10.9.132.16 - - "GET /patroni HTTP/1.1" 200 - latency: 2245.090 ms
2022-09-21 11:13:57,378 ERROR: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 566, in wrapper
    retval = func(self, *args, **kwargs) is not None
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 696, in _update_leader
    return self.retry(self._client.write, self.leader_path, self._name, prevValue=self._name, ttl=self._ttl)
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 447, in retry
    return retry(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/patroni/utils.py", line 334, in __call__
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/etcd/client.py", line 500, in write
    response = self.api_execute(path, method, params=params)
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 257, in api_execute
    return self._handle_server_response(response)
  File "/usr/lib/python3.6/site-packages/etcd/client.py", line 987, in _handle_server_response
    etcd.EtcdError.handle(r)
  File "/usr/lib/python3.6/site-packages/etcd/__init__.py", line 306, in handle
    raise exc(msg, payload)
etcd.EtcdCompareFailed: Compare failed : [pcl_p011@szhm49346 != pcl_p011@szhm49345]
2022-09-21 11:13:57,558 WARNING: Exception happened during processing of request from 10.9.132.16:49080
2022-09-21 11:13:57,965 ERROR: failed to update leader lock
2022-09-21 11:13:57,983 INFO: Demoting self (immediate-nolock)
2022-09-21 11:13:58,214 WARNING: Traceback (most recent call last):
  File "/usr/lib64/python3.6/socketserver.py", line 654, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python3.6/socketserver.py", line 364, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python3.6/socketserver.py", line 724, in __init__
    self.handle()
  File "/usr/lib64/python3.6/http/server.py", line 418, in handle
    self.handle_one_request()
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 652, in handle_one_request
    BaseHTTPRequestHandler.handle_one_request(self)
  File "/usr/lib64/python3.6/http/server.py", line 406, in handle_one_request
    method()
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 198, in do_GET_patroni
    self._write_status_response(200, response)
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 94, in _write_status_response
    self._write_json_response(status_code, response)
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 53, in _write_json_response
    self._write_response(status_code, json.dumps(response, default=str), content_type='application/json')
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 50, in _write_response
    self.wfile.write(body.encode('utf-8'))
  File "/usr/lib64/python3.6/socketserver.py", line 803, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
...
...

===========================

Patroni Logfile of Host szhm49345 (IP 10.9.132.16) => Standby until Failover

...
...
2022-09-21 11:13:54,381 DEBUG: Starting new HTTP connection (1): szhm49346.global.szh.loc:8009
2022-09-21 11:13:56,384 WARNING: Request failed to pcl_p011@szhm49346: GET http://szhm49346.global.szh.loc:8009/patroni (HTTPConnectionPool(host='szhm49346.global.szh.loc',port=8009): Max retries exceeded with url: /patroni (Caused by ReadTimeoutError("HTTPConnectionPool(host='szhm49346.global.szh.loc',port=8009): Read timed out. (read timeout=2)",)))
2022-09-21 11:13:56,484 DEBUG: Writing pcl_p011@szhm49345 to key /patroni/pcl_p011/leader ttl=30 dir=False append=False
2022-09-21 11:13:56,485 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,562 DEBUG: http://10.7.211.13:2379 "PUT /v2/keys/patroni/pcl_p011/leader HTTP/1.1" 201 197
2022-09-21 11:13:56,562 DEBUG: Issuing read for key /patroni/pcl_p011/ with args {'recursive': True, 'retry': <patroni.utils.Retry object at 0x7fcbb0d0c2b0>}
2022-09-21 11:13:56,563 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,634 DEBUG: http://10.7.211.13:2379 "GET /v2/keys/patroni/pcl_p011/?recursive=true HTTP/1.1" 200 None
2022-09-21 11:13:56,635 DEBUG: Writing {"leader":"pcl_p011@szhm49345","sync_standby":null} to key /patroni/pcl_p011/sync ttl=None dir=False append=False
2022-09-21 11:13:56,635 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,713 DEBUG: http://10.7.211.13:2379 "PUT /v2/keys/patroni/pcl_p011/sync HTTP/1.1" 200 368
2022-09-21 11:13:56,713 DEBUG: Writing {"conn_url":"postgres://szhm49345.global.szh.loc:5432/pcl_p011","api_url":"http://szhm49345.global.szh.loc:8009/patroni","state":"running","role":"replica","version":"2.1.3","checkpoint_after_promote":false,"xlog_location":9087609453816,"timeline":6} to key /patroni/pcl_p011/members/pcl_p011@szhm49345 ttl=30 dir=False append=False
2022-09-21 11:13:56,714 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,791 DEBUG: http://10.7.211.13:2379 "PUT /v2/keys/patroni/pcl_p011/members/pcl_p011@szhm49345 HTTP/1.1" 200 896
2022-09-21 11:13:56,792 INFO: promoted self to leader by acquiring session lock
2022-09-21 11:13:56,798 INFO: cleared rewind state after becoming the leader
...
...
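The reading at the top of this post was put together by interleaving entries from two hosts' logs by timestamp. A minimal Python sketch of that step is shown here, assuming every entry starts with the `YYYY-MM-DD HH:MM:SS,mmm LEVEL: message` prefix seen in the excerpts and that the two hosts' clocks are synchronized (without that, cross-host ordering is not meaningful). `parse_events` and `merged_timeline` are hypothetical helpers for illustration, not part of Patroni; traceback frames and `...` lines carry no timestamp and are simply skipped.

```python
import re
from datetime import datetime

# Assumed entry format, taken from the log excerpts in this post:
# "YYYY-MM-DD HH:MM:SS,mmm LEVEL: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def parse_events(host, lines):
    """Hypothetical helper: yield (timestamp, host, level, message) per log entry.

    Lines without a leading timestamp (traceback frames, "...") are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, host, m.group("level"), m.group("msg")


def merged_timeline(logs):
    """Hypothetical helper: merge {host: [lines]} into one list sorted by time.

    Assumes the hosts' clocks are in sync (e.g. via NTP).
    """
    events = [e for host, lines in logs.items() for e in parse_events(host, lines)]
    return sorted(events, key=lambda e: e[0])


# A few entries copied from the two excerpts (long messages shortened).
primary = [
    '2022-09-21 11:13:57,279 DEBUG: API thread: 10.9.132.16 - - "GET /patroni HTTP/1.1" 200 - latency: 2245.090 ms',
    "2022-09-21 11:13:57,965 ERROR: failed to update leader lock",
    "2022-09-21 11:13:57,983 INFO: Demoting self (immediate-nolock)",
]
standby = [
    "2022-09-21 11:13:56,384 WARNING: Request failed to pcl_p011@szhm49346: GET http://szhm49346.global.szh.loc:8009/patroni",
    "2022-09-21 11:13:56,792 INFO: promoted self to leader by acquiring session lock",
]

events = merged_timeline({"szhm49346": primary, "szhm49345": standby})
start = events[0][0]
for ts, host, level, msg in events:
    # Offset in seconds from the first merged entry, then host and message.
    print(f"+{(ts - start).total_seconds():.3f}s {host} {level}: {msg}")
```

With these sample entries the standby's two lines sort ahead of the primary's three, which matches the order given in the reading above; the sketch only orders events and does not by itself say which event caused which.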