RE: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request - Mailing list pgsql-bugs

From Улаев Александр Сергеевич
Subject RE: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request
Date
Msg-id 779598bff1e7430f9e032d1b62b8006b@rtlabs.ru
Whole thread Raw
In response to BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request  (Sergei Kornilov <sk@zsrv.org>)
List pgsql-bugs
Hi, thanks for your cooperation

2022-02-01 16:11:34,964 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:11:44,950 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:11:54,677 INFO: Lock owner: db02.local; I am db02.local
2022-02-01 16:11:56,145 ERROR: Request to server http://IP_ETCD1:2379 failed:
MaxRetryError('HTTPConnectionPool(host=\'IP_ETCD1\',port=2379): Max retries exceeded with url:
/v2/keys/postgresql/db/leader(Caused by Read 
TimeoutError("HTTPConnectionPool(host=\'IP_ETCD1\', port=2379): Read timed out. (read
timeout=1.4283105651182788)",))',)
2022-02-01 16:11:56,146 INFO: Reconnection allowed, looking for another server.
2022-02-01 16:11:56,146 INFO: Retrying on http://IP_ETCD2:2379
2022-02-01 16:11:56,262 INFO: Selected new etcd server http://IP_ETCD2:2379
2022-02-01 16:11:56,475 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:12:04,950 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:12:14,960 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:12:24,669 INFO: Lock owner: db02.local; I am db02.local
2022-02-01 16:12:24,738 ERROR:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 562, in wrapper
    retval = func(self, *args, **kwargs) is not None
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 692, in _update_leader
    return self.retry(self._client.write, self.leader_path, self._name, prevValue=self._name, ttl=self._ttl)
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 443, in retry
    return retry(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/patroni/utils.py", line 334, in __call__
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/etcd/client.py", line 500, in write
    response = self.api_execute(path, method, params=params)
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 256, in api_execute
    return self._handle_server_response(response)
  File "/usr/lib/python3.6/site-packages/etcd/client.py", line 987, in _handle_server_response
    etcd.EtcdError.handle(r)
  File "/usr/lib/python3.6/site-packages/etcd/__init__.py", line 306, in handle
    raise exc(msg, payload)
etcd.EtcdKeyNotFound: Key not found : /postgresql/db/leader
2022-02-01 16:12:24,928 ERROR: failed to update leader lock
2022-02-01 16:12:27,063 INFO: demoted self because failed to update leader lock in DCS
2022-02-01 16:12:27,079 INFO: closed patroni connection to the postgresql cluster
2022-02-01 16:12:28,685 INFO: postmaster pid=153397
2022-02-01 16:12:34,670 INFO: Lock owner: None; I am db02.local
2022-02-01 16:12:34,671 INFO: not healthy enough for leader race
2022-02-01 16:12:34,738 INFO: starting after demotion in progress
2022-02-01 16:12:44,672 INFO: Lock owner: None; I am db02.local
2022-02-01 16:12:44,672 INFO: not healthy enough for leader race
2022-02-01 16:12:44,741 INFO: starting after demotion in progress
2022-02-01 16:12:54,672 INFO: Lock owner: None; I am db02.local
2022-02-01 16:12:54,672 INFO: not healthy enough for leader race
2022-02-01 16:12:54,739 INFO: starting after demotion in progress
2022-02-01 16:13:04,672 INFO: Lock owner: None; I am db02.local
2022-02-01 16:13:04,672 INFO: not healthy enough for leader race
2022-02-01 16:13:04,746 INFO: starting after demotion in progress
2022-02-01 16:13:14,673 INFO: Lock owner: None; I am db02.local
2022-02-01 16:13:14,674 INFO: not healthy enough for leader race
2022-02-01 16:13:14,743 INFO: starting after demotion in progress
2022-02-01 16:13:18,748 INFO: establishing a new patroni connection to the postgres cluster
2022-02-01 16:13:19,961 INFO: Got response from db01.local http://IP_DB01:8008/patroni: {"state": "starting", "role":
"replica","database_system_identifier": "9879384539485923423", "patroni": {"version": "2.1.0", "scope": "db"}} 
2022-02-01 16:13:19,961 WARNING: Request failed to db01.local: GET http://IP_DB01:8008/patroni ('xlog')
2022-02-01 16:13:20,365 INFO: promoted self to leader by acquiring session lock
2022-02-01 16:13:20,380 INFO: cleared rewind state after becoming the leader
2022-02-01 16:13:21,816 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:13:24,830 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:13:34,826 INFO: no action. I am (db02.local) the leader with the lock

Best regards,
Ulaev Alexander


-----Original Message-----
From: Sergei Kornilov [mailto:sk@zsrv.org]
Sent: Thursday, February 3, 2022 11:52 AM
To: PG Bug reporting form <noreply@postgresql.org>
Cc: Улаев Александр Сергеевич <alexander.ulaev@rtlabs.ru>; pgsql-bugs@lists.postgresql.org
Subject: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request

Hello

> 2022-02-01 16:12:24 MSK [16959] LOG: received immediate shutdown request

Please prove by patroni logs, that not patroni commanded immediate shutdown. As far as I remember, there should be a
markabout the request for stopping the database immediatly. 

Seems unrelated to archiver. Patroni is able to command immediate shutdown, immediate shutdown is a kind of "terminated
abnormally".The sequence of logs looks completely correct. 

regards, Sergei



pgsql-bugs by date:

Previous
From: Juan José Santamaría Flecha
Date:
Subject: Re: Postgresql error : PANIC: could not locate a valid checkpoint record
Next
From: Sergei Kornilov
Date:
Subject: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request