Errors with physical replication - Mailing list pgsql-general
From | greigwise |
---|---|
Subject | Errors with physical replication |
Date | |
Msg-id | 1526905137308-0.post@n3.nabble.com |
Responses | Re: Errors with physical replication |
List | pgsql-general |
Hello.

We are on PostgreSQL version 9.6.6. We have 2 EC2 instances in different Amazon regions and we are doing physical replication via VPN. It all seems to work just fine most of the time. I'm noticing in the logs that we have recurring errors (maybe 10 or 12 times per day) that look like this:

2018-05-17 06:36:14 UTC 5af0599f.210d LOG: invalid resource manager ID 49 at 384/42A4AB00
2018-05-17 06:36:14 UTC 5afd22de.7ac4 LOG: started streaming WAL from primary at 384/42000000 on timeline 1
2018-05-17 07:20:17 UTC 5afd22de.7ac4 FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Or some that also look like this:

2018-05-17 07:20:17 UTC 5af0599f.210d LOG: record with incorrect prev-link 49F07120/9F100C95 at 384/45209FC0
2018-05-17 07:20:18 UTC 5afd2d31.1889 LOG: started streaming WAL from primary at 384/45000000 on timeline 1
2018-05-17 08:03:28 UTC 5afd2d31.1889 FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

And some like this:

2018-05-17 23:00:13 UTC 5afd63ec.26fc LOG: invalid magic number 0000 in log segment 00000001000003850000003C, offset 10436608
2018-05-17 23:00:14 UTC 5afe097d.49aa LOG: started streaming WAL from primary at 385/3C000000 on timeline 1

Then, about once every couple of months, we have a crash with logs looking like this:

2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology WARNING: terminating connection because of crash of another server process
2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-05-17 08:03:28 UTC 5af0599f.210a LOG: database system is shut down

When this last error occurs, the recovery is to go on the replica, remove all the WAL files from the pg_xlog directory, and then restart PostgreSQL. Everything seems to recover and come up fine. I've done some tests comparing counts between the replica and the primary, and everything seems synced just fine from all I can tell.

So, a couple of questions:

1) Should I be worried that my replica is corrupt in some way, or, given that everything *seems* ok, is it reasonable to believe that things are working correctly in spite of these errors being reported?

2) Is there something I should configure differently to avoid some of these errors?

Thanks in advance for any help.

Greig Wise

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
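
For anyone wanting to put a firmer number on "10 or 12 times per day", the messages quoted in the post above could be tallied per day with a short script. This is only a rough sketch: the category names and the `tally` function are made up for illustration, and it assumes every log line begins with a `YYYY-MM-DD HH:MM:SS UTC` timestamp exactly as in the excerpts, which depends on the replica's `log_line_prefix`.

```python
import re
import sys
from collections import Counter

# Message fragments copied from the log excerpts quoted in the post.
# The category names (dictionary keys) are hypothetical labels.
PATTERNS = {
    "invalid_rmgr_id": re.compile(r"invalid resource manager ID \d+ at"),
    "bad_prev_link": re.compile(r"record with incorrect prev-link"),
    "bad_magic": re.compile(r"invalid magic number [0-9A-Fa-f]+ in log segment"),
    "stream_lost": re.compile(r"could not receive data from WAL stream"),
    "stream_started": re.compile(r"started streaming WAL from primary"),
}

# Assumption: lines start with "YYYY-MM-DD HH:MM:SS UTC ", as in the excerpts.
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} UTC ")


def tally(lines):
    """Count matching messages, keyed by (date, category)."""
    counts = Counter()
    for line in lines:
        m = DATE_RE.match(line)
        if not m:
            continue  # continuation lines carry no timestamp; skip them
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(m.group(1), name)] += 1
                break
    return counts


if __name__ == "__main__":
    # Reads the replica's log from standard input.
    for (day, name), n in sorted(tally(sys.stdin).items()):
        print(day, name, n)
```

Run as, for example, `python tally_wal_errors.py < postgresql.log` (both file names are placeholders). Comparing the per-day counts of the three WAL-read messages against `stream_lost` would show whether each one lines up with a dropped connection across the VPN.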