Re: Standby corruption after master is restarted - Mailing list pgsql-bugs
From | Emre Hasegeli |
---|---|
Subject | Re: Standby corruption after master is restarted |
Date | |
Msg-id | CAE2gYzwUUW+_o8hSQuUkpgJj=OkerBghuKXUe4ZE12dJWA2vaw@mail.gmail.com Whole thread Raw |
In response to | [BUGS] Standby corruption after master is restarted (Emre Hasegeli <emre@hasegeli.com>) |
Responses |
Re: Standby corruption after master is restarted
|
List | pgsql-bugs |
We experienced this issue again, this time on production. The primary instance was in a loop of being killed by Linux OOM-killer and being restarted in 1 minute intervals. The corruption only happened on one of the two standbys. The primary and the other standby have no problems. Only the was killed and restarted, the standbys were not. There weren't any unusual settings, "fsync" was not disabled. Here is the information I collected. The logs at the time standby broke: > 2018-03-28 14:00:30 UTC [3693-67] LOG: invalid resource manager ID 39 at 1DFB/D43BE688 > 2018-03-28 14:00:30 UTC [25347-1] LOG: started streaming WAL from primary at 1DFB/D4000000 on timeline 5 > 2018-03-28 14:00:59 UTC [3748-357177] LOG: restartpoint starting: time > 2018-03-28 14:01:23 UTC [25347-2] FATAL: could not receive data from WAL stream: SSL SYSCALL error: EOF detected > 2018-03-28 14:01:24 UTC [3693-68] FATAL: invalid memory alloc request size 1916035072 And from the next try: > 2018-03-28 14:02:15 UTC [26808-5] LOG: consistent recovery state reached at 1DFB/D6BDDFF8 > 2018-03-28 14:02:15 UTC [26808-6] FATAL: invalid memory alloc request size 191603507 The output of pg_controldata: > pg_control version number: 942 > Catalog version number: 201510051 > Database system identifier: 6223303827582122631 > Database cluster state: in archive recovery > pg_control last modified: 2018-03-28T15:25:51 UTC > Latest checkpoint location: 1DFB/D1E17FD8 > Prior checkpoint location: 1DFB/D1E17FD8 > Latest checkpoint's REDO location: 1DFB/D1E17FD8 > Latest checkpoint's REDO WAL file: 0000000500001DFB000000D1 > Latest checkpoint's TimeLineID: 5 > Latest checkpoint's PrevTimeLineID: 5 > Latest checkpoint's full_page_writes: on > Latest checkpoint's NextXID: 2/3522301360 > Latest checkpoint's NextOID: 141900 > Latest checkpoint's NextMultiXactId: 3434 > Latest checkpoint's NextMultiOffset: 7985 > Latest checkpoint's oldestXID: 3329390243 > Latest checkpoint's oldestXID's DB: 16385 > Latest checkpoint's oldestActiveXID: 0 > Latest checkpoint's oldestMultiXid: 1 > Latest checkpoint's oldestMulti's DB: 16385 > Latest checkpoint's oldestCommitTsXid:0 > Latest checkpoint's newestCommitTsXid:0 > Time of latest checkpoint: 2018-03-28T13:55:52 UTC > Fake LSN counter for unlogged rels: 0/1 > Minimum recovery ending location: 1DFB/D6BDDFF8 > Min recovery ending loc's timeline: 5 > Backup start location: 0/0 > Backup end location: 0/0 > End-of-backup record required: no > wal_level setting: hot_standby > wal_log_hints setting: on > max_connections setting: 500 > max_worker_processes setting: 8 > max_prepared_xacts setting: 0 > max_locks_per_xact setting: 64 > track_commit_timestamp setting: off > Maximum data alignment: 8 > Database block size: 8192 > Blocks per segment of large relation: 131072 > WAL block size: 8192 > Bytes per WAL segment: 16777216 > Maximum length of identifiers: 64 > Maximum columns in an index: 32 > Maximum size of a TOAST chunk: 1996 > Size of a large-object chunk: 2048 > Date/time type storage: 64-bit integers > Float4 argument passing: by value > Float8 argument passing: by value > Data page checksum version: 0 Hexdump of the WAL file at the position on standby: 0bddfc0 0002 0003 0004 0005 0006 0007 0008 0009 0bddfd0 000a 000b 000c 000d 000e 000f 0010 0011 0bddfe0 0012 0013 0014 0015 0016 0017 0000 0000 0bddff0 0000 0017 0000 0000 5f32 7234 4b55 7045 0bde000 d087 0005 0005 0000 e000 66bd 1dfb 0000 0bde010 1931 0000 0000 0000 5a43 7746 7166 6e34 0bde020 304e 764e 9c32 0158 5400 e709 0900 6f66 0bde030 0765 7375 6111 646e 6f72 6469 370d 312e Hexdump at the same position on the master: 0bddfc0 0002 0003 0004 0005 0006 0007 0008 0009 0bddfd0 000a 000b 000c 000d 000e 000f 0010 0011 0bddfe0 0012 0013 0014 0015 0016 0017 0000 0000 0bddff0 0000 0017 0000 0000 006a 0000 0000 0000 0bde000 d087 0005 0005 0000 e000 d6bd 1dfb 0000 0bde010 0062 0000 0000 0000 df90 d6bd 1dfb 0000 0bde020 0000 0000 63d3 aa61 50ff dff8 d6bd 1dfb 0bde030 0000 0005 0000 0005 0000 0001 0000 0002
pgsql-bugs by date: