Re: [BUG] Panic due to incorrect missingContrecPtr after promotion - Mailing list pgsql-hackers
From: Imseih (AWS), Sami
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Msg-id: 6A726564-E016-4305-97AB-F0E825B3B53C@amazon.com
In response to: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
> The server seem to have started as a standby after crashing a
> primary. Is it correct?

Yes, that is correct.

2022-08-05 17:18:51 UTC::@:[359]:LOG: database system was interrupted; last known up at 2022-08-05 17:08:52 UTC
2022-08-05 17:18:51 UTC::@:[359]:LOG: creating missing WAL directory "pg_wal/archive_status"
recovering 00000002.history
2022-08-05 17:18:51 UTC::@:[359]:LOG: entering standby mode
2022-08-05 17:18:51 UTC::@:[359]:LOG: ### [S] @0/48A3400: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: checkpoint record is at 0/48A3388
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: redo record is at 0/48A3388; shutdown true
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: next transaction ID: 533; next OID: 16395
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: oldest unfrozen transaction ID: 479, in database 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: oldest MultiXactId: 1, in database 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: commit timestamp Xid oldest/newest: 0/0
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: transaction ID wrap limit is 2147484126, limited by database with OID 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: starting up replication slots
2022-08-05 17:18:51 UTC::@:[359]:LOG: database system was not properly shut down; automatic recovery in progress
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: resetting unlogged relations: cleanup 1 init 0
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: initializing for hot standby
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: recovery snapshots are now enabled
2022-08-05 17:18:51 UTC::@:[359]:LOG: redo starts at 0/48A3400
2022-08-05 17:18:51 UTC::@:[359]:LOG: ### [F] @0/10000000: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG: reached end of WAL in pg_wal, entering archive recovery
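For readers skimming the logs: the `### [S]`/`### [F]` lines come from the diagnostic logging used in this thread, printing abortedRecPtr (`abort`), missingContrecPtr (`miss`), and the standby-mode flags at the start and finish of recovery. A minimal sketch of pulling those fields out of a log line, assuming only the line format visible above (the regex and function name are ours, not anything PostgreSQL ships):

```python
import re

# Matches the "### [S]/[F]" diagnostic lines shown in the logs above.
# Field names mirror the printed abort/miss pointers and standby flags.
DEBUG_RE = re.compile(
    r"### \[(?P<point>[SF])\] @(?P<lsn>[0-9A-F]+/[0-9A-F]+): "
    r"abort=\((?P<abort_a>[^)]*)\)(?P<abort_b>\S+), "
    r"miss=\((?P<miss_a>[^)]*)\)(?P<miss_b>\S+), "
    r"SbyMode=(?P<sby>\d), SbyModeReq=(?P<sbyreq>\d)"
)

def parse_debug_line(line):
    """Return the diagnostic fields as a dict, or None for ordinary lines."""
    m = DEBUG_RE.search(line)
    return m.groupdict() if m else None

info = parse_debug_line(
    "2022-08-05 17:18:51 UTC::@:[359]:LOG: ### [S] @0/48A3400: "
    "abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1"
)
```

Running this over the full log makes it easy to see that `miss` stays 0/0 while `SbyMode` flips between 0 and 1 across the restarts, which is the state transition under discussion.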
2022-08-05 17:18:51 UTC::@:[359]:LOG: consistent recovery state reached at 0/10000000
2022-08-05 17:18:51 UTC::@:[357]:LOG: database system is ready to accept read only connections

> Archive recovery ended here. The server should have promoted that
> time.. Do you see some interesting log lines around this time?

The server did promote around that time.

2022-08-05 18:38:14 UTC::@:[359]:LOG: ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
2022-08-05 18:38:14 UTC::@:[359]:LOG: received promote request
2022-08-05 18:38:14 UTC::@:[359]:LOG: redo done at 6/B6CB27A8
2022-08-05 18:38:14 UTC::@:[359]:LOG: last completed transaction was at log time 2022-08-05 18:38:00.832047+00
recovering 00000001000000060000002D
....
........
2022-08-05 18:38:14 UTC::@:[359]:LOG: ### [S] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
recovering 00000002.history
2022-08-05 18:38:14 UTC::@:[359]:LOG: selected new timeline ID: 2
2022-08-05 18:38:14 UTC::@:[359]:LOG: archive recovery complete
recovering 00000001.history
2022-08-05 18:38:14 UTC::@:[357]:LOG: database system is ready to accept connections
2022-08-05 18:38:15 UTC::@:[367]:LOG: restartpoint complete: wrote 21388 buffers (2.1%); 0 WAL file(s) added, 9 removed, 0 recycled; write=98.394 s, sync=0.041 s, total=98.586 s; sync files=46, longest=0.012 s, average=0.001 s; distance=1048565 kB, estimate=1048584 kB
2022-08-05 18:38:15 UTC::@:[367]:LOG: recovery restart point at 6/5C0003B0

> But, recovery continues in non-standby mode. I don't see how come it
> behaves that way.

But the server crashed sometime after that, which is why recovery starts again.
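As an aside for comparing the positions above: PostgreSQL prints an LSN as two hexadecimal halves separated by a slash, the left half being the high 32 bits of a 64-bit byte position (the `pg_lsn` text format). A small sketch, with a helper name of our choosing, to turn them into comparable integers:

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN string like '6/B6CB27D0' to a byte offset.

    The text format is high-32-bits '/' low-32-bits, both hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# "redo done at 6/B6CB27A8" vs. the next record pointer 6/B6CB27D0:
# the gap is 0x28 = 40 bytes, i.e. one short WAL record.
gap = lsn_to_bytes("6/B6CB27D0") - lsn_to_bytes("6/B6CB27A8")
print(gap)  # 40
```

This makes it easy to confirm, for example, that the second startup's redo point 6/B80000E8 lies after the first timeline's end of redo.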
2022-08-05 18:50:13 UTC::@:[357]:LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-08-05 18:50:13 UTC::@:[357]:LOG: listening on IPv6 address "::", port 5432
2022-08-05 18:50:13 UTC::@:[357]:LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-05 18:50:13 UTC::@:[359]:LOG: database system was interrupted; last known up at 2022-08-05 18:38:15 UTC
2022-08-05 18:50:13 UTC::@:[359]:LOG: creating missing WAL directory "pg_wal/archive_status"
recovering 00000003.history
2022-08-05 18:50:13 UTC::@:[359]:LOG: entering standby mode
recovering 00000002.history
2022-08-05 18:50:13 UTC::@:[359]:LOG: ### [S] @6/B8000198: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
2022-08-05 18:50:13 UTC::@:[359]:LOG: database system was not properly shut down; automatic recovery in progress
2022-08-05 18:50:13 UTC::@:[359]:LOG: redo starts at 6/B80000E8

And a few hours later is when we see the panic.

Thanks

--
Sami Imseih
Amazon Web Services (AWS)