Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs
From | Alexander Kukushkin |
---|---|
Subject | Re: BUG #15346: Replica fails to start after the crash |
Date | |
Msg-id | CAFh8B==4gpwor6ODercru0iifqwd0nQrdRvXzmorWkMRQa4q=g@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15346: Replica fails to start after the crash (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: BUG #15346: Replica fails to start after the crash
|
List | pgsql-bugs |
Hi Michae, It aborts not right after reaching a consistent state, but a little bit later: "consistent recovery state reached at AB3/4A1B3118" "WAL contains references to invalid pages",,,,,"xlog redo at AB3/50323E78 for Btree/DELETE: 182 items" Wal segments 0000000500000AB30000004A and 0000000500000AB300000050 correspondingly. Also I managed to get a backtrace: Program received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007fd3f1deb801 in __GI_abort () at abort.c:79 #2 0x00005644f7094aaf in errfinish (dummy=dummy@entry=0) at ./build/../src/backend/utils/error/elog.c:557 #3 0x00005644f7096a49 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x5644f70e5998 "WAL contains references to invalid pages") at ./build/../src/backend/utils/error/elog.c:1378 #4 0x00005644f6d8ec18 in log_invalid_page (node=..., forkno=forkno@entry=MAIN_FORKNUM, blkno=blkno@entry=179503104, present=<optimized out>) at ./build/../src/backend/access/transam/xlogutils.c:95 #5 0x00005644f6d8f168 in XLogReadBufferExtended (rnode=..., forknum=forknum@entry=MAIN_FORKNUM, blkno=179503104, mode=mode@entry=RBM_NORMAL) at ./build/../src/backend/access/transam/xlogutils.c:515 #6 0x00005644f6d59448 in btree_xlog_delete_get_latestRemovedXid (record=0x5644f7d35758) at ./build/../src/backend/access/nbtree/nbtxlog.c:593 #7 btree_xlog_delete (record=0x5644f7d35758) at ./build/../src/backend/access/nbtree/nbtxlog.c:682 #8 btree_redo (record=0x5644f7d35758) at ./build/../src/backend/access/nbtree/nbtxlog.c:1012 #9 0x00005644f6d83c34 in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:6967 #10 0x00005644f6f27953 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:216 #11 0x00005644f6d925a7 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7fff1f893a80) at ./build/../src/backend/bootstrap/bootstrap.c:419 #12 0x00005644f6f24983 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5300 #13 0x00005644f6f27437 in PostmasterMain (argc=16, argv=0x5644f7d07f10) at ./build/../src/backend/postmaster/postmaster.c:1321 #14 0x00005644f6d08b5b in main (argc=16, argv=0x5644f7d07f10) at ./build/../src/backend/main/main.c:228 From looking on xlogutils.c I actually figured out that this backtrace is a bit misleading because it can't get to the line 515 So I checked out REL9_6_10 tag and build it with -O0, now it makes more sense. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007f9ae0777801 in __GI_abort () at abort.c:79 #2 0x000055c9f285f56d in errfinish (dummy=0) at elog.c:557 #3 0x000055c9f2861d7c in elog_finish (elevel=22, fmt=0x55c9f28cbff0 "WAL contains references to invalid pages") at elog.c:1378 #4 0x000055c9f2424e76 in log_invalid_page (node=..., forkno=MAIN_FORKNUM, blkno=179503104, present=0 '\000') at xlogutils.c:95 #5 0x000055c9f24258ce in XLogReadBufferExtended (rnode=..., forknum=MAIN_FORKNUM, blkno=179503104, mode=RBM_NORMAL) at xlogutils.c:470 #6 0x000055c9f23de8ce in btree_xlog_delete_get_latestRemovedXid (record=0x55c9f3cbf438) at nbtxlog.c:593 #7 0x000055c9f23dea88 in btree_xlog_delete (record=0x55c9f3cbf438) at nbtxlog.c:682 #8 0x000055c9f23df857 in btree_redo (record=0x55c9f3cbf438) at nbtxlog.c:1012 #9 0x000055c9f24160e8 in StartupXLOG () at xlog.c:6967 #10 0x000055c9f267d9f6 in StartupProcessMain () at startup.c:216 #11 0x000055c9f242a276 in AuxiliaryProcessMain (argc=2, argv=0x7ffea746c260) at bootstrap.c:419 #12 0x000055c9f267c93e in StartChildProcess (type=StartupProcess) at postmaster.c:5300 #13 0x000055c9f2677469 in PostmasterMain (argc=16, argv=0x55c9f3c91690) at postmaster.c:1321 #14 0x000055c9f25c1515 in main (argc=16, argv=0x55c9f3c91690) at main.c:228 > After close to an hour of repeated tests, I am not able to reproduce > that. Maybe there are some specifics in your schema causing more > certain types of records to be created. Could you check if in the set > of WAL segments replayed you have references to the block the hash table > is complaining about? I am sorry, but don't really know where to look :( Regards, -- Alexander Kukushkin
pgsql-bugs by date: