Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs
From | Alexander Kukushkin |
---|---|
Subject | Re: BUG #15346: Replica fails to start after the crash |
Date | |
Msg-id | CAFh8B=mYGP1J8ueHKU40Dr4bNBUPJj1rTbZ6Wpg4gR30uCRunA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15346: Replica fails to start after the crash (Alexander Kukushkin <cyberdemn@gmail.com>) |
Responses |
Re: BUG #15346: Replica fails to start after the crash
Re: BUG #15346: Replica fails to start after the crash |
List | pgsql-bugs |
Hi, btree_xlog_delete_get_latestRemovedXid is callled only when the postgres is already running InHotStandby And what this function is actually doing, it reads index page from disk and iterates over IndexTuple records to get corresponding HeapPages. The problem is that xlrec->nitems = 182 in my case, but on the index page there are only 56 tuples and the rest of the page is filled with zeros: (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007f07a2be2801 in __GI_abort () at abort.c:79 #2 0x000055938000f586 in errfinish (dummy=0) at elog.c:557 #3 0x0000559380011d95 in elog_finish (elevel=22, fmt=0x55938007c010 "WAL contains references to invalid pages") at elog.c:1378 #4 0x000055937fbd4e8f in log_invalid_page (node=..., forkno=MAIN_FORKNUM, blkno=179503104, present=0 '\000') at xlogutils.c:95 #5 0x000055937fbd58e7 in XLogReadBufferExtended (rnode=..., forknum=MAIN_FORKNUM, blkno=179503104, mode=RBM_NORMAL) at xlogutils.c:470 #6 0x000055937fb8e8e7 in btree_xlog_delete_get_latestRemovedXid (record=0x55938151b438) at nbtxlog.c:599 #7 0x000055937fb8eaa1 in btree_xlog_delete (record=0x55938151b438) at nbtxlog.c:688 #8 0x000055937fb8f870 in btree_redo (record=0x55938151b438) at nbtxlog.c:1018 #9 0x000055937fbc6101 in StartupXLOG () at xlog.c:6967 #10 0x000055937fe2da0f in StartupProcessMain () at startup.c:216 #11 0x000055937fbda28f in AuxiliaryProcessMain (argc=2, argv=0x7ffe88f10df0) at bootstrap.c:419 #12 0x000055937fe2c957 in StartChildProcess (type=StartupProcess) at postmaster.c:5300 #13 0x000055937fe27482 in PostmasterMain (argc=16, argv=0x5593814ed690) at postmaster.c:1321 #14 0x000055937fd7152e in main (argc=16, argv=0x5593814ed690) at main.c:228 (gdb) frame 6 #6 0x000055937fb8e8e7 in btree_xlog_delete_get_latestRemovedXid (record=0x55938151b438) at nbtxlog.c:599 599 hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM, hblkno, RBM_NORMAL); (gdb) l 587 582 * Loop through the deleted index items to obtain the TransactionId from 583 * the heap items they point to. 584 */ 585 unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete); 586 587 for (i = 0; i < xlrec->nitems; i++) 588 { 589 /* 590 * Identify the index tuple about to be deleted 591 */ (gdb) l 592 iitemid = PageGetItemId(ipage, unused[i]); 593 itup = (IndexTuple) PageGetItem(ipage, iitemid); 594 595 /* 596 * Locate the heap page that the index tuple points at 597 */ 598 hblkno = ItemPointerGetBlockNumber(&(itup->t_tid)); 599 hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM, hblkno, RBM_NORMAL); 600 if (!BufferIsValid(hbuffer)) 601 { (gdb) p i $45 = 53 (gdb) p unused[i] $46 = 57 (gdb) p *iitemid $47 = {lp_off = 0, lp_flags = 0, lp_len = 0} Therefore iitemid points to the beginning of the index page, what doesn't make sense. Why the number of tuples in the xlog is greater than the number of tuples on the index page? Because this page was already overwritten and its LSN is HIGHER than the current LSN! (gdb) p *(PageHeader)ipage $49 = {pd_lsn = {xlogid = 2739, xrecoff = 1455373160}, pd_checksum = 3042, pd_flags = 0, pd_lower = 248, pd_upper = 5936, pd_special = 8176, pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x7f06ed9e9598} Index page LSN: AB3/56BF3B68 Current LSN: AB3/50323E78 Is there a way to recover from such a situation? Should the postgres in such case do comparison of LSNs and if the LSN on the page is higher than the current LSN simply return InvalidTransactionId? Apparently, if there are no connections open postgres simply is not running this code and it seems ok. Regards, -- Alexander Kukushkin
pgsql-bugs by date: