Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs
From:           Alexander Kukushkin
Subject:        Re: BUG #15346: Replica fails to start after the crash
Date:
Msg-id:         CAFh8B=kuFyBsth-1wuPqncnTiCa7reKzCz=RzO1X-Y78U74F5g@mail.gmail.com
In response to: Re: BUG #15346: Replica fails to start after the crash (Michael Paquier <michael@paquier.xyz>)
Responses:      Re: BUG #15346: Replica fails to start after the crash
List:           pgsql-bugs
Hi,

2018-08-29 8:17 GMT+02:00 Michael Paquier <michael@paquier.xyz>:
> Hmm. That does not sound right to me. If the page has a header LSN
> higher than the one replayed, we should not see it as consistency has
> normally been reached. XLogReadBufferExtended() seems to complain in
> your case about a page which should not exist per the information of
> your backtrace. What's the length of relation file at this point? If
> the relation is considered as having less blocks, shouldn't we just
> ignore it if we're trying to delete items on it and return
> InvalidTransactionId? I have to admit that I am not the best specialist
> with this code.

This is what pg_controldata reports:

Latest checkpoint location:           AB3/4A1B30A8
Prior checkpoint location:            AB3/4A1B30A8
Latest checkpoint's REDO location:    AB3/4149B340
Latest checkpoint's REDO WAL file:    0000000500000AB300000041
Latest checkpoint's TimeLineID:       5
Latest checkpoint's PrevTimeLineID:   5
Latest checkpoint's full_page_writes: on
Minimum recovery ending location:     AB3/4A1B3118
Min recovery ending loc's timeline:   5
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on

Therefore it reaches consistency at AB3/4A1B3118 and starts accepting
connections. A few moments later it starts replaying
0000000500000AB300000050, where "AB3/50323E78" contains "Btree/DELETE: 182".
In pg_xlogdump this record looks like:

rmgr: Btree  len (rec/tot): 436/547, tx: 518902810, lsn: AB3/50323E78, prev AB3/50323E20, desc: DELETE 182 items, blkref #0: rel 1663/18055/212907 blk 72478 FPW

212907 -- the btree index:

$ ls -al data/base/18055/212907*
-rw------- 1 akukushkin akukushkin 594812928 Aug 22 07:22 data/base/18055/212907
-rw------- 1 akukushkin akukushkin    163840 Aug 22 07:22 data/base/18055/212907_fsm

212875 -- the table:

$ ls -al data/base/18055/212875*
-rw------- 1 akukushkin akukushkin 1073741824 Aug 22 07:20 data/base/18055/212875
-rw------- 1 akukushkin akukushkin 1014947840 Aug 22 07:21 data/base/18055/212875.1
-rw------- 1 akukushkin akukushkin     532480 Aug 22 07:21 data/base/18055/212875_fsm
-rw------- 1 akukushkin akukushkin      65536 Aug 22 07:21 data/base/18055/212875_vm

As you can see, the table is only about 2 GB in size.

> hblkno looks also unsanely high to me if you look at the other blkno
> references you are mentioning upthread.

hblkno is calculated from the data which postgres reads from the index
file data/base/18055/212907, block 72478. This block contains only 56
index tuples, while at least 182 are expected, so starting from tuple 57
the item pointer looks like this:

(gdb) p *iitemid
$47 = {lp_off = 0, lp_flags = 0, lp_len = 0}

Basically it points to the beginning of the page, which doesn't make any
sense.

Why doesn't block 72478 of the index relfile meet our expectations, i.e.
why does it contain so few tuples? The answer is in the page header. The
LSN written in the index page header is AB3/56BF3B68. That can mean only
one thing: before the crash postgres had already applied the WAL stream
up to at least AB3/56BF3B68, which is far ahead of "Minimum recovery
ending location: AB3/4A1B3118".

Regards,
--
Alexander Kukushkin
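P.S. A throwaway, standalone C snippet (not part of PostgreSQL; it only
assumes the default 8 kB BLCKSZ and the usual hi/lo halves of an LSN such
as AB3/4A1B3118) that spells out the arithmetic behind the numbers above:
the index file holds 72609 blocks, so block 72478 does exist; the heap
holds about 255k blocks (~2 GB), so any hblkno beyond that is garbage;
and the index page header LSN is indeed ahead of the minimum recovery
ending location.

/* sanity-check of the sizes and LSNs quoted above */
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192ULL                      /* default PostgreSQL block size */

static uint64_t lsn(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;      /* "AB3/4A1B3118" -> 64-bit value */
}

int main(void)
{
    /* index 212907: one 594812928-byte segment */
    printf("index blocks: %llu\n",
           (unsigned long long) (594812928ULL / BLCKSZ));

    /* table 212875: 1073741824 + 1014947840 bytes across two segments */
    printf("heap blocks:  %llu\n",
           (unsigned long long) ((1073741824ULL + 1014947840ULL) / BLCKSZ));

    /* index page header LSN vs. minimum recovery ending location */
    printf("page LSN ahead of minRecoveryPoint: %s\n",
           lsn(0xAB3, 0x56BF3B68) > lsn(0xAB3, 0x4A1B3118) ? "yes" : "no");

    return 0;
}

Compiled with cc and run, it prints 72609, 254967, and "yes".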