
From:           Alexander Kukushkin
Subject:        Re: BUG #15346: Replica fails to start after the crash
Msg-id:         CAFh8B=kuFyBsth-1wuPqncnTiCa7reKzCz=RzO1X-Y78U74F5g@mail.gmail.com
In response to: Re: BUG #15346: Replica fails to start after the crash (Michael Paquier <michael@paquier.xyz>)
List:           pgsql-bugs

Hi,

2018-08-29 8:17 GMT+02:00 Michael Paquier <michael@paquier.xyz>:
> Hmm.  That does not sound right to me.  If the page has a header LSN
> higher than the one replayed, we should not see it as consistency has
> normally been reached.  XLogReadBufferExtended() seems to complain in
> your case about a page which should not exist per the information of
> your backtrace.  What's the length of relation file at this point?  If
> the relation is considered as having less blocks, shouldn't we just
> ignore it if we're trying to delete items on it and return
> InvalidTransactionId?  I have to admit that I am not the best specialist
> with this code.

This is what pg_controldata reports:
Latest checkpoint location:           AB3/4A1B30A8
Prior checkpoint location:            AB3/4A1B30A8
Latest checkpoint's REDO location:    AB3/4149B340
Latest checkpoint's REDO WAL file:    0000000500000AB300000041
Latest checkpoint's TimeLineID:       5
Latest checkpoint's PrevTimeLineID:   5
Latest checkpoint's full_page_writes: on

Minimum recovery ending location:     AB3/4A1B3118
Min recovery ending loc's timeline:   5
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on

Therefore it reaches consistency at AB3/4A1B3118 and starts accepting
connections.
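
For context, the consistency check in xlog.c's
CheckRecoveryConsistency() boils down to roughly the following (a
paraphrase from memory, not the exact source):

/*
 * Once replay has passed the control file's minRecoveryPoint, the
 * standby is treated as consistent and may accept read-only
 * connections.
 */
if (!reachedConsistency &&
    lastReplayedEndRecPtr >= minRecoveryPoint)
{
    reachedConsistency = true;
    ereport(LOG,
            (errmsg("consistent recovery state reached at %X/%X",
                    (uint32) (lastReplayedEndRecPtr >> 32),
                    (uint32) lastReplayedEndRecPtr)));
}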
A few moments later it starts replaying 0000000500000AB300000050,
where the record at AB3/50323E78 is a Btree DELETE of 182 items.
In pg_xlogdump this record looks like:
rmgr: Btree       len (rec/tot):    436/   547, tx:  518902810,
lsn: AB3/50323E78, prev AB3/50323E20, desc: DELETE 182 items,
blkref #0: rel 1663/18055/212907 blk 72478 FPW

212907 -- btree index:
$ ls -al data/base/18055/212907*
-rw------- 1 akukushkin akukushkin 594812928 Aug 22 07:22 data/base/18055/212907
-rw------- 1 akukushkin akukushkin    163840 Aug 22 07:22 data/base/18055/212907_fsm
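
For reference, 594812928 / 8192 = 72609 blocks, so block 72478
referenced by the DELETE record does exist in the index file; the WAL
record itself is not pointing past the end of the relation.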

212875 -- the table:
$ ls -al data/base/18055/212875*
-rw------- 1 akukushkin akukushkin 1073741824 Aug 22 07:20 data/base/18055/212875
-rw------- 1 akukushkin akukushkin 1014947840 Aug 22 07:21 data/base/18055/212875.1
-rw------- 1 akukushkin akukushkin     532480 Aug 22 07:21 data/base/18055/212875_fsm
-rw------- 1 akukushkin akukushkin      65536 Aug 22 07:21 data/base/18055/212875_vm

As you can see, the table's size is only about 2 GB.
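In blocks that is (1073741824 + 1014947840) / 8192 = 131072 + 123895
= 254967, so a heap block number anywhere beyond that would indeed
point past the end of the table.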

>
> hblkno looks also unsanely high to me if you look at the other blkno
> references you are mentioning upthread.

hblkno is calculated from the data which Postgres reads from the index
file data/base/18055/212907, block 72478.
This block contains only 56 index tuples, while at least 182 are
expected, so starting from tuple 57 the line pointer looks like this:
(gdb) p *iitemid
$47 = {lp_off = 0, lp_flags = 0, lp_len = 0}

Basically it points to the beginning of the page, which doesn't make
sense at all.
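
For reference, this is the line pointer layout from
src/include/storage/itemid.h; with all three fields zeroed,
PageGetItem() resolves to offset 0 within the page, i.e. the page
header itself, so whatever gets read from there as an index tuple is
garbage:

typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of line pointer */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;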

Why does block 72478 of the index relation file not meet our
expectations (i.e., why does it contain so few tuples)?
The answer is in the page header: the LSN written in the index page
header is AB3/56BF3B68.
That can mean only one thing: before the crash, Postgres had managed
to apply the WAL stream up to at least AB3/56BF3B68, which is far
ahead of "Minimum recovery ending location:     AB3/4A1B3118".
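
For anyone who wants to double-check this, the page LSN can be read
straight from the relation file, since pd_lsn is the first field of
PageHeaderData. A minimal, untested sketch (assumes BLCKSZ = 8192 and
a little-endian build):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    uint32      lsn[2];         /* pd_lsn: {xlogid, xrecoff} */
    FILE       *f;

    if (argc != 3 || (f = fopen(argv[1], "rb")) == NULL)
        return 1;
    /* seek to the requested block and read the first 8 bytes */
    if (fseek(f, strtol(argv[2], NULL, 10) * 8192L, SEEK_SET) != 0 ||
        fread(lsn, sizeof(lsn), 1, f) != 1)
        return 1;
    printf("%X/%X\n", lsn[0], lsn[1]);
    return 0;
}

Running it as "./pagelsn data/base/18055/212907 72478" should print
AB3/56BF3B68.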

Regards,
--
Alexander Kukushkin

