Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #15346: Replica fails to start after the crash
Msg-id 20180822081126.GE4333@paquier.xyz
In response to BUG #15346: Replica fails to start after the crash  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15346: Replica fails to start after the crash  (Alexander Kukushkin <cyberdemn@gmail.com>)
List pgsql-bugs
On Wed, Aug 22, 2018 at 07:36:58AM +0000, PG Bug reporting form wrote:
> 2018-08-22 06:22:23.633 UTC,,,54,,5b7d0114.36,23,,2018-08-22 06:22:12
> UTC,1/0,0,WARNING,01000,"page 179503104 of relation base/18055/212875 does
> not exist",,,,,"xlog redo at AB3/50323E78 for Btree/DELETE: 182
> items",,,,""
> 2018-08-22 06:22:23.634 UTC,,,54,,5b7d0114.36,24,,2018-08-22 06:22:12
> UTC,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"xlog
> redo at AB3/50323E78 for Btree/DELETE: 182 items",,,,""

Once recovery has reached a consistent state, the startup process checks
whether any invalid pages are still tracked in a dedicated hash table and
complains loudly about them.  It is not the last record or its
surroundings that matter in this case, but whether this page was
referenced by one of the records replayed during recovery up to the
consistency point.  Do any of the records in the WAL segments you
fetched contain a reference to this page?  A page is 8kB, and the page
number is 179503104, which is definitely weird as that would require the
relation to be larger than 1000GB.  If the record itself is in bad
shape, this may be a corrupted segment.  As far as I can see you only
have one incorrect page reference (see the call to XLogCheckInvalidPages
in xlog.c).
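
To make the size argument concrete, here is a rough sketch of the
arithmetic in Python.  It assumes a default build (8192-byte pages and
131072 pages per 1GB segment file); locate_block is a hypothetical
helper made up for this illustration, not something that exists in
PostgreSQL.

```python
# Assumed defaults for a stock build; a custom build may differ.
BLCKSZ = 8192          # page size in bytes
RELSEG_SIZE = 131072   # pages per segment file (1GB / 8kB)


def locate_block(block_number, blcksz=BLCKSZ, relseg_size=RELSEG_SIZE):
    """Hypothetical helper: return (min_size_bytes, segment_no, block_in_segment).

    Block numbers start at 0, so a relation holding block N must contain
    at least N + 1 pages.
    """
    min_size = (block_number + 1) * blcksz
    segment_no, block_in_segment = divmod(block_number, relseg_size)
    return min_size, segment_no, block_in_segment


# Page number taken from the WARNING line in the report.
min_size, segment_no, block_in_segment = locate_block(179503104)
print(f"relation would need at least {min_size / 1024**3:.1f} GiB")
print(f"page would be block {block_in_segment} of segment {segment_no}")
```

Under those assumptions the page would belong to segment 1369 of the
relation, so one quick sanity check is whether a file such as
base/18055/212875.1369 exists at all.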

> I will keep this instance around for further investigation and would be
> happy to provide some more details if you need.

That would be nice!
--
Michael
