Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #15346: Replica fails to start after the crash
Date
Msg-id 20180829061723.GA5903@paquier.xyz
In response to Re: BUG #15346: Replica fails to start after the crash  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: BUG #15346: Replica fails to start after the crash  (Alexander Kukushkin <cyberdemn@gmail.com>)
List pgsql-bugs
On Sat, Aug 25, 2018 at 09:54:39AM +0200, Alexander Kukushkin wrote:
> Is there a way to recover from such a situation? Should the postgres
> in such case do comparison of LSNs and if the LSN on the page is
> higher than the current LSN simply return InvalidTransactionId?

Hmm.  That does not sound right to me.  If the page has a header LSN
higher than the one replayed, we should not see it, as consistency has
normally been reached.  XLogReadBufferExtended() seems to complain in
your case about a page which should not exist, per the information in
your backtrace.  What's the length of the relation file at this point?  If
the relation is considered as having fewer blocks, shouldn't we just
ignore it if we're trying to delete items on it and return
InvalidTransactionId?  I have to admit that I am not the best specialist
with this code.
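
To make the suggestion above concrete, here is a minimal toy model in
Python, not PostgreSQL source.  It only illustrates the proposed rule: if
the referenced block lies beyond the current end of the relation, skip it
and return InvalidTransactionId instead of trying to read the page.  The
function name, its parameters and the dictionary of per-block XIDs are all
hypothetical; the only detail taken from PostgreSQL is that
InvalidTransactionId is 0.  Real XID comparison handles wraparound, which
this sketch ignores by using a plain max().

```python
from typing import Dict, List

# PostgreSQL defines InvalidTransactionId as 0.
INVALID_TRANSACTION_ID = 0


def latest_removed_xid(relation_nblocks: int,
                       hblkno: int,
                       xids_by_block: Dict[int, List[int]]) -> int:
    """Hypothetical helper: newest XID among items being deleted on a block.

    relation_nblocks -- current length of the relation, in blocks
    hblkno           -- heap block number referenced by the record
    xids_by_block    -- stand-in for reading the page: block -> item XIDs
    """
    # Block numbers are zero-based, so a block exists only if
    # hblkno < relation_nblocks.  A block past the end is ignored
    # rather than treated as an error.
    if hblkno >= relation_nblocks:
        return INVALID_TRANSACTION_ID

    # Simplification: plain integer max, no XID wraparound handling.
    return max(xids_by_block.get(hblkno, []), default=INVALID_TRANSACTION_ID)


if __name__ == "__main__":
    pages = {3: [500, 742, 610]}
    print(latest_removed_xid(10, 3, pages))    # block exists
    print(latest_removed_xid(10, 25, pages))   # block past end of relation
```

Whether silently ignoring such a block is actually safe during recovery is
exactly the open question in this thread; the sketch only shows the shape
of the check, not an answer to that.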

hblkno also looks insanely high to me compared with the other blkno
references you mention upthread.

> Apparently, if there are no connections open postgres simply is not
> running this code and it seems ok.

Yeah, that's used for standby conflicts.
--
Michael
