Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #15346: Replica fails to start after the crash
Date
Msg-id 20180828024409.GB29157@paquier.xyz
In response to Re: BUG #15346: Replica fails to start after the crash  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: BUG #15346: Replica fails to start after the crash  (Andres Freund <andres@anarazel.de>)
Re: BUG #15346: Replica fails to start after the crash  (Alexander Kukushkin <cyberdemn@gmail.com>)
Re: BUG #15346: Replica fails to start after the crash  (Stephen Frost <sfrost@snowman.net>)
List pgsql-bugs
On Sat, Aug 25, 2018 at 09:54:39AM +0200, Alexander Kukushkin wrote:
> Why the number of tuples in the xlog is greater than the number of
> tuples on the index page?
> Because this page was already overwritten and its LSN is HIGHER than
> the current LSN!

That's annoying, because it means that the control file of your
server maps to a consistent point which is older than some of the
relation pages.  How was the base backup of this node created?  Please
remember that when taking a base backup from a standby, you should
back up the control file last, as no end-of-backup WAL record is
available there to control where the backup ends.  So it seems to me
that your problem originates from a base backup which did not meet
this expectation?
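
To illustrate the ordering only, here is a minimal sketch of a file-level
copy that defers global/pg_control until everything else has been copied.
The function name copy_datadir_control_last is hypothetical, and the sketch
deliberately leaves out everything else a real backup needs (starting and
stopping the backup, excluding files that should not be copied, collecting
the required WAL), so it is not a backup tool.

```python
import os
import shutil

# Location of the control file relative to the data directory.
CONTROL_FILE = os.path.join("global", "pg_control")


def copy_datadir_control_last(src, dst):
    """Copy the tree at src to dst, copying global/pg_control last.

    Hypothetical helper for illustration. Returns the relative paths
    in the order in which they were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        os.makedirs(os.path.normpath(os.path.join(dst, rel_root)), exist_ok=True)
        for name in sorted(files):
            rel = os.path.normpath(os.path.join(rel_root, name))
            if rel == CONTROL_FILE:
                continue  # deferred until all other files are copied
            shutil.copy2(os.path.join(src, rel), os.path.join(dst, rel))
            copied.append(rel)
    # The control file goes last, so the consistent point it records is
    # not older than the relation pages copied before it.
    if os.path.exists(os.path.join(src, CONTROL_FILE)):
        shutil.copy2(os.path.join(src, CONTROL_FILE),
                     os.path.join(dst, CONTROL_FILE))
        copied.append(CONTROL_FILE)
    return copied
```

The returned list makes the ordering easy to check: its last entry should
be global/pg_control.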

> Is there a way to recover from such a situation? Should the postgres
> in such case do comparison of LSNs and if the LSN on the page is
> higher than the current LSN simply return InvalidTransactionId?
> Apparently, if there are no connections open postgres simply is not
> running this code and it seems ok.

One idea I have would be to copy all the WAL segments up to the point
where the to-be-updated pages are, and let Postgres replay all the local
WAL first.  However, it is hard to say if that would be enough, as you
could have more references to pages even newer than the btree one you
just found.
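
To get a rough idea of how many such pages exist, one could scan the
relation files of the stopped node and list the pages whose LSN is higher
than a given limit.  The sketch below assumes the default 8kB block size
and relies on the page header starting with the page LSN stored as two
32-bit integers in the machine's native byte order; the function names are
hypothetical, and which limit to compare against (for example the minimum
recovery ending location reported by pg_controldata) is an assumption as
well.

```python
import struct

PAGE_SIZE = 8192  # assumption: default block size


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (hexadecimal) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Convert an integer LSN back to the 'X/Y' notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def page_lsn(page):
    """Read the LSN from the first 8 bytes of a page header."""
    xlogid, xrecoff = struct.unpack_from("=II", page, 0)
    return (xlogid << 32) | xrecoff


def pages_newer_than(data, limit_lsn):
    """Return (block number, LSN) for each full page with LSN > limit_lsn.

    data is the raw content of one relation segment file, read in
    binary mode while the server is stopped.
    """
    newer = []
    for off in range(0, len(data) - PAGE_SIZE + 1, PAGE_SIZE):
        lsn = page_lsn(data[off:off + PAGE_SIZE])
        if lsn > limit_lsn:
            newer.append((off // PAGE_SIZE, lsn))
    return newer
```

A call would look like pages_newer_than(open(path, "rb").read(),
parse_lsn(limit)), where path and limit are supplied by the person running
the check.  A non-empty result only shows that such pages exist; it does
not tell how far WAL has to be replayed to make the node consistent.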
--
Michael