Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #15346: Replica fails to start after the crash
Msg-id 20180829205436.GE5903@paquier.xyz
In response to Re: BUG #15346: Replica fails to start after the crash  (Alexander Kukushkin <cyberdemn@gmail.com>)
List pgsql-bugs
On Wed, Aug 29, 2018 at 03:05:25PM +0200, Alexander Kukushkin wrote:
> BTW, I am thinking that we should return InvalidTransactionId from the
> btree_xlog_delete_get_latestRemovedXid if the index page we read from
> disk is newer than the xlog record we are currently processing. Please see
> the patch attached.

This is not a solution in my opinion, as it could invalidate the activity
of backends connected to the database when an incorrect consistency
point lets connections in too early.
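
To make the objection concrete, here is a toy Python model of the proposed guard. It is not the attached patch, and `latest_removed_xid`, `resolve_conflict`, `page_lsn` and `record_lsn` are hypothetical stand-ins. The model assumes, like ResolveRecoveryConflictWithSnapshot() in PostgreSQL, that an invalid latestRemovedXid means there is no conflict to resolve. Once the guard fires, queries whose snapshots may still need the removed entries are no longer cancelled, so the inconsistency is hidden from connected backends instead of being reported.

```python
INVALID_XID = 0  # stand-in for InvalidTransactionId


def latest_removed_xid(page_lsn: int, record_lsn: int, computed_xid: int) -> int:
    """Model of the proposed guard: give up when the page read from disk
    is newer than the WAL record currently being replayed."""
    if page_lsn > record_lsn:
        return INVALID_XID
    return computed_xid


def resolve_conflict(latest_removed: int, running_xmins: list[int]) -> list[int]:
    """Return the snapshot xmins whose queries would be cancelled.

    Simplified model (no xid wraparound): an invalid xid means "nothing to
    cancel"; otherwise any snapshot with xmin <= latest_removed conflicts.
    """
    if latest_removed == INVALID_XID:
        return []
    return [xmin for xmin in running_xmins if xmin <= latest_removed]


# Page on disk at LSN 200, record being replayed at LSN 100: the guard fires,
# and the query holding xmin 10 survives although it would have conflicted.
xid = latest_removed_xid(page_lsn=200, record_lsn=100, computed_xid=42)
print(resolve_conflict(xid, [10, 50]))  # []
```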

What happens if you replay with hot_standby = on up to the latest point,
without any concurrent connections, then issue a checkpoint on the
standby once it has reached a point newer than the one in the complaint,
and finally restart the standby with the bgworker?

Another idea would be to promote the standby, issue a checkpoint on it,
and then use pg_rewind as a trick to update the control file to a point
newer than the inconsistency.  As PG < 9.6.10 could make the minimum
recovery point go backwards, applying the upgrade after the consistency
point has already reached an incorrect state would still trigger the
failure.
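
The failure mode described above can be sketched with a toy Python model, assuming a standby that declares itself consistent once its replay position reaches the minimum recovery point stored in the control file. `ControlFile`, the two update functions and the LSN values are hypothetical; the only difference between the updates is whether the stored value may move backwards, which is the pre-9.6.10 behaviour mentioned above.

```python
from dataclasses import dataclass


@dataclass
class ControlFile:
    # LSN that replay must reach before the standby may accept connections.
    min_recovery_point: int


def update_min_recovery_point_buggy(cf: ControlFile, lsn: int) -> None:
    # Pre-fix model: blindly overwrite, so the value can go backwards.
    cf.min_recovery_point = lsn


def update_min_recovery_point_fixed(cf: ControlFile, lsn: int) -> None:
    # Fixed model: the value only ever advances.
    cf.min_recovery_point = max(cf.min_recovery_point, lsn)


def is_consistent(replay_lsn: int, cf: ControlFile) -> bool:
    return replay_lsn >= cf.min_recovery_point


# Pages up to LSN 500 were flushed before the crash; a later update writes 300.
buggy = ControlFile(min_recovery_point=500)
update_min_recovery_point_buggy(buggy, 300)
print(is_consistent(300, buggy))  # True: connections allowed too early

fixed = ControlFile(min_recovery_point=500)
update_min_recovery_point_fixed(fixed, 300)
print(is_consistent(300, fixed))  # False: must replay up to 500 first
```

The model also shows why upgrading alone does not help: once the stored value has regressed to 300, the fixed update function has no memory of 500 to restore, hence the need to push the control file forward by other means.
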
--
Michael