Re: [HACKERS] Crash on promotion when recovery.conf is renamed - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Crash on promotion when recovery.conf is renamed
Date
Msg-id 20180709012955.GD1467@paquier.xyz
In response to Re: [HACKERS] Crash on promotion when recovery.conf is renamed  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Fri, Jul 06, 2018 at 09:09:27PM +0900, Michael Paquier wrote:
> Sure.  I think I can finish the set on Monday JST then.

So, I have been able to back-patch things down to 9.5, but further down
I am not really convinced that it is worth meddling with that,
particularly in light of 7cbee7c which has reworked the way partial
segments on older timelines are handled at the end of promotion.  The
portability issue I have found is related to the timeline number that
exitArchiveRecovery() uses, which comes from the WAL reader and hence
gets set to the timeline of the last page read by the startup process.
This can actually cause quite a bit of trouble at the end of recovery
as we would get a failure when trying to copy the last segment from the
old timeline to the new timeline.  Well, it could be possible to fix
things properly by roughly back-porting 7cbee7c down to REL9_4_STABLE
but that's not really worth the risk, and moving exitArchiveRecovery()
and PrescanPreparedTransactions() around is a straight-forward move with
9.5~ thanks to this commit.
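
As an illustration of why the timeline number matters for that copy: WAL
segment file names embed the timeline ID, so the same segment number maps
to a different file on each timeline.  The sketch below is only modelled
on that naming scheme; the helper names are hypothetical (not PostgreSQL
functions) and it assumes the default 16MB segment size.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16MB WAL segments, i.e. 256 segments per 4GB "xlog id". */
#define WAL_SEG_SIZE      (16 * 1024 * 1024)
#define SEGS_PER_XLOG_ID  (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/*
 * Hypothetical helper: build the 24-hex-digit segment file name from a
 * timeline ID and a segment number.  buf must hold at least 25 bytes.
 */
void
wal_segment_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGS_PER_XLOG_ID),
             (unsigned int) (segno % SEGS_PER_XLOG_ID));
}

/*
 * Hypothetical helper: return 1 if two timelines name the same file for
 * a given segment number, 0 otherwise.
 */
int
same_segment_file(uint32_t tli_a, uint32_t tli_b, uint64_t segno)
{
    char a[25];
    char b[25];

    wal_segment_name(a, sizeof(a), tli_a, segno);
    wal_segment_name(b, sizeof(b), tli_b, segno);
    return strcmp(a, b) == 0;
}
```

With these helpers, segment 3 is a different file on timeline 1 than on
timeline 2, so a copy that starts from the wrong timeline would look for
a source file that may not exist.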

I have also done nothing yet for the detection of corrupted 2PC files
which get ignored at recovery.  While looking again at the patch I sent
upthread, the thing was actually missing some more error handling in
ReadTwoPhaseFile().  In particular, with the proposed patch we would
still not report an error if a 2PC file cannot be opened because of,
for example, EPERM.  In FinishPreparedTransaction(), one would then
simply get a *crash* with no hints about what happened.  I have a patch
for that which still needs some polishing, and I will start a new thread
on the matter.
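
To make the distinction concrete, here is a minimal sketch of the kind of
check being discussed: a missing file (ENOENT) is kept separate from any
other open() failure, which gets reported instead of being ignored.  This
is not the code of ReadTwoPhaseFile(); the enum and the function name are
hypothetical, and plain stderr output stands in for the server's error
reporting.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes, for illustration only. */
typedef enum
{
    STATE_FILE_OK,          /* file opened successfully */
    STATE_FILE_MISSING,     /* ENOENT: the file does not exist */
    STATE_FILE_ERROR        /* any other failure (EPERM, EACCES, ...) */
} StateFileStatus;

/*
 * Try to open a state file read-only.  On success *fd receives the file
 * descriptor.  Failures other than ENOENT are reported on stderr rather
 * than being treated the same way as a missing file.
 */
StateFileStatus
open_state_file(const char *path, int *fd)
{
    *fd = open(path, O_RDONLY);
    if (*fd >= 0)
        return STATE_FILE_OK;

    if (errno == ENOENT)
        return STATE_FILE_MISSING;

    fprintf(stderr, "could not open file \"%s\": %s\n",
            path, strerror(errno));
    return STATE_FILE_ERROR;
}
```

A caller can then decide that STATE_FILE_MISSING is acceptable in some
contexts while STATE_FILE_ERROR always stops processing with a message,
instead of leading to a crash later on.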
--
Michael