pgsql: Rework order of end-of-recovery actions to delay timelinehistor - Mailing list pgsql-committers

From Michael Paquier
Subject pgsql: Rework order of end-of-recovery actions to delay timelinehistor
Date
Msg-id E1fcKzR-0003wM-4w@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Rework order of end-of-recovery actions to delay timeline history write

A critical failure in some of the end-of-recovery actions before the
end-of-recovery record is written can cause PostgreSQL to react
inconsistently with the rest of the cluster in the event of a crash
before the final record is written.  Two such failures are for example
an error while processing a two-phase state files or when operating on
recovery.conf.  With this commit, the failures are still considered
FATAL, but the write of the timeline history file is delayed as much as
possible so as the window between the moment the file is written and the
end-of-recovery record is generated gets minimized. This way, in the
event of a crash or a failure, the new timeline decided at promotion
will not seem taken by other nodes in the cluster.  It is not really
possible to reduce to zero this window, hence one could still see
failures if a crash happens between the history file write and the
end-of-recovery record, so any future code should be careful when
adding new end-of-recovery actions.  The original report from Magnus
Hagander mentioned a renamed recovery.conf as original end-of-recovery
failure which caused a timeline to be seen as taken but the subsequent
processing on the now-missing recovery.conf cause the startup process to
issue stop on FATAL, which at follow-up startup made the system
inconsistent because of on-disk changes which already happened.

Processing of two-phase state files still needs some work as corrupted
entries are simply ignored now.  This is left as a future item and this
commit fixes the original complain.

Reported-by: Magnus Hagander
Author: Heikki Linnakangas
Reviewed-by: Alexander Korotkov, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com

Branch
------
REL9_6_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/619dea4678c345d119036311c503b55b974695d3

Modified Files
--------------
src/backend/access/transam/xlog.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)


pgsql-committers by date:

Previous
From: Jeff Davis
Date:
Subject: pgsql: Fix WITH CHECK OPTION on views referencing postgres_fdw tables.
Next
From: Michael Paquier
Date:
Subject: Re: pgsql: Use access() to check file existence inGetNewRelFileNode()