Re: BUG #14999: pg_rewind corrupts control file global/pg_control - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Date
Msg-id 22961.1522867812@sss.pgh.pa.us
In response to Re: BUG #14999: pg_rewind corrupts control file global/pg_control  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #14999: pg_rewind corrupts control file global/pg_control
List pgsql-bugs
Michael Paquier <michael@paquier.xyz> writes:
> So after that I fell back to your patch and began testing it, which is
> where I noticed that we can *never* guarantee recovery of a data
> folder on which an error has happened in the middle of a pg_rewind.  The
> reason for that is quite simple: even if the truncation has been moved
> down to the moment where the first chunk of a file is received, you may
> have already done work on some relation files.  Particularly, some of
> them may have been truncated down to a given size without a new range of
> blocks fetched from the source.  So the data folder would be in an
> inconsistent state if trying to rewind it again.

Yes, we certainly cannot guarantee that failure partway through pg_rewind
leaves a consistent state of the target data directory.  It is likely
worth pointing that out in the documentation.  Whether we can or should
do anything about it is a different question.

When I first started looking at this thread, I wondered if maybe somebody
had had in mind to create an active defense against starting a postmaster
in an inconsistent target cluster, by dint of intentionally truncating
pg_control before the transfer starts and not making it valid again till
the very end.  It's now clear from looking at the code that that's not
what's going on :-(.  But I wonder how hard it would be to make it so,
and whether that'd be worth doing if it's not too hard.

Actually, probably a safer way to attack that would be to remove or
rename the topmost PG_VERSION file, and then put it back afterwards.
That'd be far easier to recover from manually, if need be, than
clobbering pg_control.
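
As a rough illustration of the rename idea above, here is a minimal sketch in C.
It is not pg_rewind code: the function names and the ".rewind-in-progress"
suffix are hypothetical, and it assumes that a server will not start from a data
directory that lacks PG_VERSION. The first function would be called before any
file is modified, the second only after the whole transfer has succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical suffix for the hidden file; not a name pg_rewind uses. */
#define HIDDEN_SUFFIX ".rewind-in-progress"
#define PATH_BUF_LEN 1024

/* Build "<datadir>/PG_VERSION" and its hidden counterpart. */
static int
build_paths(const char *datadir, char *orig, char *hidden, size_t len)
{
    int n1 = snprintf(orig, len, "%s/PG_VERSION", datadir);
    int n2 = snprintf(hidden, len, "%s/PG_VERSION%s", datadir, HIDDEN_SUFFIX);

    if (n1 < 0 || n2 < 0 || (size_t) n1 >= len || (size_t) n2 >= len)
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/*
 * Rename PG_VERSION out of the way before the target directory is modified.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
hide_version_file(const char *datadir)
{
    char orig[PATH_BUF_LEN];
    char hidden[PATH_BUF_LEN];

    assert(datadir != NULL);
    if (build_paths(datadir, orig, hidden, sizeof(orig)) != 0)
        return -1;
    return rename(orig, hidden);
}

/*
 * Put PG_VERSION back once the rewind has completed successfully.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
restore_version_file(const char *datadir)
{
    char orig[PATH_BUF_LEN];
    char hidden[PATH_BUF_LEN];

    assert(datadir != NULL);
    if (build_paths(datadir, orig, hidden, sizeof(orig)) != 0)
        return -1;
    return rename(hidden, orig);
}
```

If the rewind fails partway through, the hidden file stays where it is, and
renaming it back by hand is the manual recovery step mentioned above. A real
implementation would presumably also need to fsync the data directory after each
rename so that the change survives a crash; that is left out of this sketch.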

In any case, that seems separate from the question of what to do with
read-only files in the data directory.  Should we push forward with
committing Michael's previous patch, and leave that issue for later?

            regards, tom lane

