Michael Paquier <michael@paquier.xyz> writes:
> So after that I fell back to your patch and began testing it, which is
> where I noticed that we can *never* guarantee recovery of a data
> folder on which an error has happened in the middle of a pg_rewind. The
> reason for that is quite simple: even if the truncation has been moved
> down to the moment where the first chunk of a file is received, you may
> have already done work on some relation files. Particularly, some of
> them may have been truncated down to a given size without a new range of
> blocks fetched from the source. So the data folder would be in an
> inconsistent state if trying to rewind it again.

Yes, we certainly cannot guarantee that failure partway through pg_rewind
leaves a consistent state of the target data directory. It is likely
worth pointing that out in the documentation. Whether we can or should
do anything about it is a different question.

When I first started looking at this thread, I wondered if maybe somebody
had had in mind to create an active defense against starting a postmaster
in an inconsistent target cluster, by dint of intentionally truncating
pg_control before the transfer starts and not making it valid again till
the very end. It's now clear from looking at the code that that's not
what's going on :-(. But I wonder how hard it would be to make it so,
and whether that'd be worth doing if it's not too hard.

Actually, probably a safer way to attack that would be to remove or
rename the topmost PG_VERSION file, and then put it back afterwards.
That'd be far easier to recover from manually, if need be, than
clobbering pg_control.
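
For illustration only, the rename-and-restore idea might be sketched as
follows. This is Python rather than pg_rewind's C, the function name and
the ".rewind-in-progress" suffix are made up for the example, and it
assumes that a missing PG_VERSION is enough to keep a postmaster from
starting in the directory. Making the renames durable (fsync of the
directory) is not handled.

```python
import os
from contextlib import contextmanager

# Hypothetical suffix, chosen only for this sketch.
HIDDEN_SUFFIX = ".rewind-in-progress"


@contextmanager
def hide_pg_version(datadir):
    """Rename the topmost PG_VERSION away for the duration of the work.

    The file is put back only if the body of the `with` block finishes
    without raising. On failure it stays renamed, so the directory keeps
    looking invalid until someone renames the file back by hand.
    """
    version = os.path.join(datadir, "PG_VERSION")
    hidden = version + HIDDEN_SUFFIX

    # A leftover hidden file means an earlier run did not complete.
    if os.path.exists(hidden):
        raise RuntimeError("previous run left %s behind" % hidden)

    os.rename(version, hidden)
    yield hidden
    # Reached only when the body completed without an exception.
    os.rename(hidden, version)
```

A caller would wrap the whole file transfer in "with
hide_pg_version(datadir):". Manual recovery is then a single rename,
which is the point of preferring this over clobbering pg_control.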

In any case, that seems separate from the question of what to do with
read-only files in the data directory. Should we push forward with
committing Michael's previous patch, and leave that issue for later?

regards, tom lane