On Mon, Oct 16, 2023 at 11:45 AM David Steele <david@pgmasters.net> wrote:
> Hmmm, the reason to back patch this is that it would fix [1], which sure
> looks like a problem to me even if it is not a "bug". We can certainly
> require backup software to retry pg_control until the checksum is valid
> but that seems like a pretty big ask, even considering how complicated
> backup is.
That seems like a problem with pg_control not being written atomically
when the standby server is updating it during recovery, rather than a
problem with backup_label not being used at the start of recovery.
Unless I'm confused.
> If you start from the last checkpoint (which is what will generally be
> stored in pg_control) then the effect is pretty similar.
If the backup didn't span a checkpoint, then restoring from the one in
pg_control actually works fine. Not that I'm encouraging that. But if
you replay WAL from the control file, you at least get the last
checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.
I don't really want to get hung up on this though. My main point here
is that I have trouble believing that an error after you've already
screwed up your backup helps much. I think what we need is to make it
less likely that you will screw up your backup in the first place.
> Right now the user can remove backup_label and get a "successful"
> restore and not realize that they have just corrupted their cluster.
> This is independent of the backup/restore tool doing all the right things.
I don't think it's independent of that at all.
--
Robert Haas
EDB: http://www.enterprisedb.com