On Mon, Oct 16, 2023 at 12:25:59PM -0400, Robert Haas wrote:
> On Mon, Oct 16, 2023 at 11:45 AM David Steele <david@pgmasters.net> wrote:
> > If you start from the last checkpoint (which is what will generally be
> > stored in pg_control) then the effect is pretty similar.
>
> If the backup didn't span a checkpoint, then restoring from the one in
> pg_control actually works fine. Not that I'm encouraging that. But if
> you replay WAL from the control file, you at least get the last
> checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.

There's no guarantee that the backend didn't run an extra checkpoint
while the base backup was being taken, either, because we don't fail
hard at the end of the BASE_BACKUP code paths if the redo LSN has been
updated since the beginning of the BASE_BACKUP. So really, *never* do
that unless you like silent corruption.

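
As an illustration of the kind of redo check mentioned here, a minimal
sketch follows. It is not what the backend does today. The helper names
are hypothetical, and the two redo LSNs are assumed to be sampled by the
caller at the start and at the end of the backup (for example from the
redo_lsn column of pg_control_checkpoint()).

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000028' into a 64-bit integer.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def redo_moved(start_redo: str, end_redo: str) -> bool:
    """Hypothetical check: True if the redo LSN advanced between the
    two samples, meaning a checkpoint completed during the backup."""
    return parse_lsn(end_redo) > parse_lsn(start_redo)


if __name__ == "__main__":
    # Assumed sample values, not taken from a real cluster.
    if redo_moved("0/3000028", "0/5000060"):
        print("redo LSN changed during the backup; pg_control is not a safe start point")
```

A caller could fail the backup when redo_moved() returns true, which is
the "fail and retry" behavior discussed below.
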
> I don't really want to get hung up on this though. My main point here
> is that I have trouble believing that an error after you've already
> screwed up your backup helps much. I think what we need is to make it
> less likely that you will screw up your backup in the first place.

Yeah. So what's the best user experience? Is it better for a base
backup to fail and have the user retry? Or is it better to have the
backend-side backup logic do what we think is safer? The former
(likely with a redo LSN check or similar) will probably never work on
large instances, while users will likely always find ways to screw up
base backups taken with the latter. A third approach is to add more
careful checks at restore time, and the backend-side approach helps a
lot there.
--
Michael