Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 10/16/23 15:06, Robert Haas wrote:
> > On Mon, Oct 16, 2023 at 1:00 PM David Steele <david@pgmasters.net> wrote:
> > > After some agonizing (we hope) they decide to delete backup_label and,
> > > wow, it just works! So now they merrily go on their way with a corrupted
> > > cluster. They also remember for the next time that deleting backup_label
> > > is definitely a good procedure.
> > >
> > > The idea behind this patch is that deleting backup_label would produce a
> > > hard error because pg_control would be missing as well (if the backup
> > > software did its job). If both pg_control and backup_label are present
> > > (but pg_control has not been loaded with the contents of backup_label,
> > > i.e. it is the running copy from the backup cluster) we can also error.
> >
> > I mean, I think we're just going in circles, here. I did and do
> > understand, but I didn't and don't agree. You're hypothesizing a user
> > who is willing to do ONE thing that they shouldn't do during backup
> > restoration (namely, remove backup_label) but who won't be willing to
> > do a SECOND thing that they shouldn't do during backup restoration
> > (namely, run pg_resetwal).
>
> In my experience the first case is much more likely than the second. Your
> experience may vary.

My experience (though perhaps not a surprise) mirrors David's.

> Anyway, I think they are pretty different. Deleting backup label appears to
> give a perfectly valid restore. Running pg_resetwal is more clearly (I
> think) the nuclear solution.

Right, and deleting backup_label is just an 'rm' of a file that folks may
think is "just some leftover thing that isn't actually needed".

On the other hand, pg_resetwal has an online documentation page and a man
page that are very clear that it's only to be used as a last resort (perhaps
we should pull that into the --help output too?). It's also pretty clear that
pg_resetwal is actually changing things about the cluster, while nuking
backup_label doesn't *seem* to be in that same category, even though we
all know it is, because the file is needed once recovery begins.
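
To make the proposal quoted above concrete, here is a minimal sketch in
Python of the startup decision David describes. It is illustrative only and
is not the patch's code: the function name, parameter names, and return
strings are all hypothetical, and how the server would actually detect that
pg_control "has not been loaded with the contents of backup_label" is left
as a plain boolean input. The two non-error outcomes are my assumption about
the remaining cases, which the quoted text does not spell out.

```python
def check_restore_state(has_pg_control, has_backup_label,
                        control_loaded_from_label=False):
    """Classify a data directory at startup (hypothetical sketch).

    All names here are illustrative assumptions, not PostgreSQL APIs.
    """
    if not has_pg_control:
        if not has_backup_label:
            # backup_label was deleted and the backup software omitted
            # pg_control: nothing to start from, so fail hard.
            return "error: pg_control missing"
        # Assumed case: pg_control is rebuilt from backup_label.
        return "recover from backup_label"

    if has_backup_label and not control_loaded_from_label:
        # pg_control is the running copy from the backed-up cluster,
        # not one derived from backup_label.
        return "error: pg_control not loaded from backup_label"

    # Assumed case: nothing suspicious was detected.
    return "proceed"


# Deleting backup_label from a well-formed backup becomes a hard error.
print(check_restore_state(has_pg_control=False, has_backup_label=False))
```

The point of the sketch is only that, under this scheme, the 'rm' no longer
"just works".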

I'd also put out there that while people don't do restore testing
nearly as much as they should, they tend to at _least_ try to do it once
after taking their first backup, and if that fails they try to figure
out why and what they're doing wrong.

Thanks,
Stephen