Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
Date
Msg-id CA+TgmoarDvYe-v+mmLEtZH5wFKu_GymzcRZacMf5pU2WVncq8w@mail.gmail.com
In response to Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers

On Sat, Aug 5, 2017 at 12:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Because default values should be safe in the backup and restore area,
> and wait_for_archive = false is not the default.

Neither is archive_mode = always, without which wait_for_archive =
true doesn't actually wait.

> I would like to point
> out that the 9.6 behavior has been discussed as being a bug upthread
> for 9.6 by three people (David, Sawada-san and I) as there is a real
> risk to take inconsistent backups from standbys (a WAL segment may not
> be archived when pg_stop_backup reports back, so the user may not have
> all the WAL it would need to get back to a consistent state), and that
> the default should be to get consistent and safe backups.

Sure, but that only happens if you haven't archived the very next WAL
segment yet, which in many environments isn't going to be a problem.
Furthermore, there's also the opposite danger of somebody's backup
script hanging where it currently doesn't.  I think it's just wrong to
say that without this change you don't get consistent backups.  It
just means you have an additional step to do to make sure that you
have all the WAL files you need - which is also true *with* the patch,
because you still have to make sure a WAL rotation happens on the
master.
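
As an illustration of that additional step, here is a minimal sketch of a
check a backup script could run before declaring a backup usable. It is
not part of PostgreSQL or of the patch under discussion. The function
names and the archive directory layout are hypothetical, the first and
last WAL segment names are assumed to be already known to the script,
and the arithmetic assumes the default 16 MB segment size, a single
timeline, and uncompressed archive files.

```python
import os

# Assumption: default 16 MB WAL segments, so 0x100 segments per "xlogid".
SEGMENTS_PER_XLOGID = 0x100


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_XLOGID + seg


def wal_name(timeline, segno):
    """Build the WAL file name for a timeline and absolute segment number."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def missing_segments(archive_dir, first_name, last_name):
    """Return the WAL file names in [first_name, last_name] that are
    not yet present in archive_dir (hypothetical flat archive layout)."""
    timeline, first = parse_wal_name(first_name)
    last_timeline, last = parse_wal_name(last_name)
    if timeline != last_timeline:
        raise ValueError("timeline switches are not handled in this sketch")
    present = set(os.listdir(archive_dir))
    expected = (wal_name(timeline, n) for n in range(first, last + 1))
    return [name for name in expected if name not in present]
```

A script would poll missing_segments() until it returns an empty list,
with a timeout, and only then treat the backup as complete. Whether the
last segment ever shows up still depends on a WAL rotation happening on
the master, which is the point made above.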

Besides, if not waiting is so bad, then what about the fact that 9.5,
9.4, 9.3, and 9.2 have the exact same code to not wait, and you're not
proposing to change that?  What about the fact that waiting isn't even
well-defined unless standbys support archiving?

> Backup history files are around mainly for debugging purposes. While I
> don't mind about the choice to not generate them on back-branches, the
> inconsistency between primary and standbys in generating them is
> really disturbing. We could have taken the occasion to address that
> here as this is not invasive, but well... I do complain a lot about
> keeping changes going to v10 non-invasive if possible. So no real
> complain to do what's been done.

I'm sorry you're disturbed, but I think that's clearly not a separate
change and not a bug fix.  The current behavior is well-documented,
including in places your patch didn't change, like the pg_basebackup
documentation.

>> I think the right thing to do about 9.6 is
>> document the behavior; there's no problem here that a user can't work
>> around by doing it right.
>
> There are many ways for users to do it wrong in this area, that I am
> of the opinion to give them safe defaults if we have a way to make
> things work safe in the backend. And here we are talking about extra
> checks to make sure that a WAL segment is correctly archived... I have
> seen bugs lately in custom backup code which led to inconsistent
> backups.

I agree that there are many ways for users to do it wrong, and I do
not think that changing critical behavior in minor releases will
result in a reduction in the number of ways for users to do it wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


