Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
Date
Msg-id CAB7nPqSkcXT31YWbfJHikZFsT30MmYGJt045TewQLa1k=ynykw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Aug 5, 2017 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Aug 5, 2017 at 9:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If we apply these patches to 9.6, then pg_stop_backup() on a standby
>> will start writing backup history files and removing no-longer-needed
>> backup history files.  That's a clear behavior change, and it isn't a
>> bug fix.  Making the waitforarchive option work is a bug fix, but I'm
>> not convinced we should take it as a license to change other aspects
>> of the behavior in a point release.
>
> After refreshing my memory further, I take it back.  pg_stop_backup()
> doesn't even have a second argument on v9.6, so back-porting this fix
> to 9.6 is a meaningless thing; there's nothing to fix.  What the
> patches propose to do there instead is adopt the behavior proposed for
> v10 when pg_stop_backup()'s second argument is true as the
> unconditional behavior in v9.6.  For that to be the right thing to do,
> we have to decide that the current v9.6 behavior is a bug.  But I
> (still) think that's very arguable, because the whole point is that
> we've just finished adding a flag in v10 to disable on the master the
> *very same behavior* we're proposing to mandate on the standby in
> v9.6.  How can we say that v9.6's current behavior is a bug when v10
> produces the same behavior with wait_for_archive = false?  That just
> doesn't make any sense.

Because default values should be safe in the backup and restore area,
and wait_for_archive = false is not the default. I would like to point
out that the 9.6 behavior has been discussed as being a bug upthread
for 9.6 by three people (David, Sawada-san and I) as there is a real
risk to take inconsistent backups from standbys (a WAL segment may not
be archived when pg_stop_backup reports back, so the user may not have
all the WAL it would need to get back to a consistent state), and that
the default should be to get consistent and safe backups.

> I've pushed a minimal fix for v10 which should address Sawada-san's
> original complaint: now, if you say wait_for_archive = true on a
> standby, you'll get a warning if it didn't wait, or if archive_mode =
> always, it will wait.

Backup history files are around mainly for debugging purposes. While I
don't mind about the choice to not generate them on back-branches, the
inconsistency between primary and standbys in generating them is
really disturbing. We could have taken the occasion to address that
here as this is not invasive, but well... I do complain a lot about
keeping changes going to v10 non-invasive if possible. So no real
complain to do what's been done.

> I think the right thing to do about 9.6 is
> document the behavior; there's no problem here that a user can't work
> around by doing it right.

There are many ways for users to do it wrong in this area, that I am
of the opinion to give them safe defaults if we have a way to make
things work safe in the backend. And here we are talking about extra
checks to make sure that a WAL segment is correctly archived... I have
seen bugs lately in custom backup code which led to inconsistent
backups.
-- 
Michael



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: [HACKERS] git.postgresql.org (and other services) down
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server