Re: [BUG] pg_basebackup from disconnected standby fails - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [BUG] pg_basebackup from disconnected standby fails
Date
Msg-id 20161024.124802.173206562.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [BUG] pg_basebackup from disconnected standby fails  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [BUG] pg_basebackup from disconnected standby fails  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [BUG] pg_basebackup from disconnected standby fails  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Thank you for looking and retelling this.

At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow@mail.gmail.com>
> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> >> So, if I understand correctly, then we can mark the version posted by
> >> you upthread [1] which includes a test along with Kyotaro's fix can be
> >> marked as Ready for committer.  If so, then please change the status
> >> of patch accordingly.
> >
> > Patch moved to next CF 2016-11, still with status "ready for committer".
> >
> > IMO, this thread has reached as conclusion to use this patch to
> > address the problem reported:
> > CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
> 
> I spent some time reading this thread today and trying to understand
> exactly what's going on here.  Here's my attempt to restate the
> problem:
> 
> 1. If a restartpoint flushes no dirty buffers, then the redo location
> advances but the minimum recovery point does not; therefore, the
> minimum recovery point can fall behind the redo location.  That's
> correct behavior, because if the standby is subsequently restarted, it
> can correctly begin at the checkpoint record associated with the
> restartpoint and need not replay any further.

Yes.

> 2. However, it causes a problem when a base backup with the "include
> WAL" option is taken from a standby because -- on standby servers only
> -- the end location for a backup as returned by do_pg_stop_backup() is
> the minimum recovery point.  If this precedes the start point for the
> backup -- which is the restart point -- perform_base_backup() will
> conclude that no WAL files are required for the backup and fail an
> internal sanity check.

Yes. The option implies that the copied $PGDATA is a standalone
backup, that is, usable to start a standby instantly so at least
one WAL segment is needed, as mentioned in 3 below.

> 3. Removing the sanity check wouldn't help, because it can never be
> right to end up with zero WAL files as a result of a base backup.  At
> a minimum, we need the WAL file which contains the checkpoint record
> from which the control file specifies that redo should start.
> Actually, I think that checkpoint record could be spread across
> multiple files: it might be big.  We need them all, but the current
> code won't ensure that we get them.

Yes. The "checkpoit records" means that records during a checkpoint.

> The consensus solution on this thread seems to be that we should have
> pg_do_stop_backup() return the last-replayed XLOG location as the
> backup end point.  If the control file has been updated with a newer
> redo location, then the associated checkpoint record has surely been
> replayed, so we'll definitely have enough WAL to replay that
> checkpoint record (and we don't need to replay anything else, because
> we're supposing that this is the case where the minimum recovery point
> precedes the redo location).  Although this will work, it might
> include more WAL in the backup than is strictly necessary.  If the
> additional WAL replayed beyond the minimum recovery point does NOT
> include a checkpoint, we don't need any of it; if it does, we need
> only the portion up to and including the last checkpoint record, and
> not anything beyond that.

StartupXLOG mandates the existence of the start record of the
latest checkpoint. pg_start_backup() returns the start point
(redo location) of the restartpoint that it requested and
minRecoveryPoint is always after the replpayed checkpoint so it
works if always was going well.

But if the checkpoint contains no update, the restart point
exceeds minRecoveryPoint. Then no WAL copied.

> I can think of two solutions that would be "tighter":
> 
> 1. When performing a restartpoint, update the minimum recovery point
> to just beyond the checkpoint record.  I think this can't hurt anyone
> who is actually restarting recovery on the same machine, because
> that's exactly the point where they are going to start recovery.  A
> minimum recovery point which precedes the point at which they will
> start recovery is no better than one which is equal to the point at
> which they will start recovery.

This is a candidate that I thought of. But I avoided to change
the behavior of minRecoveryPoint that (seems to me that) it
describes only buffer state. So checkpoint with no update doesn't
advances minRecoveryPoint as my understanding.

> 2. In perform_base_backup(), if the endptr returned by
> do_pg_stop_backup() precedes the end of the checkpoint record returned
> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
> that's not so easy: we can't just say if (endptr < starptr) startptr =
> endptr; because startptr is the *start* of the checkpoint record, not
> the end.  I suspect somebody could figure out a solution to this
> problem, though.

Yes, and the reason I rejected it was that it is not logical,
maybe the same meaning to it would cause another problem later.

> If we decide we don't want to aim for one of these tighter solutions
> and just adopt the previously-discussed patch, then at the very least
> it needs better comments.  The minimum comment change the patch makes
> right now doesn't give any explanation of why the new algorithm is
> better than the old, and that's not cool.  Without that, the next
> person who comes along to maintain this code won't have any easy way
> of knowing that we grappled with this problem and what our conclusions
> were.

If I minunderstand the meaning/function of minRecoveryPoint or we
are allowed to change the behavior/definition of it, using it can
reduce the amount of copied WAL for some casees. (Though no
concrete example is coming to my mind now.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





pgsql-hackers by date:

Previous
From: "Samuel D. Leslie"
Date:
Subject: Add radiustimeout parameter for RADIUS HBA
Next
From: Michael Paquier
Date:
Subject: Re: FSM corruption leading to errors