Re: [BUG] pg_basebackup from disconnected standby fails - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUG] pg_basebackup from disconnected standby fails
Date
Msg-id CAA4eK1+bas_xDo3S5R6o6HKVtQu8K87RcHfo2NrAffaRtyEcSg@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] pg_basebackup from disconnected standby fails  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [BUG] pg_basebackup from disconnected standby fails  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Thank you for looking and retelling this.
>
> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow@mail.gmail.com>
>> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> >> So, if I understand correctly, then we can mark the version posted by
>> >> you upthread [1] which includes a test along with Kyotaro's fix can be
>> >> marked as Ready for committer.  If so, then please change the status
>> >> of patch accordingly.
>> >
>> > Patch moved to next CF 2016-11, still with status "ready for committer".
>> >
>> > IMO, this thread has reached as conclusion to use this patch to
>> > address the problem reported:
>> > CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
>>
>> I spent some time reading this thread today and trying to understand
>> exactly what's going on here.  Here's my attempt to restate the
>> problem:
>>
>> 1. If a restartpoint flushes no dirty buffers, then the redo location
>> advances but the minimum recovery point does not; therefore, the
>> minimum recovery point can fall behind the redo location.  That's
>> correct behavior, because if the standby is subsequently restarted, it
>> can correctly begin at the checkpoint record associated with the
>> restartpoint and need not replay any further.
>
> Yes.
>
>> 2. However, it causes a problem when a base backup with the "include
>> WAL" option is taken from a standby because -- on standby servers only
>> -- the end location for a backup as returned by do_pg_stop_backup() is
>> the minimum recovery point.  If this precedes the start point for the
>> backup -- which is the restart point -- perform_base_backup() will
>> conclude that no WAL files are required for the backup and fail an
>> internal sanity check.
>
> Yes. The option implies that the copied $PGDATA is a standalone
> backup, that is, usable to start a standby instantly so at least
> one WAL segment is needed, as mentioned in 3 below.
>
>> 3. Removing the sanity check wouldn't help, because it can never be
>> right to end up with zero WAL files as a result of a base backup.  At
>> a minimum, we need the WAL file which contains the checkpoint record
>> from which the control file specifies that redo should start.
>> Actually, I think that checkpoint record could be spread across
>> multiple files: it might be big.  We need them all, but the current
>> code won't ensure that we get them.
>
> Yes. The "checkpoit records" means that records during a checkpoint.
>
>> The consensus solution on this thread seems to be that we should have
>> pg_do_stop_backup() return the last-replayed XLOG location as the
>> backup end point.  If the control file has been updated with a newer
>> redo location, then the associated checkpoint record has surely been
>> replayed, so we'll definitely have enough WAL to replay that
>> checkpoint record (and we don't need to replay anything else, because
>> we're supposing that this is the case where the minimum recovery point
>> precedes the redo location).  Although this will work, it might
>> include more WAL in the backup than is strictly necessary.  If the
>> additional WAL replayed beyond the minimum recovery point does NOT
>> include a checkpoint, we don't need any of it; if it does, we need
>> only the portion up to and including the last checkpoint record, and
>> not anything beyond that.
>
> StartupXLOG mandates the existence of the start record of the
> latest checkpoint. pg_start_backup() returns the start point
> (redo location) of the restartpoint that it requested and
> minRecoveryPoint is always after the replpayed checkpoint so it
> works if always was going well.
>
> But if the checkpoint contains no update, the restart point
> exceeds minRecoveryPoint. Then no WAL copied.
>
>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record.  I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery.  A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> This is a candidate that I thought of. But I avoided to change
> the behavior of minRecoveryPoint that (seems to me that) it
> describes only buffer state. So checkpoint with no update doesn't
> advances minRecoveryPoint as my understanding.
>

I think what you are saying is not completely right, because we do
update minRecoveryPoint when we don't perform a new restart point.
When we perform restart point, then it assumes that flushing the
buffers will take care of updating minRecoveryPoint.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: pg_basebackup stream xlog to tar
Next
From: Michael Paquier
Date:
Subject: Re: pg_basebackup stream xlog to tar