Re: [BUG] pg_basebackup from disconnected standby fails - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUG] pg_basebackup from disconnected standby fails
Date
Msg-id CAA4eK1LV4uYLhmhhJP-j0DRE+h+hRST0sFHBu5KuD6HDap+-fA@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] pg_basebackup from disconnected standby fails  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [BUG] pg_basebackup from disconnected standby fails  (Michael Paquier <michael.paquier@gmail.com>)
Re: [BUG] pg_basebackup from disconnected standby fails  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Oct 21, 2016 at 10:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> So, if I understand correctly, then we can mark the version posted by
>>> you upthread [1] which includes a test along with Kyotaro's fix can be
>>> marked as Ready for committer.  If so, then please change the status
>>> of patch accordingly.
>>
>> Patch moved to next CF 2016-11, still with status "ready for committer".
>>
>> IMO, this thread has reached as conclusion to use this patch to
>> address the problem reported:
>> CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
>
> I spent some time reading this thread today and trying to understand
> exactly what's going on here.  Here's my attempt to restate the
> problem:
>
> 1. If a restartpoint flushes no dirty buffers, then the redo location
> advances but the minimum recovery point does not; therefore, the
> minimum recovery point can fall behind the redo location.  That's
> correct behavior, because if the standby is subsequently restarted, it
> can correctly begin at the checkpoint record associated with the
> restartpoint and need not replay any further.
>
> 2. However, it causes a problem when a base backup with the "include
> WAL" option is taken from a standby because -- on standby servers only
> -- the end location for a backup as returned by do_pg_stop_backup() is
> the minimum recovery point.  If this precedes the start point for the
> backup -- which is the restart point -- perform_base_backup() will
> conclude that no WAL files are required for the backup and fail an
> internal sanity check.
>
> 3. Removing the sanity check wouldn't help, because it can never be
> right to end up with zero WAL files as a result of a base backup.  At
> a minimum, we need the WAL file which contains the checkpoint record
> from which the control file specifies that redo should start.
> Actually, I think that checkpoint record could be spread across
> multiple files: it might be big.  We need them all, but the current
> code won't ensure that we get them.
>
> The consensus solution on this thread seems to be that we should have
> pg_do_stop_backup() return the last-replayed XLOG location as the
> backup end point.  If the control file has been updated with a newer
> redo location, then the associated checkpoint record has surely been
> replayed, so we'll definitely have enough WAL to replay that
> checkpoint record (and we don't need to replay anything else, because
> we're supposing that this is the case where the minimum recovery point
> precedes the redo location).  Although this will work, it might
> include more WAL in the backup than is strictly necessary.  If the
> additional WAL replayed beyond the minimum recovery point does NOT
> include a checkpoint, we don't need any of it; if it does, we need
> only the portion up to and including the last checkpoint record, and
> not anything beyond that.
>

You are right that it will include additional WAL than strictly
necessary, but that can happen today as well because minrecoverypoint
could be updated after you have established restart point for
do_pg_start_backup().  Do you want to say that even without patch in
some cases we are including additional WAL in backup than what is
strictly necessary, so it is better to improve the situation?

> I can think of two solutions that would be "tighter":
>
> 1. When performing a restartpoint, update the minimum recovery point
> to just beyond the checkpoint record.  I think this can't hurt anyone
> who is actually restarting recovery on the same machine, because
> that's exactly the point where they are going to start recovery.  A
> minimum recovery point which precedes the point at which they will
> start recovery is no better than one which is equal to the point at
> which they will start recovery.
>

I think this will work but will cause an unnecessary control file
update for the cases when buffer flush will anyway do it.  It can work
if we try to do only when required (minrecoverypoint is lesser than
lastcheckpoint) after flush of buffers.

> 2. In perform_base_backup(), if the endptr returned by
> do_pg_stop_backup() precedes the end of the checkpoint record returned
> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
> that's not so easy: we can't just say if (endptr < starptr) startptr =
> endptr; because startptr is the *start* of the checkpoint record, not
> the end.  I suspect somebody could figure out a solution to this
> problem, though.
>

With this approach, don't we need something similar for API's
pg_stop_backup()/pg_stop_backup_v2()?


> If we decide we don't want to aim for one of these tighter solutions
> and just adopt the previously-discussed patch, then at the very least
> it needs better comments.
>

+1.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: FSM corruption leading to errors
Next
From: Michael Paquier
Date:
Subject: Re: pg_basebackup stream xlog to tar