Re: patch proposal - Mailing list pgsql-hackers

From Venkata B Nagothi
Subject Re: patch proposal
Date
Msg-id CAEyp7J-dcGAqpxtDfFzV+JjKu96BiUEwGLf1o_1gSAQZy6kJNA@mail.gmail.com
Whole thread Raw
In response to Re: patch proposal  (Venkata B Nagothi <nag1010@gmail.com>)
Responses Re: patch proposal  (Venkata B Nagothi <nag1010@gmail.com>)
List pgsql-hackers

Attached is the patch which introduces the new parameter "recovery_target_incomplete" -

Currently this patch addresses two scenarios -

Scenario 1 :

Provide options to DBA when the recovery target is not reached and has stopped mid-way due to unavailability of WALs

When performing recovery to a specific recovery target, "recovery_target_incomplete" parameter can be used to either promote, pause or shutdown when recovery does not reach the specified recovery target and reaches end-of-the-wal.

Scenario 2 :

Generates Errors, Hints when the specified recovery target is prior to the backup's current position (xid, time and lsn). This behaviour is integrated with the parameters "recovery_target_time","recovery_target_lsn" and "recovery_target_xid".

I would like to know if the approach this patch incorporates sounds ok ?

My other comments are inline 

On Mon, Aug 29, 2016 at 3:17 PM, Venkata B Nagothi <nag1010@gmail.com> wrote:

On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Venkata B Nagothi (nag1010@gmail.com) wrote:
> On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Venkata B Nagothi (nag1010@gmail.com) wrote:
> > > This behaviour will be similar to that of recovery_target="immediate" and
> > > can be aliased.
> >
> > I don't believe we're really going at this the right way.  Clearly,
> > there will be cases where we'd like promotion at the end of the WAL
> > stream (as we currently have) even if the recovery point is not found,
> > but if the new option's "promote" is the same as "immediate" then we
> > don't have that.
>
> My apologies for the confusion. Correction - I meant, "promote" option
> would promote the database once the consistent point is reached at the
> end-of-the-WAL.

"consistent point" and "end-of-the-WAL" are not the same.

> > recovery_target = immediate, other
> >
> > recovery_action_target_found = promote, pause, shutdown
>
> This is currently supported by the existing parameter called
> "recovery_target_action"

Right, sure, we could possibly use that, or create an alias, etc.

> > recovery_action_target_not_found = promote, pause, shutdown
>
> This is exactly what newly proposed parameter will do.

Then it isn't the same as 'recovery_target = immediate'.

> > One question to ask is if we need to support an option for xid and time
> > related to when we realize that we won't find the recovery target.  If
> > consistency is reached at a time which is later than the recovery target
> > for time, what then?  Do we go through the rest of the WAL and perform
> > the "not found" action at the end of the WAL stream?  If we use that
> > approach, then at least all of the recovery target types are handled the
> > same, but I can definitely see cases where an administrator might prefer
> > an "error" option.
>
> Currently, this situation is handled by recovery_target_inclusive
> parameter.

No, that's not the same.

> Administrator can choose to stop the recovery at any consistent
> point before or after the specified recovery target. Is this what you were
> referring to ?

Not quite.

> I might need a bit of clarification here, the parameter i am proposing
> would be effective only if the specified recovery target is not reached and
> may not be effective if the recovery goes beyond recovery target point.

Ok, let's consider this scenario:

Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
recovery_target is not set
recovery_target_time = 2016-08-26 08:29:30
recovery_target_inclusive = false

In such a case, we will necessairly commit transactions which happened
between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
reached consistency.  We could possibly detect that case and throw an
error instead.  The same could happen with xid.

Yes, currently, PG just recovers until the consistency is reached. It makes sense to throw an error instead.

This has not been addressed yet. Currently, the patch only considers generating an error if the recovery target position is prior the current backup's position and this is irrespective of weather backup_label is present or not.
I will share my updates on how this can be addressed.
 
 
Working through more scenarios would be useful, I believe.  For example,
what if the xid or time specified happened before the backup started?

Basically, what I was trying to get at is that we might be able to
detect that we'll never find a given recovery target point without
actually replaying all of WAL and wondering if we should provide options
to control what to do in such a case.

Ah, Yes, I got the point and I agree. Thanks for the clarification. 

Yes, good idea and it is important to ensure PG errors out and warn the administrator with appropriate message indicating that... "a much earlier backup must be used..." 
if the specified recovery target xid, name or time is earlier than the backup time.

This scenario has been addressed in this patch.


I have a question regarding the following comment -

        /*
         * Override any inconsistent requests. Not that this is a change of
         * behaviour in 9.5; prior to this we simply ignored a request to pause if
         * hot_standby = off, which was surprising behaviour.
         */
        if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
                recoveryTargetActionSet &&
                !EnableHotStandby)
                recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

Because of which, option "pause" is ignored and the database is shutdown instead.

The "pause" request is ignored for recovery_target_action parameter and that is because of the following lines in recoveryPausesHere() function ? am i getting it right ?

        /* Don't pause unless users can connect! */
        if (!LocalHotStandbyActive)
                return;

Not sure why.

The parameter i am introducing faced the similar problem while testing and i have introduced a new function called IncompleteRecoveryPause() without the above condition which will pause the recovery at end-of-the-wal. Hope that is OK ?
If this is ok, i can change the function name and use the similar function for recovery_target_action='pause' as well.

Please note that documentation is still pending from my end.

This is my first patch to the community and i am very new the code base. Please let me know if this patch needs any changes.

Regards,

Venkata B N

Database Consultant 
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: "Karl O. Pinc"
Date:
Subject: Re: Patch to implement pg_current_logfile() function
Next
From: "Karl O. Pinc"
Date:
Subject: Re: Patch to implement pg_current_logfile() function