Re: patch proposal - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: patch proposal
Msg-id 20160826023008.GH4028@tamriel.snowman.net
In response to Re: patch proposal  (Venkata B Nagothi <nag1010@gmail.com>)
Responses Re: patch proposal  (Venkata B Nagothi <nag1010@gmail.com>)
List pgsql-hackers
* Venkata B Nagothi (nag1010@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > I'm not a fan of the "recovery_target" option, particularly as it's only
> > got one value even though it can mean two things (either "immediate" or
> > "not set"), but we need a complete solution before we can consider
> > deprecating it.  Further, we could consider making it an alias for
> > whatever better name we come up with.
>
> The new parameter will accept options : "pause", "shutdown" and "promote"
>
> *"promote"*
>
> This option will ensure database starts up once the "immediate" consistent
> recovery point is reached even if it is well before the mentioned recovery
> target point (XID, Name or time).
> This behaviour will be similar to that of recovery_target="immediate" and
> can be aliased.

I don't believe we're really going at this the right way.  Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

We need to break this down into all the different possible combinations
and then come up with names for the options to define them.  I don't
believe a single option is going to be able to cover all of the cases.

The cases which I'm considering are:

recovery target is immediate (as soon as we have consistency)
recovery target is a set point (name, xid, time, whatever)

action to take if recovery target is found
action to take if recovery target is not found

Generally, "action" is one of "promote", "pause", or "shutdown".
Clearly, not all actions are valid for all recovery target cases; in
particular, "immediate" with "recovery target not found" cannot support
the "promote" or "pause" options.  Otherwise, we can support:

Recovery Target  |  Found  |  Action
-----------------|---------|----------
immediate        |  Yes    | promote
immediate        |  Yes    | pause
immediate        |  Yes    | shutdown

immediate        |  No     | shutdown

name/xid/time    |  Yes    | promote
name/xid/time    |  Yes    | pause
name/xid/time    |  Yes    | shutdown

name/xid/time    |  No     | promote
name/xid/time    |  No     | pause
name/xid/time    |  No     | shutdown
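
To make the table concrete, here is a minimal sketch in Python of the
same matrix expressed as a validity check.  It is purely illustrative:
the names TARGETS, ACTIONS and is_supported are hypothetical and do not
correspond to anything in the PostgreSQL source, and "other" stands in
for name/xid/time.

```python
from itertools import product

# Hypothetical names for illustration only; "other" means name/xid/time.
TARGETS = ("immediate", "other")
ACTIONS = ("promote", "pause", "shutdown")


def is_supported(target: str, found: bool, action: str) -> bool:
    """Return True if (target, found, action) is a row in the table."""
    if target not in TARGETS:
        raise ValueError(f"unknown recovery target kind: {target!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # "immediate" with the target not found: the table lists only shutdown.
    if target == "immediate" and not found:
        return action == "shutdown"
    # Every other combination appears in the table.
    return True


# Enumerate the supported rows, in the same order as the table.
supported = [
    combo
    for combo in product(TARGETS, (True, False), ACTIONS)
    if is_supported(*combo)
]
```

Enumerating all combinations this way gives the same rows as the table,
with the two excluded cases being "immediate" / not found / promote and
"immediate" / not found / pause.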

We could clearly support this with these options:

recovery_target = immediate, other
recovery_action_target_found = promote, pause, shutdown
recovery_action_target_not_found = promote, pause, shutdown
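
As a rough sketch of how these three settings might fit together, the
Python below validates a combination and picks the action to take at the
end of recovery.  The option names are the ones proposed above; the
RecoverySettings class and end_of_recovery_action function are
hypothetical, and no defaults are assumed since none have been proposed.

```python
from dataclasses import dataclass

ACTIONS = ("promote", "pause", "shutdown")


@dataclass(frozen=True)
class RecoverySettings:
    """Hypothetical container for the three proposed options."""
    recovery_target: str                   # "immediate" or "other"
    recovery_action_target_found: str
    recovery_action_target_not_found: str

    def __post_init__(self):
        if self.recovery_target not in ("immediate", "other"):
            raise ValueError(f"invalid recovery_target: {self.recovery_target!r}")
        for name in ("recovery_action_target_found",
                     "recovery_action_target_not_found"):
            value = getattr(self, name)
            if value not in ACTIONS:
                raise ValueError(f"invalid {name}: {value!r}")
        # "immediate" not found only supports shutdown (see the table).
        if (self.recovery_target == "immediate"
                and self.recovery_action_target_not_found != "shutdown"):
            raise ValueError(
                "recovery_target = immediate requires "
                "recovery_action_target_not_found = shutdown"
            )


def end_of_recovery_action(settings: RecoverySettings, target_found: bool) -> str:
    """Pick the configured action depending on whether the target was found."""
    if target_found:
        return settings.recovery_action_target_found
    return settings.recovery_action_target_not_found
```

For example, RecoverySettings("other", "pause", "promote") would pause
when the target is found and promote at the end of the WAL stream when
it is not, which is the case the single-option approach cannot express.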

One question is whether, for xid and time targets, we need an option
covering the point at which we realize that we won't find the recovery
target.  If consistency is reached at a time which is later than the
recovery target time, what then?  Do we go through the rest of the WAL
and perform the "not found" action at the end of the WAL stream?  If we
use that approach, then at least all of the recovery target types are
handled the same, but I can definitely see cases where an administrator
might prefer an "error" option.

I'd suggest we attempt to support that also.

Thanks!

Stephen
