Re: Immediate standby promotion - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Immediate standby promotion
Date
Msg-id CA+U5nMKOsigzvYbe7f7k0k-15Z5RMj3DRhMSko1hNVQBWWGomA@mail.gmail.com
In response to Re: Immediate standby promotion  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Immediate standby promotion  (Michael Paquier <michael.paquier@gmail.com>)
Re: Immediate standby promotion  (Robert Haas <robertmhaas@gmail.com>)
Re: Immediate standby promotion  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 18 September 2014 01:22, Robert Haas <robertmhaas@gmail.com> wrote:

>> "fast" promotion was actually a supported option in r8 of Postgres but
>> this option was removed when we implemented streaming replication in
>> r9.0
>>
>> The *rough* requirement is sane, but that's not the same thing as
>> saying this exact patch makes sense.
>
> Granted.  Fair point.
>
>> If you are paused and you can see that WAL up ahead is damaged, then
>> YES, you do want to avoid applying it. That is possible by setting a
>> PITR target so that recovery stops at a precise location specified by
>> you. As an existing option, it is better than the blunt force trauma
>> suggested here.
>
> You can pause at a recovery target, but then what if you want to go
> read/write at that point?  Or what if you've got a time-delayed
> standby and you want to break replication so that it doesn't replay
> the DROP TABLE students that somebody ran on the master?  It doesn't
> have to be that WAL is unreadable or corrupt; it's enough for it to
> contain changes you wish to avoid replaying.
>
>> If you really don't care, just shut down the server, run pg_resetxlog
>> and start it up - again, no need for a new option.
>
> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
> seems like a friendlier interface than making somebody shut down the
> server, run pg_resetxlog, and start it up again.

It makes sense to go from paused --> promoted.

It doesn't make sense to go from normal running --> promoted, since
that is just random data loss. I very much understand the case where
somebody is shouting "get the web site up, we are losing business".
But a feature that lets people do exactly what they asked (go live
now) while losing business transactions that we thought had been
safely recorded is not good. It implements only the literal request,
not its actual intention.

Any feature that lumps both cases together is wrongly designed and
will cause data loss.
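
To make the distinction concrete, here is a minimal sketch of the rule
being argued for: immediate promotion is accepted only from the paused
state and refused while replay is still running. This is illustrative
Python, not PostgreSQL code; the names StandbyState, PromotionRefused
and promote_immediately are hypothetical and exist only for this
example.

```python
from enum import Enum


class StandbyState(Enum):
    """Hypothetical standby states, used only for this illustration."""
    REPLAYING = "replaying"  # normal running: WAL is still being applied
    PAUSED = "paused"        # replay deliberately stopped by the operator
    PROMOTED = "promoted"    # read/write, no longer a standby


class PromotionRefused(Exception):
    """Raised when immediate promotion is not allowed from the current state."""


def promote_immediately(state: StandbyState) -> StandbyState:
    """Return the new state, allowing promotion only from PAUSED."""
    if state is StandbyState.PAUSED:
        # The operator has already chosen where replay stops, so skipping
        # the remaining WAL is a deliberate decision.
        return StandbyState.PROMOTED
    if state is StandbyState.REPLAYING:
        # Promoting here would discard whatever WAL happens to be
        # unreplayed at this moment, i.e. random data loss.
        raise PromotionRefused("standby is still replaying; pause replay first")
    raise PromotionRefused(f"cannot promote from state {state.value!r}")
```

A design that exposed a single "promote now" action for both states
would lose exactly this check.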

We go to a lot of trouble to ensure data is successfully on disk and
in WAL. I won't give that up, nor do I want to make it easier to lose
data than it already is.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


