Re: Sync Rep v19 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Sync Rep v19
Date
Msg-id AANLkTimCgX2gysx0TbXtf-d605zz1CFCOahS1MKcCet7@mail.gmail.com
In response to Re: Sync Rep v19  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, Mar 6, 2011 at 1:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2011-03-05 at 20:08 +0900, Fujii Masao wrote:
>> On Sat, Mar 5, 2011 at 7:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Yes, that can happen. As people will no doubt observe, this seems to be
>> > an argument for wait-forever. What we actually need is a wait that lasts
>> > longer than it takes for us to decide to fail over, if the standby is
>> > actually up and this is some kind of split brain situation. That way the
>> > clients are still waiting when failover occurs. WAL is missing, but
>> > since we didn't acknowledge the client we are OK to treat that situation
>> > as if it were an abort.
>>
>> Does Oracle Data Guard behave that way in maximum availability mode?
>>
>> I'm sure that you are implementing something like the maximum availability
>> mode rather than the maximum protection one. So I'd like to know how
>> the data loss situation I described can be avoided in the maximum availability
>> mode.
>
> It can't. (Oracle or otherwise...)
>
> Once we begin waiting for sync rep, if the transaction or backend ends
> then other backends will be able to see the changed data. The only way
> to prevent that is to shut down the database to ensure that no readers or
> writers have access to that.
>
> Oracle's protection mechanism is to shut down the primary if there is no
> sync standby available. Maximum Protection. Any other mode must
> therefore be less than maximum protection, according to Oracle, and me.
> "Available" here means one that has not timed out, via parameter.
>
> Shutting down the main server is cool, as long as you fail over to one of
> the standbys. If there aren't any standbys, or you don't have a
> mechanism for switching quickly, you have availability problems.
>
> What shutting down the server doesn't do is keep the data safe for
> transactions that were in their commit-wait phase when the disconnect
> occurs. That data exists, yet will not have been transferred to the
> standby.
>
> From now on, I also say we should wait forever. It is the safest mode and I
> want no argument about whether sync rep is safe or not. We can introduce
> a more relaxed mode later with high availability for the primary. That
> is possible and in some cases desirable.
>
> Now, when we lose the last sync standby we have three choices:
>
>  1. reconnect the standby, or wait for a potential standby to catch up
>
>  2. immediate shutdown of master and failover to one of the standbys
>
>  3. reclassify an async standby as a sync standby
>
> More than likely we would attempt to do (1) for a while, then do (2).
>
> This means that when we start up, the primary will freeze for a while
> until the sync standbys connect. Which is OK, since they try to
> reconnect every 5 seconds and we don't plan on shutting down the primary
> much anyway.
>
> I've removed the timeout parameter, plus if we begin waiting we wait
> until released, forever if that's how long it takes.
>
> The recommendation to use more than one standby remains.
>
> Fast shutdown will wake backends from their latch, and the patch no
> longer changes interrupt behaviour.
>
> synchronous_standby_names = '*' is no longer the default
>
> On a positive note this is one less parameter and will improve
> performance as well.
>
> All above changes made.
>
> Ready to commit, barring concrete objections to important behaviour.
>
> I will do one final check tomorrow evening then commit.

I agree with this change.

One comment: what about introducing a built-in function to wake up all the
waiting backends? When the replication connection is closed, if we STONITH
the standby, we can safely (with respect to logical data loss, not physical
data loss) switch the primary to standalone mode. But for now there is no
way to wake up the waiting backends. Setting synchronous_replication to OFF
and reloading the configuration file doesn't affect backends that are
already waiting. The attached patch introduces a "pg_wakeup_all_waiters"
(better name?) function, which wakes up all the backends in the queue.
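
To illustrate the intended behaviour, here is a minimal toy model in Python.
It is only a sketch and not the attached patch: the class SyncRepQueue and
its method names are hypothetical, threads stand in for backends, and a
threading.Event stands in for each backend's latch. Waiters block with no
timeout, and a single call releases everything currently in the queue.

```python
import threading


class SyncRepQueue:
    """Toy model of the queue of backends waiting for a sync standby ack.

    Hypothetical illustration only; this is not PostgreSQL code.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []  # one Event per waiting "backend"

    def wait_for_ack(self):
        """Block the calling thread until it is released (no timeout)."""
        released = threading.Event()
        with self._lock:
            self._waiters.append(released)
        released.wait()  # waits until released, forever if need be

    def waiting(self):
        """Return the number of backends currently queued."""
        with self._lock:
            return len(self._waiters)

    def wakeup_all_waiters(self):
        """Release every queued backend and return how many were woken."""
        with self._lock:
            woken, self._waiters = self._waiters, []
        for event in woken:
            event.set()
        return len(woken)
```

In this model an administrator (or cluster manager) would call
wakeup_all_waiters() only after the standby has been fenced, since releasing
the waiters acknowledges commits that were never replicated.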

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
