Re: Sync Rep v17 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Sync Rep v17
Msg-id AANLkTikqSZZSU7xn-mpQPymWr7zoO=8jsjjeQZqrxebV@mail.gmail.com
In response to Re: Sync Rep v17  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The WALSender deliberately does *not* wake waiting users if the standby
> disconnects. Doing so would break the whole reason for having sync rep
> in the first place. What we do is allow a potential standby to takeover
> the role of sync standby, if one is available. Or the failing standby
> can reconnect and then release waiters.

If there is a potential standby when the synchronous standby has gone, I agree
that it's not a good idea to release the waiting backends right away. In this
case, those backends should wait for the next synchronous standby.

On the other hand, if there is no potential standby, I think that the waiting
backends should not wait for the timeout and should wake up as soon as
the synchronous standby has gone. Otherwise, those backends are suspended for
a long time (i.e., until the timeout expires), which would reduce
availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is one
idea. But this looks like the behavior for "allow_standalone_primary = off".
If allow_standalone_primary = on, it looks more natural to make the
primary work alone without waiting for the timeout.
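
To make the proposal above concrete, here is a minimal sketch of the decision I have in mind when the synchronous standby disconnects. This is not code from the patch; the function name, the enum and the boolean arguments are hypothetical and only model the three cases discussed above.

```python
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep waiting"
    RELEASE_NOW = "release waiters immediately"


def on_sync_standby_lost(has_potential_standby: bool,
                         allow_standalone_primary: bool) -> Action:
    """Hypothetical model of what waiting backends do when the sync standby goes away."""
    if has_potential_standby:
        # Another standby can take over the sync role, so waiters wait for it.
        return Action.KEEP_WAITING
    if allow_standalone_primary:
        # Nobody can take over and the primary may run alone:
        # wake the waiters now instead of letting them sit until the timeout.
        return Action.RELEASE_NOW
    # allow_standalone_primary = off: keep waiting for a standby to (re)connect.
    return Action.KEEP_WAITING
```

Only the middle case differs from the current behavior as I understand it: with no potential standby and allow_standalone_primary = on, the waiters are released at once.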

> If we shutdown, then we wait for the shutdown commit record to be
> transferred to our standby, so a normal or fast shutdown of the master
> always leaves all connected standbys up to date. We already do that, so
> sync rep doesn't touch that behaviour. If a standby is disconnected,
> then it doesn't receive the shutdown checkpoint record.
>
> The wait state for a commit does not persist when we shutdown and
> restart.
>
> Can you restate which bits of the above you think need to be changed?

What I'm thinking is: when the waiting backends are released because
of the timeout while a fast shutdown is in progress on the master,
those backends should not return a success indication to the client.
Of course, in that case, WAL has already been flushed on the master,
but I think that those backends should exit with a FATAL error before
returning success. This is to avoid breaking the synchronous
replication rule, i.e., every transaction which the client knows as
committed must be committed on the synchronous standby after failover.

If we allow those backends to return success in that situation, the
following scenario, which can cause data loss, can happen.

1. The primary is running with allow_standalone_primary = on. There
   is only one (synchronous) standby connected.
2. The replication connection is closed because of a network outage.
3. While some backends are waiting for replication, the user requests
   a fast shutdown of the master.
4. Since the timeout expires, those backends stop waiting and return
   the success indication to the client (but the transactions are not
   replicated to the standby).
5. Since there is no backend waiting for replication, the fast shutdown
   completes.
6. Clusterware like Pacemaker detects the death of the primary
   and triggers the failover.
7. The new primary doesn't have some transactions committed to the
   client, i.e., transactions are lost!
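
The seven steps above can be walked through with a toy model. This is only a sketch under simplifying assumptions: WAL is modeled as a set of transaction ids, the transaction ids themselves are made up, and the flag fatal_on_shutdown_timeout is a hypothetical switch for the behavior I'm proposing, not an existing parameter.

```python
def run_scenario(fatal_on_shutdown_timeout: bool) -> list:
    """Return the transactions acknowledged to the client but missing after failover."""
    primary_wal = set()   # transactions flushed on the primary
    standby_wal = set()   # transactions received by the sync standby
    acked = set()         # transactions the client was told are committed

    # Step 1: one sync standby is connected; transaction 1 commits normally.
    primary_wal.add(1)
    standby_wal.add(1)
    acked.add(1)

    # Step 2: the replication connection is lost; nothing more reaches the standby.
    # Step 3: transactions 2 and 3 are flushed locally and wait for replication,
    #         then the user requests a fast shutdown.
    waiting = [2, 3]
    primary_wal.update(waiting)
    shutdown_in_progress = True

    # Step 4: the timeout expires and the waiters are released.
    for xid in waiting:
        if shutdown_in_progress and fatal_on_shutdown_timeout:
            # Proposed behavior: exit with FATAL, the client never sees success.
            continue
        acked.add(xid)

    # Steps 5-7: shutdown completes and the standby is promoted, so the
    # new primary only has what is in standby_wal.
    return sorted(acked - standby_wal)


lost_today = run_scenario(fatal_on_shutdown_timeout=False)
lost_proposed = run_scenario(fatal_on_shutdown_timeout=True)
```

In this model lost_today contains transactions 2 and 3, while lost_proposed is empty: the transactions still exist only on the old primary, but the client was never told they were committed, so the synchronous replication rule is not broken.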

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

