Re: The way to know whether the standby has caught up with the master - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: The way to know whether the standby has caught up with the master
Date
Msg-id BANLkTinLjpN0dk6XrxD5Nns0=Hsch9iT4A@mail.gmail.com
In response to Re: The way to know whether the standby has caught up with the master  (Jaime Casanova <jaime@2ndquadrant.com>)
List pgsql-hackers
On Wed, May 25, 2011 at 3:11 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
>>> By the time the standby has received that message, it might not be caught-up
>>> anymore because new WAL might've been generated in the master already.
>>
>> Right. But, thanks to sync rep, until such a new WAL has been replicated to
>> the standby, the commit of transaction is not visible to the client. So, even if
>> there are some WAL not replicated to the standby, the clusterware can promote
>> the standby safely without any data loss (to the client point of view), I think.
>
> then, you also need to transmit to the standby if it is the current
> sync standby.

Yes. After further thought, we can promote the standby safely only when the
corresponding walsender meets the following conditions:
   1. sync_state is "sync"
   2. the standby's flush_location is bigger than or equal to the smallest wait
      location in the sync rep queue. Which guarantees that all the committed
      transactions (i.e., their "success" indications have been returned to the
      client) have been replicated to the standby.

Once the above conditions get satisfied, the failover is safe until sync_state
is flipped to "async". By using this logic, walsender needs to check whether
failover is safe, and send the message according to the result.
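
A minimal sketch of the check described above, written in Python only for
illustration (it is not walsender code). The names WalSenderState,
failover_is_safe and safety_message, the message strings, and the use of plain
integers for WAL locations are all assumptions made for this example. The case
of an empty sync rep queue is not covered here, since it is not specified above.

```python
from dataclasses import dataclass


@dataclass
class WalSenderState:
    """Hypothetical snapshot of one walsender's view of its standby."""
    sync_state: str       # "sync" or "async"
    flush_location: int   # standby's flushed WAL position (integer for simplicity)


def failover_is_safe(ws: WalSenderState, smallest_wait_location: int) -> bool:
    """Return True only when both conditions from the list above hold."""
    # Condition 1: this standby is the current sync standby.
    if ws.sync_state != "sync":
        return False
    # Condition 2: the standby has flushed at least up to the smallest
    # wait location in the sync rep queue.
    return ws.flush_location >= smallest_wait_location


def safety_message(ws: WalSenderState, smallest_wait_location: int) -> str:
    """Pick the (hypothetical) message walsender would send to the standby."""
    if failover_is_safe(ws, smallest_wait_location):
        return "failover-safe"
    return "failover-unsafe"
```

For example, a "sync" standby whose flush_location equals the smallest wait
location would get "failover-safe", while any "async" standby would get
"failover-unsafe" regardless of how far it has flushed.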

One problem is that, when sync_state is flipped to "async", walsender might
perform replication asynchronously before the standby receives the message
indicating failover is unsafe. In this case, if the master crashes, the
clusterware would wrongly think that failover is safe and promote the standby,
which causes data loss.

To solve this problem, walsender would need to send that message
*synchronously*, i.e., wait for the ACK of the message to arrive from the
standby before actually changing sync_state to "async".
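
The ordering can be sketched as a toy model, again in Python and again only as
an illustration. The Standby and WalSender classes, the "failover-unsafe"
message and the "ack" reply are hypothetical names, and the blocking wait for
the ACK is modeled as an ordinary method call returning.

```python
class Standby:
    """Toy standby that records whether it believes failover is safe."""

    def __init__(self) -> None:
        self.failover_safe = True

    def receive(self, message: str) -> str:
        # Record the new status first, then acknowledge it.
        if message == "failover-unsafe":
            self.failover_safe = False
        return "ack"


class WalSender:
    """Toy walsender that flips to async only after the standby's ACK."""

    def __init__(self, standby: Standby) -> None:
        self.sync_state = "sync"
        self.standby = standby

    def demote_to_async(self) -> None:
        # Step 1: tell the standby that failover is no longer safe and
        # wait for its ACK (here, the return value of the call).
        reply = self.standby.receive("failover-unsafe")
        if reply != "ack":
            # Without an ACK the state is left unchanged.
            raise RuntimeError("no ACK from standby; sync_state unchanged")
        # Step 2: only now start treating the standby as asynchronous.
        self.sync_state = "async"
```

With this ordering, there is no point in the model at which sync_state is
"async" while the standby still believes failover is safe.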

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

