Re: The way to know whether the standby has caught up with the master - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: The way to know whether the standby has caught up with the master
Date	May 26, 2011 08:29:45
Msg-id	BANLkTinLjpN0dk6XrxD5Nns0=Hsch9iT4A@mail.gmail.com
In response to	Re: The way to know whether the standby has caught up with the master (Jaime Casanova <jaime@2ndquadrant.com>)
List	pgsql-hackers

On Wed, May 25, 2011 at 3:11 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
>>> By the time the standby has received that message, it might not be caught-up
>>> anymore because new WAL might've been generated in the master already.
>>
>> Right. But, thanks to sync rep, until such a new WAL has been replicated to
>> the standby, the commit of transaction is not visible to the client. So, even if
>> there are some WAL not replicated to the standby, the clusterware can promote
>> the standby safely without any data loss (to the client point of view), I think.
>
> then, you also need to transmit to the standby if it is the current
> sync standby.

Yes. After further thought, we can promote the standby safely only when the
corresponding walsender meets the following conditions:

   1. sync_state is "sync"
   2. the standby's flush_location is greater than or equal to the smallest
      wait location in the sync rep queue. This guarantees that all the
      committed transactions (i.e., those whose "success" indications have
      been returned to the client) have been replicated to the standby.
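
The two conditions above can be sketched as a small check. This is only an
illustration in Python, not walsender code: WalSenderState, parse_lsn and
failover_is_safe are hypothetical names, and treating an empty sync rep queue
as "condition 2 holds" is my assumption, not something settled in this thread.

```python
from dataclasses import dataclass
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert a WAL location written as 'X/Y' (two hex numbers) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class WalSenderState:
    """Hypothetical snapshot of what a walsender knows about its standby."""
    sync_state: str        # "sync" or "async"
    flush_location: int    # standby's flush location, as an integer


def failover_is_safe(sender: WalSenderState,
                     smallest_wait_location: Optional[int]) -> bool:
    """Return True only when both conditions from the mail hold."""
    # Condition 1: the standby must be the current sync standby.
    if sender.sync_state != "sync":
        return False
    # ASSUMPTION: an empty queue (None) means no commit is waiting on the
    # standby, so condition 2 is treated as satisfied.
    if smallest_wait_location is None:
        return True
    # Condition 2: flush_location >= smallest wait location in the queue.
    return sender.flush_location >= smallest_wait_location
```

For example, a standby in "sync" state that has flushed up to 0/3000060 passes
the check against a smallest wait location of 0/3000000, while the same standby
in "async" state does not.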

Once the above conditions are satisfied, failover remains safe until sync_state
is flipped to "async". With this logic, walsender needs to check whether
failover is safe, and send the message according to the result.

One problem is that, when sync_state is flipped to "async", walsender might
perform replication asynchronously before the standby receives the message
indicating that failover is unsafe. In this case, if the master crashes, the
clusterware would wrongly think that failover is safe and promote the standby,
which causes data loss.

To solve this problem, walsender would need to send that message
*synchronously*, i.e., wait for the ACK of the message to arrive from the
standby before actually changing sync_state to "async".
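
The ordering I have in mind is roughly the following. Again this is a Python
sketch, not a patch: WalSenderSketch, flip_to_async and the "failover-unsafe"
message name are hypothetical, and staying in "sync" when no ACK arrives is an
assumption, since what to do on a lost ACK is not decided here.

```python
from typing import Callable


class WalSenderSketch:
    """Toy model of a walsender; it only demonstrates the ordering."""

    def __init__(self, send_and_wait_for_ack: Callable[[str], bool]) -> None:
        self.sync_state = "sync"
        # Hypothetical transport: sends a message to the standby and blocks
        # until the standby's ACK arrives (True) or the wait fails (False).
        self._send_and_wait_for_ack = send_and_wait_for_ack

    def flip_to_async(self) -> bool:
        """Tell the standby failover is unsafe, then switch to async."""
        # Step 1: notify the standby *before* changing any local state.
        acked = self._send_and_wait_for_ack("failover-unsafe")
        if not acked:
            # ASSUMPTION: without an ACK we stay in "sync" and do not start
            # replicating asynchronously.
            return False
        # Step 2: only now is it safe to replicate asynchronously.
        self.sync_state = "async"
        return True
```

The point is that the standby has already learned that failover is unsafe by
the time sync_state becomes "async", so the clusterware cannot see a stale
"safe" indication while replication is asynchronous.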

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
