Thread: The way to know whether the standby has caught up with the master

The way to know whether the standby has caught up with the master

From
Fujii Masao
Date:
Hi,

For reliable high-availability, when the master crashes, the clusterware must
know whether it can promote the standby safely without any data loss,
before actually promoting it. IOW, it must know whether the standby has
already caught up with the primary. Otherwise, failover might cause data loss.
We can know that from pg_stat_replication on the master. But the problem
is that pg_stat_replication is not available since the master is not running at
that moment. So that info should be available also on the standby.

To achieve that, I'm thinking to change walsender so that, when the standby
has caught up with the master, it sends back the message indicating that to
the standby. And I'm thinking to add new function (or view like
pg_stat_replication)
available on the standby, which shows that info.

Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: The way to know whether the standby has caught up with the master

From
Heikki Linnakangas
Date:
On 25.05.2011 07:42, Fujii Masao wrote:
> For reliable high-availability, when the master crashes, the clusterware must
> know whether it can promote the standby safely without any data loss,
> before actually promoting it. IOW, it must know whether the standby has
> already caught up with the primary. Otherwise, failover might cause data loss.
> We can know that from pg_stat_replication on the master. But the problem
> is that pg_stat_replication is not available since the master is not running at
> that moment. So that info should be available also on the standby.
>
> To achieve that, I'm thinking to change walsender so that, when the standby
> has caught up with the master, it sends back the message indicating that to
> the standby. And I'm thinking to add new function (or view like
> pg_stat_replication)
> available on the standby, which shows that info.

By the time the standby has received that message, it might not be 
caught-up anymore because new WAL might've been generated in the master 
already.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: The way to know whether the standby has caught up with the master

From
Fujii Masao
Date:
On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 25.05.2011 07:42, Fujii Masao wrote:
>>
>> For reliable high-availability, when the master crashes, the clusterware
>> must
>> know whether it can promote the standby safely without any data loss,
>> before actually promoting it. IOW, it must know whether the standby has
>> already caught up with the primary. Otherwise, failover might cause data
>> loss.
>> We can know that from pg_stat_replication on the master. But the problem
>> is that pg_stat_replication is not available since the master is not
>> running at
>> that moment. So that info should be available also on the standby.
>>
>> To achieve that, I'm thinking to change walsender so that, when the
>> standby
>> has caught up with the master, it sends back the message indicating that
>> to
>> the standby. And I'm thinking to add new function (or view like
>> pg_stat_replication)
>> available on the standby, which shows that info.
>
> By the time the standby has received that message, it might not be caught-up
> anymore because new WAL might've been generated in the master already.

Right. But, thanks to sync rep, until such a new WAL has been replicated to
the standby, the commit of transaction is not visible to the client. So, even if
there are some WAL not replicated to the standby, the clusterware can promote
the standby safely without any data loss (to the client point of view), I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: The way to know whether the standby has caught up with the master

From
Jaime Casanova
Date:
On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
>>>
>>> To achieve that, I'm thinking to change walsender so that, when the
>>> standby
>>> has caught up with the master, it sends back the message indicating that
>>> to
>>> the standby. And I'm thinking to add new function (or view like
>>> pg_stat_replication)
>>> available on the standby, which shows that info.
>>
>> By the time the standby has received that message, it might not be caught-up
>> anymore because new WAL might've been generated in the master already.
>
> Right. But, thanks to sync rep, until such a new WAL has been replicated to
> the standby, the commit of transaction is not visible to the client. So, even if
> there are some WAL not replicated to the standby, the clusterware can promote
> the standby safely without any data loss (to the client point of view), I think.
>

then, you also need to transmit to the standby if it is the current
sync standby.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL


Re: The way to know whether the standby has caught up with the master

From
Simon Riggs
Date:
On Wed, May 25, 2011 at 6:16 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

>> To achieve that, I'm thinking to change walsender so that, when the
>> standby
>> has caught up with the master, it sends back the message indicating that
>> to
>> the standby. And I'm thinking to add new function (or view like
>> pg_stat_replication)
>> available on the standby, which shows that info.
>
> By the time the standby has received that message, it might not be caught-up
> anymore because new WAL might've been generated in the master already.

AFAICS, this is an exact case of the Byzantine General's problem.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 25.05.2011 07:42, Fujii Masao wrote:
>> To achieve that, I'm thinking to change walsender so that, when the standby
>> has caught up with the master, it sends back the message indicating that to
>> the standby. And I'm thinking to add new function (or view like
>> pg_stat_replication)
>> available on the standby, which shows that info.

> By the time the standby has received that message, it might not be 
> caught-up anymore because new WAL might've been generated in the master 
> already.

Even assuming that you believe this is a useful capability, there is no
need to change walsender.  It *already* sends the current-end-of-WAL in
every message, which indicates precisely whether the message contains
all of available WAL data.
        regards, tom lane


Re: The way to know whether the standby has caught up with the master

From
David Fetter
Date:
On Wed, May 25, 2011 at 12:34:59PM +0100, Simon Riggs wrote:
> On Wed, May 25, 2011 at 6:16 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> 
> >> To achieve that, I'm thinking to change walsender so that, when the
> >> standby
> >> has caught up with the master, it sends back the message indicating that
> >> to
> >> the standby. And I'm thinking to add new function (or view like
> >> pg_stat_replication)
> >> available on the standby, which shows that info.
> >
> > By the time the standby has received that message, it might not be caught-up
> > anymore because new WAL might've been generated in the master already.
> 
> AFAICS, this is an exact case of the Byzantine General's problem.

Have they updated it to acknowledge that the city is now called
Istanbul?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: The way to know whether the standby has caught up with the master

From
Fujii Masao
Date:
On Wed, May 25, 2011 at 3:11 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
>>> By the time the standby has received that message, it might not be caught-up
>>> anymore because new WAL might've been generated in the master already.
>>
>> Right. But, thanks to sync rep, until such a new WAL has been replicated to
>> the standby, the commit of transaction is not visible to the client. So, even if
>> there are some WAL not replicated to the standby, the clusterware can promote
>> the standby safely without any data loss (to the client point of view), I think.
>
> then, you also need to transmit to the standby if it is the current
> sync standby.

Yes. After further thought, we can promote the standby safely only when the
corresponding walsender meets the following conditions:
   1. sync_state is "sync"   2. the standby's flush_location is bigger than or equal to the smallest wait
locationin the sync rep queue. Which guarantees that all the committed       transactions (i.e., their "success"
indicationshave been
 
returned to the       client) have been replicated to the standby.

Once the above conditions get satisfied, the failover is safe until sync_state
is flipped to "async". By using this logic, walsender needs to check whether
failover is safe, and send the message according to the result.

One problem is that, when sync_state is flipped to "async", walsender might
perform replication asynchronously before the standby receives the message
indicating failover is unsafe. In this case, if the master crashes,
the clusterware
would wrongly think that failover is safe and promote the standby despite
which causes data loss.

To solve this problem, walsender would need to send that message
*synchronously*,
i.e., wait for the ACK of the message to arrive from the standby before actually
changing sync_state to "async".

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: The way to know whether the standby has caught up with the master

From
Fujii Masao
Date:
On Wed, May 25, 2011 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> On 25.05.2011 07:42, Fujii Masao wrote:
>>> To achieve that, I'm thinking to change walsender so that, when the standby
>>> has caught up with the master, it sends back the message indicating that to
>>> the standby. And I'm thinking to add new function (or view like
>>> pg_stat_replication)
>>> available on the standby, which shows that info.
>
>> By the time the standby has received that message, it might not be
>> caught-up anymore because new WAL might've been generated in the master
>> already.
>
> Even assuming that you believe this is a useful capability, there is no
> need to change walsender.  It *already* sends the current-end-of-WAL in
> every message, which indicates precisely whether the message contains
> all of available WAL data.

That's not enough to calculate whether failover is safe or not. Even if the
standby's flush location is equal to the master's current end location, new
WAL might have already been generated, and the "success" indication of
the corresponding transaction might have been returned to the client (this
is possible only when async mode). So in addition to the master's current
end location, the standby must know its sync mode, which walsender would
need to send.

Another problem is that, when we can safely promote the standby, the
standby's flush location isn't always equal to the master's current end
location. Imagine the case where there are some unsent WAL in the master
and corresponding transactions are waiting for replication. In this case,
obviously those locations are not the same. But in sync replication, we can
guarantee that all the committed (from the client's view) transactions have
been replicated to the standby, so failover is safe.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center