Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
Msg-id AANLkTin=8a9oS7D3dg6v3wB0Sgz21W-x-8zLfgy5Vt0f@mail.gmail.com
In response to Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep  (Robert Haas <robertmhaas@gmail.com>)
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep  (Aidan Van Dyk <aidan@highrise.ca>)
List pgsql-hackers
On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah.  If we rely on the TCP send buffer filling up, then the amount
> of time the master takes to notice a dead standby is going to be hard
> for the user to predict.  I think the standby ought to send some sort
> of heartbeat and the master should declare the standby dead if it
> doesn't see a heartbeat soon enough.  Maybe the heartbeat could even
> include the receive/fsync/replay LSNs, so that sync rep can use the
> same machinery but with more aggressive policies about when they must
> be sent.

OK. How about keepalive-like parameters and behavior?

   replication_keepalives_idle
   replication_keepalives_interval
   replication_keepalives_count

The master sends a keepalive packet once replication_keepalives_idle
has elapsed since it received the last ACK packet (which carries the
receive/fsync/replay LSNs) from the standby. The standby, in turn,
sends an ACK packet back to the master as soon as it receives a
keepalive packet.

If the master does not receive an ACK packet within
replication_keepalives_interval, it resends the keepalive packet and
waits for the ACK again, up to replication_keepalives_count - 1 more
times. If no ACK packet has arrived by then, the master considers the
standby dead.
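
To make the timing concrete, here is a minimal sketch of how the three
parameters might combine. It is only an illustration, not an
implementation: the names KeepaliveConfig, probe_schedule,
declared_dead_at and standby_is_dead are hypothetical, the units are
assumed to be seconds, and it assumes that replication_keepalives_count
is the total number of keepalive packets, each followed by a wait of
replication_keepalives_interval.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeepaliveConfig:
    # Hypothetical container for the three proposed parameters.
    idle: float      # replication_keepalives_idle (assumed seconds)
    interval: float  # replication_keepalives_interval (assumed seconds)
    count: int       # replication_keepalives_count (total keepalives sent)


def probe_schedule(cfg: KeepaliveConfig, last_ack: float) -> List[float]:
    """Times at which the master would send keepalives if no ACK ever arrives."""
    # First keepalive goes out once `idle` has elapsed since the last ACK;
    # each later one follows after another `interval` without an ACK.
    return [last_ack + cfg.idle + i * cfg.interval for i in range(cfg.count)]


def declared_dead_at(cfg: KeepaliveConfig, last_ack: float) -> float:
    """Time at which the master gives up: one interval after the last keepalive."""
    return last_ack + cfg.idle + cfg.count * cfg.interval


def standby_is_dead(cfg: KeepaliveConfig, last_ack: float, now: float) -> bool:
    """True once the whole keepalive sequence has gone unanswered."""
    return now >= declared_dead_at(cfg, last_ack)


if __name__ == "__main__":
    cfg = KeepaliveConfig(idle=60, interval=10, count=3)
    print(probe_schedule(cfg, last_ack=0))
    print(declared_dead_at(cfg, last_ack=0))
```

Under these assumptions the time to notice a dead standby is bounded by
idle + count * interval after the last ACK, which is something the user
can predict from the settings alone, unlike waiting for the TCP send
buffer to fill up.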

One obvious advantage over my original proposal is that the master
can notice the death of the standby even when there are no WAL
records to send. One disadvantage is that the standby needs to send
some packets even in asynchronous replication.

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

