Thread: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

From: Fujii Masao
On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> The timeout doesn't conflict with 'wait-forever'. Even if you choose
>> 'wait-forever' (i.e., you set allow_standalone_master to false), the
>> master should detect a standby crash as soon as possible by using the
>> timeout. For example, imagine that max_wal_senders is set to one and
>> the master cannot detect the standby crash because of the absence of
>> the timeout. In this case, even if you start a new standby, it will
>> not be able to connect to the master since there is no free walsender
>> slot. As a result, the master actually waits forever.
>
> It occurred to me that the timeout would be required even for
> asynchronous streaming replication. So, how about implementing the
> replication timeout feature before synchronous replication itself?

Here is the patch. This is one of the features required for synchronous
replication, so I added it to the current CF as part of synchronous
replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

From: Heikki Linnakangas
On 06.12.2010 08:51, Fujii Masao wrote:
> On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> The timeout doesn't conflict with 'wait-forever'. Even if you choose
>>> 'wait-forever' (i.e., you set allow_standalone_master to false), the
>>> master should detect a standby crash as soon as possible by using the
>>> timeout. For example, imagine that max_wal_senders is set to one and
>>> the master cannot detect the standby crash because of the absence of
>>> the timeout. In this case, even if you start a new standby, it will
>>> not be able to connect to the master since there is no free walsender
>>> slot. As a result, the master actually waits forever.
>>
>> It occurred to me that the timeout would be required even for
>> asynchronous streaming replication. So, how about implementing the
>> replication timeout feature before synchronous replication itself?
>
> Here is the patch. This is one of the features required for synchronous
> replication, so I added it to the current CF as part of synchronous
> replication.

Hmm, that's actually quite a different timeout from what's required for
synchronous replication. In synchronous replication, you need to get an
acknowledgment within a timeout. This patch only puts a timeout on how
long we wait to have enough room in the TCP send buffer. That doesn't
seem all that useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


On Mon, Dec 6, 2010 at 9:54 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>>> It occurred to me that the timeout would be required even for
>>> asynchronous streaming replication. So, how about implementing the
>>> replication timeout feature before synchronous replication itself?
>>
>> Here is the patch. This is one of the features required for synchronous
>> replication, so I added it to the current CF as part of synchronous
>> replication.
>
> Hmm, that's actually quite a different timeout from what's required for
> synchronous replication. In synchronous replication, you need to get an
> acknowledgment within a timeout. This patch only puts a timeout on how long
> we wait to have enough room in the TCP send buffer. That doesn't seem all
> that useful.

Yeah.  If we rely on the TCP send buffer filling up, then the amount
of time the master takes to notice a dead standby is going to be hard
for the user to predict.  I think the standby ought to send some sort
of heartbeat and the master should declare the standby dead if it
doesn't see a heartbeat soon enough.  Maybe the heartbeat could even
include the receive/fsync/replay LSNs, so that sync rep can use the
same machinery but with more aggressive policies about when they must
be sent.
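A minimal sketch of the heartbeat bookkeeping described above (Python; the class and field names here are invented for illustration, not walsender's actual data structures):

```python
import time

# Illustrative model: the standby periodically sends a heartbeat carrying
# its receive/fsync/replay LSNs, and the master declares the standby dead
# if no heartbeat has arrived within dead_after_secs.
class StandbyStatus:
    def __init__(self, dead_after_secs):
        self.dead_after_secs = dead_after_secs
        self.last_heartbeat = None  # (timestamp, receive_lsn, fsync_lsn, replay_lsn)

    def on_heartbeat(self, receive_lsn, fsync_lsn, replay_lsn, now=None):
        now = time.monotonic() if now is None else now
        self.last_heartbeat = (now, receive_lsn, fsync_lsn, replay_lsn)

    def is_dead(self, now=None):
        now = time.monotonic() if now is None else now
        if self.last_heartbeat is None:
            return False  # grace period before the first heartbeat is not modeled
        return now - self.last_heartbeat[0] > self.dead_after_secs
```

Because each heartbeat already carries the receive/fsync/replay positions, sync rep could reuse the same message and merely tighten the policy about how often it must be sent.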

I also can't help noticing that this approach requires drilling a hole
through the abstraction stack.  We just invented latches; if the API
is going to have to change every time someone wants to implement a
feature, we've built ourselves an awfully porous abstraction layer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Mon, Dec 6, 2010 at 11:54 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, that's actually quite a different timeout from what's required for
> synchronous replication. In synchronous replication, you need to get an
> acknowledgment within a timeout. This patch only puts a timeout on how long
> we wait to have enough room in the TCP send buffer. That doesn't seem all
> that useful.

Yeah, I'm planning to implement that timeout for synchronous replication
later. I created this patch because I thought that we should implement the
timeout for *asynchronous* replication first and then extend it for
synchronous replication. This kind of timeout is required for asynchronous
replication since there is no acknowledgement from the standby there.

Most of the patch implements a non-blocking send function and changes
walsender so that it uses that function instead of the existing blocking
one. This will serve as infrastructure for the timeout for synchronous
replication.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah.  If we rely on the TCP send buffer filling up, then the amount
> of time the master takes to notice a dead standby is going to be hard
> for the user to predict.  I think the standby ought to send some sort
> of heartbeat and the master should declare the standby dead if it
> doesn't see a heartbeat soon enough.  Maybe the heartbeat could even
> include the receive/fsync/replay LSNs, so that sync rep can use the
> same machinery but with more aggressive policies about when they must
> be sent.

OK. How about keepalive-like parameters and behaviors?
   replication_keepalives_idle
   replication_keepalives_interval
   replication_keepalives_count

The master sends a keepalive packet if replication_keepalives_idle has
elapsed since it received the last ACK packet (which includes the
receive/fsync/replay LSNs) from the standby. OTOH, the standby sends an
ACK packet back to the master as soon as it receives a keepalive packet.

If the master does not receive an ACK packet within
replication_keepalives_interval, it resends the keepalive packet and
waits for the ACK again, up to replication_keepalives_count - 1 more
times. If no ACK packet has arrived by then, the master considers the
standby dead.
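The proposed retry behavior could be modeled roughly like this (Python; standby_alive, send_keepalive, and wait_for_ack are hypothetical names used only for illustration, not anything in the patch):

```python
def standby_alive(send_keepalive, wait_for_ack,
                  keepalives_interval, keepalives_count):
    """Model of the proposal: send a keepalive and wait up to
    keepalives_interval seconds for an ACK; on silence, retry so that
    at most keepalives_count keepalives are sent in total."""
    for _ in range(keepalives_count):
        send_keepalive()
        if wait_for_ack(timeout=keepalives_interval):
            return True   # ACK arrived; the standby is alive
    return False          # no ACK after every attempt; presume the standby dead
```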

One obvious merit compared with my original proposal is that the master
can notice the death of the standby even when there are no sendable WAL
records. One demerit is that the standby needs to send some packets even
in asynchronous replication.

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yeah.  If we rely on the TCP send buffer filling up, then the amount
>> of time the master takes to notice a dead standby is going to be hard
>> for the user to predict.  I think the standby ought to send some sort
>> of heartbeat and the master should declare the standby dead if it
>> doesn't see a heartbeat soon enough.  Maybe the heartbeat could even
>> include the receive/fsync/replay LSNs, so that sync rep can use the
>> same machinery but with more aggressive policies about when they must
>> be sent.
>
> OK. How about keepalive-like parameters and behaviors?
>
>    replication_keepalives_idle
>    replication_keepalives_interval
>    replication_keepalives_count
>
> The master sends a keepalive packet if replication_keepalives_idle has
> elapsed since it received the last ACK packet (which includes the
> receive/fsync/replay LSNs) from the standby. OTOH, the standby sends an
> ACK packet back to the master as soon as it receives a keepalive packet.
>
> If the master does not receive an ACK packet within
> replication_keepalives_interval, it resends the keepalive packet and
> waits for the ACK again, up to replication_keepalives_count - 1 more
> times. If no ACK packet has arrived by then, the master considers the
> standby dead.

This doesn't really make sense, because you're connecting over a TCP
connection.  Once you send the first keepalive, TCP will keep retrying
in some way that we have no control over.  If those packets aren't
getting through, adding more data to what has to be transmitted seems
unlikely to do anything useful.  I think the parameters we can
usefully set are:

- how long does the master wait before sending a keepalive request?
- how long does the master wait after sending a keepalive before
declaring the slave dead and closing the connection?

But this can be further simplified.  The slave doesn't really need the
master to prompt it to send acknowledgments.  It only needs to send
them sufficiently often.  As part of the start-replication sequence,
let's have the master tell the slave "send me an acknowledgment at
least every N seconds".  And then the slave must do that.  The master
then has some value K > N, such that if no acknowledgment is received
after K seconds, the connection is disconnected.
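If both values live on the master, the check that K exceeds N could be as simple as this sketch (Python; the function name and seconds-based units are invented for illustration):

```python
def negotiate_ack_interval(declare_dead_secs, ack_every_secs):
    """Sketch of the proposed handshake: the master validates that the
    declare-dead time K strictly exceeds the acknowledgment interval N,
    then tells the slave to send an ack at least every N seconds."""
    if declare_dead_secs <= ack_every_secs:
        raise ValueError("declare-dead time must exceed the ack interval")
    return ack_every_secs  # value the master would send at replication start-up
```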

The only reason to have the master send explicit keepalive requests
(vs. just telling the client the interval) is if the master might
request them for some reason other than timer expiration.  Since the
main point of this is to detect the situation where the slave has e.g.
been power-cycled, so that the connection is gone but the master doesn't
know it, you could imagine a system where, when a new replication
connection is received, we request keepalives on all of the existing
connections to see if any of them are defunct.  But I don't really
think it needs to be quite that complicated.

Another consideration is that you could configure the
keepalive-frequency on the slave and the declare-dead-time on the
master.  Then the master wouldn't need to tell the slave the
keepalive-frequency at replication start-up time.  But that might also
increase the chances of incompatible settings (e.g. slave's keepalive
frequency is >= master's declare-dead-time), which would result in a
lot of unnecessary reconnects.  If both parameters are configured on
the master, then we can enforce that declare-dead-time >
keepalive-frequency.

So I suggest:

replication_keepalive_time - how often the slave is instructed to send
acknowledgments when idle
replication_idle_timeout - the period of inactivity after which the
master closes the connection to the slave
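Sketched as a postgresql.conf fragment on the master, that might look like this (both parameter names are only this thread's proposal, not GUCs that exist in PostgreSQL):

```
# Hypothetical settings from the proposal above, not real PostgreSQL GUCs.
replication_keepalive_time = 10s   # slave is told to ack at least this often when idle
replication_idle_timeout = 60s     # master drops the connection after this much
                                   # silence; must exceed replication_keepalive_time
```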

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

From: Aidan Van Dyk
On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> OK. How about keepalive-like parameters and behaviors?
>
>    replication_keepalives_idle
>    replication_keepalives_interval
>    replication_keepalives_count
>
> The master sends a keepalive packet if replication_keepalives_idle has
> elapsed since it received the last ACK packet (which includes the
> receive/fsync/replay LSNs) from the standby. OTOH, the standby sends an
> ACK packet back to the master as soon as it receives a keepalive packet.
>
> If the master does not receive an ACK packet within
> replication_keepalives_interval, it resends the keepalive packet and
> waits for the ACK again, up to replication_keepalives_count - 1 more
> times. If no ACK packet has arrived by then, the master considers the
> standby dead.

I thought we were using a single TCP session per standby/slave?  So
adding another "KEEPALIVE" into the local buffer side of the TCP
stream isn't going to help a "stuck" one arrive earlier.

You really only have a few situations:

1) Network problems.  Stuffing more stuff into the local buffers isn't
going to help get packets from the remote that it would like to send
(I say "like to send" because network problems could be in either/both
directions; the remote may or may not have seen our keepalive
request)

2) The remote is getting them, and is swamped.  It's not going to get
to processing our 2nd keepalive any sooner than our 1st.

If a walreceiver reads a "keepalive" request, just declare that it
must reply immediately.  Then the master config can trust that a
keepalive will be replied to pretty quickly if the network is OK.  TCP
will make it get there "eventually" if it's a bad network and the
admins have set it to be very network-tolerant.

The ACK might report that the slave is hopelessly behind on
fsyncing/applying its WAL, but that's good too.  At least then the
ACK comes back, and the master knows the slave is still churning away
on the last batch of WAL, and can decide whether it wants to consider
the slave too far behind and boot it out.
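The reply-immediately rule could be sketched like this (Python; purely illustrative, not the real walreceiver code):

```python
def handle_message(msg, current_lsns, send_ack):
    """Sketch of the rule above: on a keepalive request the walreceiver
    replies at once with whatever receive/fsync/replay positions it has,
    even if it is hopelessly behind on applying WAL."""
    if msg == "keepalive":
        send_ack(current_lsns)  # being far behind is still useful news
        return True             # keepalive handled
    return False                # some other message type; not handled here
```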


--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.