On Mon, Dec 6, 2010 at 9:54 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>>> This occurred to me that the timeout would be required even for
>>> asynchronous streaming replication. So, how about implementing the
>>> replication timeout feature before synchronous replication itself?
>>
>> Here is the patch. This is one of features required for synchronous
>> replication, so I added this into current CF as a part of synchronous
>> replication.
>
> Hmm, that's actually a quite different timeout than what's required for
> synchronous replication. In synchronous replication, you need to get an
> acknowledgment within a timeout. This patch only puts a timeout on how long
> we wait to have enough room in the TCP send buffer. That doesn't seem all
> that useful.
Yeah. If we rely on the TCP send buffer filling up, then the amount
of time the master takes to notice a dead standby is going to be hard
for the user to predict. I think the standby ought to send some sort
of heartbeat and the master should declare the standby dead if it
doesn't see a heartbeat soon enough. Maybe the heartbeat could even
include the receive/fsync/replay LSNs, so that sync rep can use the
same machinery but with more aggressive policies about when they must
be sent.
I also can't help noticing that this approach requires drilling a hole
through the abstraction stack. We just invented latches; if the API
is going to have to change every time someone wants to implement a
feature, we've built ourselves an awfully porous abstraction layer.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company