Re: Replication server timeout patch - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Replication server timeout patch
Msg-id AANLkTim-Q1VVr9DGPsMh=p4VApSZ3Y=1QQoSDcFjCyvU@mail.gmail.com
In response to Re: Replication server timeout patch  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Replication server timeout patch  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> So, in summary, the position is that we have a timeout, but that timeout
>>> doesn't work in all cases. But it does work in some, so that seems
>>> enough for me to say "let's commit". Not committing gives us nothing at
>>> all, which is as much use as a chocolate teapot.
>>
>> Can someone summarize the cases where it does and doesn't work?
>> There's been a longish gap in this thread.
>
> The timeout doesn't work when walsender gets blocked while sending WAL
> because the send buffer has filled up, I'm afraid. IOW, it doesn't work
> when the standby becomes unresponsive while WAL is being generated
> continuously on the master. Since walsender keeps trying to send WAL
> while the standby is unresponsive, the send buffer fills up and the
> blocking send function (e.g., pq_flush) blocks the walsender.
>
> OTOH, if the standby becomes unresponsive while there is no workload
> generating WAL, the timeout would work.

IMHO, that's so broken as to be useless.

I would really like to have a solution to this problem, though.
Relying on TCP keepalives is weak.
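
To make the failure mode above concrete, here is a minimal sketch of a send
that cannot block forever. It is plain Python on an ordinary socket, not
walsender code; send_with_timeout and SendTimeout are hypothetical names
used only for illustration. The assumption is that the socket is put in
non-blocking mode and the sender waits for writability with a deadline, so
a full send buffer turns into a timeout instead of an indefinite block.

```python
import select
import socket
import time


class SendTimeout(Exception):
    """Raised when the peer accepts no data for `timeout` seconds."""


def send_with_timeout(sock, data, timeout):
    """Send all of `data`; give up if no progress is made for `timeout` seconds.

    Hypothetical helper: illustrates the idea only, not PostgreSQL's code.
    """
    sock.setblocking(False)          # send() must never block on a full buffer
    view = memoryview(data)
    sent = 0
    deadline = time.monotonic() + timeout
    while sent < len(view):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise SendTimeout("no progress for %.1f s" % timeout)
        # Wait until the kernel send buffer has room, or the deadline passes.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            continue                 # loop re-checks the deadline
        try:
            n = sock.send(view[sent:])
        except (BlockingIOError, InterruptedError):
            continue                 # buffer filled again; wait once more
        sent += n
        deadline = time.monotonic() + timeout   # progress resets the clock
    return sent
```

With a peer that has stopped reading, the call raises SendTimeout once the
send buffer is full and stays full for the whole timeout, which is the case
a blocking send never gets out of.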

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

