Re: BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-bugs

From Amit kapila
Subject Re: BUG #7534: walreceiver takes long time to detect n/w breakdown
Msg-id 6C0B27F7206C9E4CA54AE035729E9C3828532916@szxeml509-mbs
In response to Re: BUG #7534: walreceiver takes long time to detect n/w breakdown  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote:
On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila <amit.kapila@huawei.com> wrote:
> On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote:
> On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kapila@huawei.com> wrote:
>>
>> On Thursday, September 13, 2012 10:57 PM Fujii Masao
>> On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
>>> On Wednesday, September 12, 2012 10:15 PM Fujii Masao
>>> On Wed, Sep 12, 2012 at 8:54 PM,  <amit.kapila@huawei.com> wrote:
>>>>>>> The following bug has been logged on the website:
>
>>>>>> I would like to implement such a feature for walreceiver, but I am unsure
>>>>>> whether to use the same configuration parameter (replication_timeout) for
>>>>>> walreceiver as for the master, or to introduce a new configuration parameter
>>>>>> (receiver_replication_timeout).
>>
>>>>>I like the latter. I believe some users want to set different timeout
>>>>>values, for example, where the master and standby servers are placed in
>>>>>the same room but a cascaded standby is placed on another continent.
>>
>>>> Thank you for your suggestion. I have implemented it as you suggested, with a separate timeout parameter for walreceiver.
>>>> The main changes are:
>>>> 1. Introduce a new configuration parameter, wal_receiver_replication_timeout, for walreceiver.
>>>> 2. In WalReceiverMain(), exit walreceiver if there has been no communication for wal_receiver_replication_timeout.
>>>>    This is the same as the walsender functionality.
>>
>>>> As this is a feature, I will upload the attached patch to the coming CommitFest.
>>
>>>> Suggestions/Comments?
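The receiver-side check in item 2 of the patch description can be modelled roughly as below. This is a minimal Python sketch, not the C code in WalReceiverMain(); the class and method names are hypothetical, only the parameter name wal_receiver_replication_timeout comes from the patch description, and the rule that any incoming message (WAL data or keepalive) restarts the timer is an assumption.

```python
class ReceiverTimeout:
    """Hypothetical model of the walreceiver-side check: give up once nothing
    has arrived from the sender for timeout_s seconds."""

    def __init__(self, timeout_s: float, start_s: float) -> None:
        if timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        self.timeout_s = timeout_s  # stands in for wal_receiver_replication_timeout
        self.last_receive_s = start_s

    def on_message(self, now_s: float) -> None:
        # Assumption: any message from walsender (WAL data or keepalive)
        # counts as communication and restarts the timer.
        self.last_receive_s = now_s

    def should_exit(self, now_s: float) -> bool:
        # True once the silence has lasted at least timeout_s seconds.
        return now_s - self.last_receive_s >= self.timeout_s


rt = ReceiverTimeout(timeout_s=60, start_s=0)
print(rt.should_exit(59))   # False: only 59 s of silence
rt.on_message(50)
print(rt.should_exit(100))  # False: 50 s since the last message
print(rt.should_exit(110))  # True: 60 s since the last message
```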
>
>>> You also need to change walsender so that it periodically sends the heartbeat
>>> message, like walreceiver does each wal_receiver_status_interval. Otherwise,
>>> walreceiver will detect the timeout wrongly whenever there is no traffic in the
>>> master.
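The false-timeout problem described above can be illustrated with a small simulation of an idle but healthy master, with and without periodic messages from walsender. This is a hypothetical Python sketch: the function names are invented, and it assumes the last real message arrived at t=0, keepalives go out at a fixed interval, and there is no network delay.

```python
from typing import List, Optional


def keepalive_times(interval_s: Optional[float], horizon_s: float) -> List[float]:
    """Send times of a hypothetical idle walsender's keepalives up to horizon_s.

    interval_s=None models a sender that sends nothing while there is no WAL.
    """
    if interval_s is None:
        return []
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times: List[float] = []
    t = interval_s
    while t <= horizon_s:
        times.append(t)
        t += interval_s
    return times


def spurious_timeout(timeout_s: float, arrivals: List[float],
                     horizon_s: float) -> Optional[float]:
    """Return when a receiver would wrongly declare a healthy but idle link
    dead, or None if that never happens before horizon_s.

    Assumes the last message arrived at t=0 and there is no network delay.
    """
    last = 0.0
    for t in sorted(arrivals):
        if t - last >= timeout_s:  # the deadline passed before this message
            return last + timeout_s
        last = t
    if horizon_s - last >= timeout_s:  # silence after the final message
        return last + timeout_s
    return None


# Idle master for 5 minutes, receiver timeout of 60 s:
print(spurious_timeout(60, keepalive_times(None, 300), 300))  # 60.0
print(spurious_timeout(60, keepalive_times(10, 300), 300))    # None
```

Without any idle-time traffic the receiver gives up after exactly one timeout period even though nothing is wrong; with messages arriving more often than the timeout, it never does.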
>
>> Won't the current keepalive message from walsender suffice for that need?

>No. The keepalive interval should be smaller than the timeout, but IIRC
>there is no way to specify the keepalive interval now.

Currently, AFAICS from the code, on an idle system walsender should send a keepalive after 10s, which is the hardcoded sleeptime value.
You are right that, since this is not configurable, if somebody sets replication_timeout to a value lower than 10s the logic will fail.

So, is it okay to add a new configuration parameter, similar to wal_receiver_status_interval, and map it directly to sleeptime in the current code?
There will be no need for any new heartbeat message; the existing keepalive will suffice for that purpose.
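The constraint behind this proposal (keepalives must arrive more often than the receiver times out) could be expressed as a simple settings check. This is a hypothetical Python sketch: the function name and the keepalive_interval_s argument stand in for the proposed, still unnamed configuration parameter, the 10 s default mirrors the hardcoded sleeptime mentioned above, and the check ignores network latency.

```python
# Hypothetical check for the proposal above: make the idle sleeptime a
# setting and reject values that cannot keep the receiver's timeout from
# firing on an idle system.
HARDCODED_SLEEPTIME_S = 10.0  # today's idle keepalive interval, per the message


def validate_keepalive_interval(timeout_s: float,
                                keepalive_interval_s: float = HARDCODED_SLEEPTIME_S) -> None:
    """Raise ValueError unless keepalives are sent strictly more often than
    the receiver-side timeout."""
    if keepalive_interval_s <= 0:
        raise ValueError("keepalive interval must be positive")
    if keepalive_interval_s >= timeout_s:
        raise ValueError(
            f"keepalive interval {keepalive_interval_s}s must be smaller than "
            f"timeout {timeout_s}s, or an idle but healthy link will time out"
        )


validate_keepalive_interval(60)  # fine with the hardcoded 10 s
try:
    validate_keepalive_interval(5)  # timeout below 10 s: the failing case
except ValueError as err:
    print(err)
validate_keepalive_interval(5, keepalive_interval_s=2)  # fine once configurable
```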

With Regards,
Amit Kapila.


