Thread: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit Kapila
Date:
On Wednesday, September 12, 2012 10:15 PM Fujii Masao wrote:
> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      7534
>> Logged by:          Amit Kapila
>> Email address:      amit.kapila@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse 10
>> Description:
>
>> 1. Both master and standby machine are connected normally,
>> 2. then you use the command: ifconfig ip down; make the network card of
>> master and standby down,
>
>> Observation
>> master can detect connect abnormal, but the standby can't detect connect
>> abnormal and show a connected channel long time.

> What about setting keepalives_xxx libpq parameters?
>
> http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS

> Keepalives are not a perfect solution for the termination of connection, but
> it would help to a certain extent.

We tried enabling keepalives, but it didn't work, maybe because the
walreceiver is trying to send receiver status at that point.
That send fails only after many retries.
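
For reference, a minimal sketch of the kind of keepalive settings being discussed. The parameter names (keepalives, keepalives_idle, keepalives_interval, keepalives_count) are libpq connection parameters; the host, user, and numeric values are made up for illustration and are not taken from this report. The detection-time formula is only a rough estimate for an otherwise idle connection.

```python
# Illustrative only: host, user and the numeric values are assumptions.
# Only the keepalives* parameter names come from libpq.

def build_conninfo(host, port, user, idle_s, interval_s, count):
    """Build a libpq-style conninfo string with TCP keepalives enabled."""
    params = {
        "host": host,
        "port": port,
        "user": user,
        "keepalives": 1,                    # turn client-side keepalives on
        "keepalives_idle": idle_s,          # idle seconds before the first probe
        "keepalives_interval": interval_s,  # seconds between unanswered probes
        "keepalives_count": count,          # lost probes before the link is dead
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def idle_detection_seconds(idle_s, interval_s, count):
    """Rough time to notice a dead peer on an otherwise idle connection."""
    return idle_s + interval_s * count


conninfo = build_conninfo("master.example.com", 5432, "replicator", 10, 5, 3)
print(conninfo)
print(idle_detection_seconds(10, 5, 3))
```

Note that this estimate applies only while the connection is idle. As far as I understand, on Linux keepalive probes are not sent while unacknowledged data is queued on the socket; the kernel's retransmission timeout applies instead, which would be consistent with the behaviour described above.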

> If you need something like walreceiver-version of replication_timeout,
> such feature has not been implemented yet.
> Please feel free to implement that!
I would like to implement such a feature for walreceiver, but I am unsure
whether to use the same configuration parameter (replication_timeout) for
walreceiver as for the master, or to introduce a new configuration
parameter (receiver_replication_timeout).
The only reason to have different timeout parameters for walsender and
walreceiver is a standby that runs both, i.e. one that also sends logs to a
cascaded standby; in that case somebody might want different timeouts for
walsender and walreceiver. OTOH, having too many parameters will create
confusion. My opinion is to have one timeout parameter for both walsender
and walreceiver.
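
To make the two options concrete, here is a rough sketch of the idea, not actual PostgreSQL code. The class ReceiverTimeout and the function effective_timeouts are hypothetical names invented for this example; receiver_replication_timeout is only the parameter name proposed above. Treating 0 as "disabled" mirrors how replication_timeout behaves on the sender side.

```python
import time


class ReceiverTimeout:
    """Hypothetical helper: tracks when data last arrived from the sender."""

    def __init__(self, timeout_s, now=None):
        self.timeout_s = timeout_s
        self.last_recv = time.monotonic() if now is None else now

    def on_receive(self, now=None):
        """Call whenever any message arrives from the sender."""
        self.last_recv = time.monotonic() if now is None else now

    def expired(self, now=None):
        """True if nothing has arrived for timeout_s seconds (0 disables)."""
        if self.timeout_s <= 0:
            return False
        now = time.monotonic() if now is None else now
        return now - self.last_recv >= self.timeout_s


def effective_timeouts(replication_timeout, receiver_replication_timeout=None):
    """Return (walsender timeout, walreceiver timeout).

    With a single parameter both sides share replication_timeout; a separate
    receiver setting only matters if it is actually supplied.
    """
    if receiver_replication_timeout is None:
        return replication_timeout, replication_timeout
    return replication_timeout, receiver_replication_timeout


# One shared parameter versus a separate one on a cascading standby.
print(effective_timeouts(60))      # (60, 60)
print(effective_timeouts(60, 30))  # (60, 30)
```

With a single parameter, a cascading standby would use the same value for its walsender and its walreceiver; the second form is what an extra parameter would buy.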

Let me know your suggestions/opinions on this.

Note: I am cc'ing pgsql-hackers, as this will be a feature request.

With Regards,
Amit Kapila.