Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Msg-id CAHGQGwExi29o0D8OKzKVPpXBwxqusgH=n2T0+Chxs=QMjEFUSA@mail.gmail.com
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Amit kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Tue, Nov 13, 2012 at 1:06 PM, Amit kapila <amit.kapila@huawei.com> wrote:
> On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
> On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
>>> On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com>
>>> wrote:
>>> > On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
>>> >> On 19.10.2012 14:42, Amit kapila wrote:
>>> >> > On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
>>> >> >> Before implementing the timeout parameter, I think that it's better to change
>>> >> >> both pg_basebackup background process and pg_receivexlog so that
>
>>>> BTW, IIRC the walsender has no timeout mechanism during sending
>>>> backup data to pg_basebackup. So it's also useful to implement the
>>>> timeout mechanism for the walsender during backup.
>>
>>> Yes, it's useful, but for the walsender the main problem is that it uses a
>>> blocking send call to send the data.
>>> I have tried using tcp_keepalive settings, but the send call doesn't return
>>> in case of a network break.
>>> The only way I could get it to return is to
>>> change the value in /proc/sys/net/ipv4/tcp_retries2 by using
>>> the command
>>>     echo "8" > /proc/sys/net/ipv4/tcp_retries2
>>> As per the recommendation, its value should be at least 8 (equivalent to 100
>>> sec)
>>
>>> Do you have any idea, how it can be achieved?
>
>> What about using pq_putmessage_noblock()?
>
> I will try this, but do you know why blocking mode was used to send files in the
> first place? I am asking because I am a little worried that changing it could break
> some design assumption made when the sending of files was implemented as blocking.
 

I'm afraid I don't know why. My guess is that non-blocking mode would have
complicated the code, so blocking mode was adopted in the first version of
pg_basebackup.
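
To illustrate the general idea of not hanging in a blocking send, here is a
minimal sketch in plain POSIX C. It is not PostgreSQL code and does not show
how pq_putmessage_noblock() works internally; send_with_timeout() and its
return codes are hypothetical names used only for this example. It waits for
the socket to become writable with poll() and then sends with MSG_DONTWAIT,
so a dead peer shows up as a timeout instead of an indefinite block.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a PostgreSQL function): send len bytes on sock,
 * giving up if the socket stays unwritable for timeout_ms milliseconds.
 * Returns 0 on success, -1 on a socket error, -2 on timeout.
 */
static int
send_with_timeout(int sock, const char *buf, size_t len, int timeout_ms)
{
    size_t sent = 0;

    assert(buf != NULL && timeout_ms >= 0);

    while (sent < len)
    {
        struct pollfd pfd = { .fd = sock, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc == 0)
            return -2;          /* no room in the send buffer within the timeout */
        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, wait again */
            return -1;
        }

        /* MSG_DONTWAIT: do not block in send() even on a blocking socket */
        ssize_t n = send(sock, buf + sent, len - sent, MSG_DONTWAIT);

        if (n < 0)
        {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;       /* buffer filled up again, go back to poll() */
            return -1;
        }
        sent += (size_t) n;
    }
    return 0;
}
```

Note that the timeout here applies to each wait for writability, not to the
whole transfer, so a slow but live connection is not cut off as long as it
keeps draining data.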

Regards,

-- 
Fujii Masao


