Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date
Msg-id CAHGQGwGdYJ1tJDHH+FURgaJhRR1kmpvKatpBdmgLsk7ZMhYKPA@mail.gmail.com
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
>> On 19.10.2012 14:42, Amit kapila wrote:
>> > On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
>> >> Before implementing the timeout parameter, I think that it's better to change
>> >> both pg_basebackup background process and pg_receivexlog so that they
>> >> send back the reply message immediately when they receive the keepalive
>> >> message requesting the reply. Currently, they always ignore such keepalive
>> >> message, so status interval parameter (-s) in them always must be set to
>> >> the value less than replication timeout. We can avoid this troublesome
>> >> parameter setting by introducing the same logic of walreceiver into both
>> >> pg_basebackup background process and pg_receivexlog.
>> >
>> > Please find the patch attached to address the modification mentioned by you (send immediate reply for keepalive).
>> > Both basebackup and pg_receivexlog uses the same function ReceiveXLogStream, so single change for both will address the issue.
>>
>> Thanks, committed this one after shuffling it around the changes I
>> committed yesterday. I also updated the docs to not claim that -s option
>> is required to avoid timeout disconnects anymore.
>
> Thank you.
> However I think still the issue will not be completely solved.
> pg_basebackup/pg_receivexlog can still take long time to
> detect network break as they don't have timeout concept. To do that I have
> sent one proposal which is mentioned at end of mail chain:
> http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs
>
> Do you think there is any need to introduce such mechanism in
> pg_basebackup/pg_receivexlog?

Are you planning to introduce the timeout mechanism in the pg_basebackup
main process, or in the background process? It would be useful to
implement it in both.

BTW, IIRC the walsender has no timeout mechanism while it is sending
backup data to pg_basebackup. So it would also be useful to implement a
timeout mechanism for the walsender during backup.
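
To make the idea concrete, here is a rough sketch of the client-side
logic I have in mind: reply immediately to a keepalive that requests a
reply, and give up when nothing has arrived within the timeout. This is
only an illustration, not code from any patch. StreamState, StreamAction,
on_message() and on_idle() are hypothetical names, and I assume that a
value of 0 disables the timeout or the status interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; not an existing PostgreSQL struct. */
typedef struct StreamState
{
    int64_t last_recv_ms;   /* when any message was last received */
    int64_t last_status_ms; /* when a status reply was last sent */
} StreamState;

typedef enum StreamAction
{
    ACTION_WAIT,            /* nothing to do yet, keep waiting */
    ACTION_SEND_STATUS,     /* status interval elapsed, send a reply */
    ACTION_DISCONNECT       /* nothing received within the timeout */
} StreamAction;

/*
 * Record that a message arrived. Returns true when a status reply must be
 * sent right away, i.e. the message is a keepalive requesting a reply.
 */
static bool
on_message(StreamState *st, int64_t now_ms,
           bool is_keepalive, bool reply_requested)
{
    st->last_recv_ms = now_ms;
    return is_keepalive && reply_requested;
}

/*
 * Decide what to do when the wait for data ends without a message.
 * Assumption: timeout_ms == 0 or status_interval_ms == 0 disables
 * the corresponding check.
 */
static StreamAction
on_idle(const StreamState *st, int64_t now_ms,
        int64_t timeout_ms, int64_t status_interval_ms)
{
    assert(timeout_ms >= 0 && status_interval_ms >= 0);

    if (timeout_ms > 0 && now_ms - st->last_recv_ms >= timeout_ms)
        return ACTION_DISCONNECT;
    if (status_interval_ms > 0 &&
        now_ms - st->last_status_ms >= status_interval_ms)
        return ACTION_SEND_STATUS;
    return ACTION_WAIT;
}
```

With the immediate reply in place, the status interval no longer has to be
smaller than the server's replication timeout, and the timeout check lets
the client notice a dead connection on its own. The same kind of check
could be applied on the walsender side during backup.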

Regards,

-- 
Fujii Masao


