Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown |
Date | |
Msg-id | 6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs |
In response to | Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown |
List | pgsql-hackers |
On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
On Wed, Oct 17, 2012 at 8:46 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
>> On 13.10.2012 19:35, Fujii Masao wrote:
>> > On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
>> > <hlinnakangas@vmware.com> wrote:
>> >> Ok, thanks. Committed.
>> >
>> > I found one typo. The attached patch fixes that typo.
>>
>> Thanks, fixed.
>>
>> > ISTM you need to update the protocol.sgml because you added
>> > the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
>
>
>>
>> > Is it worth adding the same mechanism (send back the reply immediately
>> > if walsender request a reply) into pg_basebackup and pg_receivexlog?
>>
>> Good catch. Yes, they should be taught about this too. I'll look into
>> doing that too.
>
> If you have not started and you don't have objection, I can pickup this to
> complete it.
>
> For both (pg_basebackup and pg_receivexlog), we need to get a timeout
> parameter from user in command line, as
> there is no conf file here. New Option can be -t (parameter name can be
> recvtimeout).
>
> The main changes will be in function ReceiveXlogStream(), it is a common
> function for both
> Pg_basebackup and pg_receivexlog. Handling will be done in same way as we
> have done in walreceiver.
>
> Suggestions/Comments?

>Before implementing the timeout parameter, I think that it's better to change
>both pg_basebackup background process and pg_receivexlog so that they
>send back the reply message immediately when they receive the keepalive
>message requesting the reply. Currently, they always ignore such keepalive
>message, so status interval parameter (-s) in them always must be set to
>the value less than replication timeout. We can avoid this troublesome
>parameter setting by introducing the same logic of walreceiver into both
>pg_basebackup background process and pg_receivexlog.
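The reply rule described in the quoted message (answer a keepalive immediately when the server requests a reply, otherwise report only when the -s interval elapses) can be sketched as a small decision function. This is a minimal sketch under stated assumptions: KeepaliveMsg and need_status_reply() are hypothetical names, not the code in ReceiveXlogStream() or the walreceiver, and the real message layout is defined by the streaming replication protocol, not by this struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified keepalive message; the real on-the-wire layout
 * is defined by the streaming replication protocol, not by this struct. */
typedef struct
{
    uint64_t walEnd;          /* sender's current end of WAL */
    bool     replyRequested;  /* sender asks for a status reply right away */
} KeepaliveMsg;

/* Should the client send a status reply now?
 * msg may be NULL when no keepalive arrived in this loop iteration.
 * standby_message_timeout is the -s status interval in seconds (0 = off). */
static bool
need_status_reply(const KeepaliveMsg *msg, time_t now, time_t last_status,
                  int standby_message_timeout)
{
    assert(standby_message_timeout >= 0);

    /* A reply requested by the server is sent immediately, whatever -s is. */
    if (msg != NULL && msg->replyRequested)
        return true;

    /* Otherwise, only the periodic status interval triggers a reply. */
    return standby_message_timeout > 0 &&
           now - last_status >= standby_message_timeout;
}
```

Because a requested reply bypasses the interval check, -s no longer has to be tuned below the server's replication timeout, which is the motivation given in the quoted message.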
Please find attached the patch that addresses the change you mentioned (sending an immediate reply to a keepalive message that requests one). Both pg_basebackup and pg_receivexlog use the same function, ReceiveXlogStream(), so a single change addresses the issue for both.

Next, regarding introducing a timeout in pg_basebackup and pg_receivexlog:
We can have a mechanism similar to the walreceiver timeout while streaming data from the server, but the same logic cannot be used if the network goes down while the other database files are being fetched from the server. The reason is that, to receive the data files, PQgetCopyData() is called in synchronous mode, so it keeps waiting indefinitely until it gets some data.

To solve this issue, I can think of the following options:
1. Make this call asynchronous as well (but I am not sure about the impact of this).
2. In function pqWait(), instead of passing the hard-coded value -1 (i.e. infinite wait), pass some finite time. This time can be received as a command-line argument by the respective utility and stored in the PGconn structure. To keep the timeout value in PGconn, we can either:
   a. Add a new parameter to PGconn to indicate the receive timeout, or
   b. Use the existing connect_timeout parameter for the receive timeout as well, but this may cause confusion.
3. Any other, better option?

Apart from the above issue, if the network goes down at connect time, the utility might hang, because connect_timeout is NULL by default and connectDBComplete() will wait indefinitely for the connection to succeed. So shall we have a separate command-line argument for this as well, or is there another way you would suggest?

Suggestions/Comments?

With Regards,
Amit Kapila.
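The idea behind option 2 (a finite wait instead of the hard-coded -1) can be sketched with plain POSIX poll(): an absolute deadline is converted into a poll() timeout, and (time_t) -1 keeps the current infinite wait. This is a minimal sketch, not libpq's pqWait(); remaining_ms(), wait_readable() and read_with_timeout() are hypothetical helpers. In libpq terms, the socket would come from PQsocket(conn), and for option 1 the receive loop would call PQgetCopyData(conn, &buffer, 1), which returns 0 when no complete row is available yet, then wait on the socket and call PQconsumeInput(conn) once it is readable. Those calls are left out so the example runs without a server.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Convert an absolute deadline into a poll() timeout in milliseconds.
 * (time_t) -1 means "no deadline", i.e. the current infinite wait. */
static int
remaining_ms(time_t finish_time, time_t now)
{
    if (finish_time == (time_t) -1)
        return -1;                      /* poll(): block indefinitely */
    if (finish_time <= now)
        return 0;                       /* deadline passed: check once, don't block */
    if (finish_time - now > INT_MAX / 1000)
        return INT_MAX;                 /* clamp very long waits */
    return (int) ((finish_time - now) * 1000);
}

/* Wait until sock is readable or finish_time passes.
 * Returns 1 if readable (or the peer hung up), 0 on timeout, -1 on error. */
static int
wait_readable(int sock, time_t finish_time)
{
    assert(sock >= 0);
    for (;;)
    {
        struct pollfd pfd = { .fd = sock, .events = POLLIN };
        int rc = poll(&pfd, 1, remaining_ms(finish_time, time(NULL)));

        if (rc < 0 && errno == EINTR)
            continue;                   /* interrupted by a signal: retry with the time left */
        if (rc < 0)
            return -1;
        return rc > 0 ? 1 : 0;
    }
}

/* Read up to len bytes unless finish_time passes first.
 * Returns bytes read (0 = EOF), -2 on timeout, -1 on error. */
static ssize_t
read_with_timeout(int sock, char *buf, size_t len, time_t finish_time)
{
    int rc = wait_readable(sock, finish_time);

    if (rc == 0)
        return -2;
    if (rc < 0)
        return -1;
    return read(sock, buf, len);
}
```

A utility would compute the deadline once per wait, e.g. time(NULL) plus the user-supplied receive timeout, so a dead network surfaces as a timeout instead of a hang.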