Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date
Msg-id 001301cda22b$ad43d650$07cb82f0$@kapila@huawei.com
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Amit kapila <amit.kapila@huawei.com>)
List pgsql-hackers

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-
> owner@postgresql.org] On Behalf Of Amit kapila
> Sent: Thursday, October 04, 2012 3:43 PM
> To: Heikki Linnakangas
> Cc: Fujii Masao; pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w
> breakdown
> 
> On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
> On 02.10.2012 10:36, Amit kapila wrote:
> > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
> >>> So let's think how this should ideally work from a user's point of
> view.
> >>> I think there should be just two settings: walsender_timeout and
> >>> walreceiver_timeout. walsender_timeout specifies how long a
> >>> walsender will keep a connection open if it doesn't hear from the

> 
> Thank you for suggestions.
> I have addressed your suggestions in patch attached with this mail.
> 
> Following changes are done to support replication timeout in sender as
> well as receiver:


Testing Done for the Patch
--------------------------------
1. Verified the values of the new configuration parameter and the changed
   configuration parameter using the SHOW command (both SHOW for the
   specific parameter and SHOW ALL).
2. Verified the new configuration parameter in --describe-config.
3. Verified the new name of the existing parameter replication_timeout in
   --describe-config.
4. Start primary and standby nodes with the default timeout and leave them
   idle for some time.
   The connection should not error out with a network break error.
5. a. Start primary and standby nodes with the default timeout, then bring
      down the network.
   b. Both sender and receiver should be able to detect the network
      breakdown at almost the same time.
   c. Once the network is up again, the connection should get
      re-established successfully.
6. a. Start primary and standby nodes with wal_sender_timeout less than
      wal_receiver_timeout, then bring down the network.
   b. The sender should be able to detect the network breakdown before the
      receiver task.
   c. Once the network is up again, the connection should get
      re-established successfully.
7. a. Start primary and standby nodes with wal_receiver_timeout less than
      wal_sender_timeout, then bring down the network.
   b. The receiver should be able to detect the network breakdown before
      the sender task.
   c. Once the network is up again, the connection should get
      re-established successfully.
8. a. In test case 6 (wal_sender_timeout less than wal_receiver_timeout),
      change the value of wal_receiver_status_interval to more than
      wal_receiver_timeout, and hence more than wal_sender_timeout.
   b. Then bring down the network.
   c. The sender task should be able to detect the network breakdown once
      wal_sender_timeout has elapsed.
   d. Once the network is up again, the connection should get
      re-established successfully.
   The intent of this test is to check that detection of a network break
   via wal_sender_timeout does not depend on wal_receiver_status_interval.
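
As a rough illustration of what the network-break cases above expect, the
following Python sketch models which side should notice the break first for
a given pair of timeouts. It is only a toy model, not code from the patch:
the function name expected_detection, the Detection tuple, and the sample
values of 30 and 60 seconds are assumptions made for this example. It also
assumes that each side last heard from its peer at the moment the network
went down, so each side gives up exactly one timeout later.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    sender_at: float    # seconds after the break when the sender gives up
    receiver_at: float  # seconds after the break when the receiver gives up
    first: str          # "sender", "receiver", or "both"


def expected_detection(wal_sender_timeout: float,
                       wal_receiver_timeout: float) -> Detection:
    """Toy model of the expected outcome of the network-break tests.

    Assumption (not taken from the patch): both sides last heard from
    their peer at the instant the network broke, so each side declares
    the connection dead exactly one timeout later.
    """
    if wal_sender_timeout <= 0 or wal_receiver_timeout <= 0:
        raise ValueError("this model only covers positive timeouts")

    if wal_sender_timeout < wal_receiver_timeout:
        first = "sender"
    elif wal_receiver_timeout < wal_sender_timeout:
        first = "receiver"
    else:
        first = "both"
    return Detection(wal_sender_timeout, wal_receiver_timeout, first)


if __name__ == "__main__":
    # Illustrative values in seconds: equal timeouts, sender shorter,
    # receiver shorter.
    for sender, receiver in [(60, 60), (30, 60), (60, 30)]:
        print(sender, receiver, expected_detection(sender, receiver))
```

Note that wal_receiver_status_interval does not appear anywhere in this
model. That is the property the last test case is meant to confirm: the
sender's detection time should depend only on wal_sender_timeout.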

All of the above tests passed.

With Regards,
Amit Kapila.



