Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Msg-id 001601cda799$94dbf390$be93dab0$@kapila@huawei.com
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Wednesday, October 10, 2012 9:15 PM Heikki Linnakangas wrote:
> On 04.10.2012 13:12, Amit kapila wrote:
> > Following changes are done to support replication timeout in sender as
> > well as receiver:
> >
> > 1. One new configuration parameter wal_receiver_timeout is added to
> > detect timeout at receiver task.
> > 2. Existing parameter replication_timeout is renamed to
> > wal_sender_timeout.
> 
> Ok. The other option would be to have just one GUC, I'm open to
> bikeshedding on this one. On one hand, there's no reason the timeouts
> have to be the same, so it would be nice to have separate settings, but on
> the other hand, I can't imagine a case where a single setting wouldn't
> work just as well.

I think they need to be separate for a case like the following:

1. M1 (Master), S1 (Standby 1), S2 (Standby 2).
2. S1 is a standby for M1, and S2 is a standby for S1; basically a simple
case of cascaded replication.
3. M1 and S1 are on a local network, but S2 is at a geographically
different location (that is, the network between M1 and S1 is fast and
the one between S1 and S2 is very slow).
4. In this case, the user might want to configure different timeouts for
the sender and the receiver on S1.
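
As a rough illustration, the settings on S1 might look like the fragment
below. The parameter names are the ones from the patch; the values are
assumptions picked only to show the two timeouts differing, not recommendations.

```
# postgresql.conf on S1 (illustrative values only)
wal_receiver_timeout = 10s    # S1 receives from M1 over the fast local link
wal_sender_timeout = 120s     # S1 sends to S2 over the slow remote link
```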

> Attached is an updated patch. I reverted the merging of message types
> and fixed a bunch of cosmetic issues. There was one bug: in the main
> loop of walreceiver, you send the "ping" message on every wakeup after
> enough time has passed since last reception. That means that if the
> server doesn't reply promptly, you send a new ping message every 100 ms
> (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same
> issue, but it was not quite as severe there because the naptime was
> longer. Fixed that.
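
To make the fix described above concrete, here is a minimal sketch of a
"ping at most once per idle period" check. It is not the code from the patch:
PingState, should_send_ping, note_message_received, count_pings and the choice
of pinging at half the timeout are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; names are illustrative only. */
typedef struct
{
    int64_t last_recv_ms;   /* time the last message arrived from the peer */
    bool    ping_sent;      /* has a ping gone out since that message? */
} PingState;

/*
 * Called on every wakeup of the main loop. Returns true only for the
 * first wakeup after the idle threshold has passed, so a slow peer is
 * not pinged again on each subsequent wakeup.
 */
static bool
should_send_ping(PingState *st, int64_t now_ms, int64_t timeout_ms)
{
    if (timeout_ms <= 0)
        return false;       /* timeout disabled */
    if (st->ping_sent)
        return false;       /* already waiting for a reply */
    if (now_ms - st->last_recv_ms < timeout_ms / 2)
        return false;       /* peer was heard from recently enough */

    st->ping_sent = true;
    return true;
}

/* Called whenever any message is received; re-arms the ping. */
static void
note_message_received(PingState *st, int64_t now_ms)
{
    st->last_recv_ms = now_ms;
    st->ping_sent = false;
}

/* Simulate wakeups every step_ms up to end_ms with a silent peer. */
static int
count_pings(PingState *st, int64_t end_ms, int64_t step_ms, int64_t timeout_ms)
{
    int     pings = 0;

    for (int64_t now = 0; now <= end_ms; now += step_ms)
    {
        if (should_send_ping(st, now, timeout_ms))
            pings++;
    }
    return pings;
}
```

Without the ping_sent flag, the same simulation would count one ping for
every wakeup after the threshold, which is the behaviour described in the
quoted paragraph.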

Thanks.

> 
> How does this look now?

The patch looks fine, and the test results are also fine.

With Regards,
Amit Kapila.



