Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date	October 1, 2012 19:57:43
Msg-id	CAHGQGwEd34=Z7=t9q8Xf11pmQS5a216ug7NW4V6qpuawG1crOA@mail.gmail.com
In response to	Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Mon, Oct 1, 2012 at 7:38 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Hmm, I think we need to step back a bit. I've never liked the way
> replication_timeout works, where it's the user's responsibility to set
> wal_receiver_status_interval < replication_timeout. It's not very
> user-friendly. I'd rather not copy that same design to this walreceiver
> timeout. If there's two different timeouts like that, it's even worse,
> because it's easy to confuse the two.

Agreed.

I'd like to specify the replication timeout the way we do TCP keepalives, i.e.,
what about introducing something like the following parameters?

    walsender_keepalives_idle
    walsender_keepalives_interval
    walsender_keepalives_count
    walreceiver_keepalives_idle
    walreceiver_keepalives_interval
    walreceiver_keepalives_count

I believe many users are basically familiar with TCP keepalives and how to
configure them, so I think this approach would be intuitive to users. This
approach also covers your proposal. If you specify

    walsender_keepalives_idle = walsender_timeout / 2
    walsender_keepalives_interval = -1 (disabled; no further ping is sent
                                        if there is no reply to the first one)
    walsender_keepalives_count = 1

the replication timeout works as you proposed. Of course, the downside
of this approach is that the number of parameters for the replication timeout
increases from two (replication_timeout and wal_receiver_status_interval)
to six, and those parameters are confusingly similar to the existing
tcp_keepalives parameters, which might confuse users in a different way. One
idea to solve this problem is to reuse the existing tcp_keepalives parameter
values for the replication timeout.
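
To make the idle/interval/count idea concrete, here is a rough Python sketch
of how such settings could translate into a ping schedule. The semantics are
an assumption borrowed from TCP keepalives and are not defined in this thread:
the first ping goes out after `idle` seconds of silence, a negative `interval`
disables repeats, and otherwise up to `count` pings are sent `interval`
seconds apart. The function names and the 60-second timeout are hypothetical.

```python
from typing import List


def ping_offsets(idle: float, interval: float, count: int) -> List[float]:
    """Seconds after the last received message at which pings would be sent.

    Assumed semantics: first ping after `idle` seconds of silence; a negative
    `interval` means the ping is never repeated; otherwise `count` pings are
    sent in total, spaced `interval` seconds apart.
    """
    if count < 1:
        return []
    if interval < 0:
        return [idle]
    return [idle + i * interval for i in range(count)]


def tcp_style_detection_time(idle: float, interval: float, count: int) -> float:
    """Worst-case time to declare the peer dead, TCP-keepalive style:
    the idle period plus `count` unanswered probes, `interval` apart."""
    if interval < 0:
        raise ValueError("repeats disabled: detection time is not defined here")
    return idle + interval * count


walsender_timeout = 60.0  # hypothetical value, in seconds

# The mapping described above: a single ping at half the timeout.
print(ping_offsets(walsender_timeout / 2, -1, 1))

# A TCP-keepalive-like setting: 30 s idle, then 3 probes 10 s apart.
print(ping_offsets(30.0, 10.0, 3))
print(tcp_style_detection_time(30.0, 10.0, 3))
```

With `interval = -1` the sketch only says when the single ping is sent; how
long to wait for its reply before giving up would still have to be defined
separately.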

Regards,

-- 
Fujii Masao
