Re: Replication server timeout patch - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Replication server timeout patch
Date
Msg-id AANLkTik3-GETvakKDTwNXC3OVUr+w3DFMiriG2aiTguy@mail.gmail.com
In response to Re: Replication server timeout patch  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Replication server timeout patch
List pgsql-hackers
On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> I think we should consider making this change for 9.1.  This is a real
>>> wart, and it's going to become even more of a problem with sync rep, I
>>> think.
>>
>> Yeah, that's welcome! Please feel free to review the patch.
>
> I discussed this with Heikki on IM.
>
> I think we should rip all the GUC change stuff out of this patch and
> just decree that if you set a timeout, you get a timeout.  If you set
> this inconsistently with wal_receiver_status_interval, then you'll get
> lots of disconnects.  But that's your problem.  This may seem a little
> unfriendly, but the logic in here is quite complex and still isn't
> going to really provide that much protection against bad
> configurations.  The only realistic alternative I see is to define
> replication_timeout as a multiple of the client's
> wal_receiver_status_interval, but that seems quite annoyingly
> unfriendly.  A single replication_timeout that applies to all slaves
> doesn't cover every configuration someone might want, but it's simple
> and easy to understand and should cover 95% of cases.  If we find that
> it's really necessary to be able to customize it further, then we
> might go the route of adding the much-discussed standby registration
> stuff, where there's a separate config file or system table where you
> can stipulate that when a walsender with application_name=foo
> connects, you want it to get wal_receiver_status_interval=$FOO.  But I
> think that complexity can certainly wait until 9.2 or later.
>
> I also think that the default for replication_timeout should not be 0.
> Something like 60s seems about right.  That way, if you just use the
> default settings, you'll get pretty sane behavior - a connectivity
> hiccup that lasts more than a minute will bounce the client.  We've
> already gotten reports of people who thought they were replicating
> when they really weren't, and had to fiddle with settings and struggle
> to try to make it robust.  This should make things a lot nicer for
> people out of the box, but it won't if it's disabled out of the box.
>
> On another note, there doesn't appear to be any need to change the
> return value of WaitLatchOrSocket().

Agreed. I'll change the patch.
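
To make the simplified behavior concrete, here is a minimal sketch of
"if you set a timeout, you get a timeout". It is not code from the patch:
the function name, its parameters, and the millisecond units are
assumptions made for illustration. It also assumes that 0 disables the
timeout, as the discussion of the default above implies.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 *
 * now_ms        - current time in milliseconds
 * last_reply_ms - time the last status message arrived from the standby
 * timeout_ms    - replication_timeout in milliseconds; 0 means disabled
 *                 (an assumption based on the discussion above)
 *
 * Returns true when the standby has been silent for at least timeout_ms.
 * No cross-check against wal_receiver_status_interval is done here.
 */
bool
replication_timed_out(int64_t now_ms, int64_t last_reply_ms, int timeout_ms)
{
    if (timeout_ms <= 0)
        return false;           /* timeout disabled */

    return (now_ms - last_reply_ms) >= timeout_ms;
}
```

With a 60s timeout, a standby that reports more often than once a minute
is never dropped by a check like this. A standby whose
wal_receiver_status_interval is longer than the timeout would be
disconnected whenever it goes that long without sending anything, which
is the "lots of disconnects" case described above.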

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

