Re: Replication server timeout patch - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Replication server timeout patch
Date
Msg-id AANLkTinWGNZundjdF5asUfFL+gWefSGvV2g0NauFcCxa@mail.gmail.com
In response to Re: Replication server timeout patch  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Replication server timeout patch
List pgsql-hackers
On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I think we should consider making this change for 9.1.  This is a real
>> wart, and it's going to become even more of a problem with sync rep, I
>> think.
>
> Yeah, that's a welcome! Please feel free to review the patch.

I discussed this with Heikki on IM.

I think we should rip all the GUC change stuff out of this patch and
just decree that if you set a timeout, you get a timeout.  If you set
this inconsistently with wal_receiver_status_interval, then you'll get
lots of disconnects.  But that's your problem.  This may seem a little
unfriendly, but the logic in here is quite complex and still isn't
going to really provide that much protection against bad
configurations.  The only realistic alternative I see is to define
replication_timeout as a multiple of the client's
wal_receiver_status_interval, but that seems quite annoyingly
unfriendly.  A single replication_timeout that applies to all slaves
doesn't cover every configuration someone might want, but it's simple
and easy to understand and should cover 95% of cases.  If we find that
it's really necessary to be able to customize it further, then we
might go the route of adding the much-discussed standby registration
stuff, where there's a separate config file or system table where you
can stipulate that when a walsender with application_name=foo
connects, you want it to get wal_receiver_status_interval=$FOO.  But I
think that complexity can certainly wait until 9.2 or later.
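
To make the failure mode above concrete, here is a rough sketch in
Python of how the two settings interact on an otherwise healthy link.
It is a toy model, not the walsender logic: the function name
count_disconnects, the "0 means disabled" convention for both
arguments, and the assumption that the standby reconnects instantly
are all made up for illustration. It also ignores any other traffic
that might reset the timer.

```python
import math


def count_disconnects(replication_timeout, status_interval, duration):
    """Count master-side timeouts on a healthy link (toy model).

    All arguments are in seconds. Treating 0 as "disabled" for either
    setting is an assumption of this sketch.
    """
    if replication_timeout == 0:
        return 0  # timeout disabled: the master never gives up

    # With status messages disabled, the next message never arrives.
    step = status_interval if status_interval > 0 else math.inf

    disconnects = 0
    last_message = 0.0   # last time the master heard from the standby
    next_message = step  # when the standby will next report in
    while True:
        deadline = last_message + replication_timeout
        if next_message <= deadline:
            now = next_message  # status message arrived in time
        else:
            now = deadline      # master gave up waiting
        if now > duration:
            return disconnects
        if next_message > deadline:
            disconnects += 1    # standby reconnects at once; clocks restart
        last_message = now
        next_message = now + step
```

In this model, a 60s timeout with a 10s status interval never
disconnects, while a 5s timeout with a 10s status interval drops the
connection every 5 seconds (120 times over 10 minutes). That is the
"lots of disconnects" case, and it needs no special GUC logic to
explain.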

I also think that the default for replication_timeout should not be 0.
Something like 60s seems about right.  That way, if you just use the
default settings, you'll get pretty sane behavior - a connectivity
hiccup that lasts more than a minute will bounce the client.  We've
already gotten reports of people who thought they were replicating
when they really weren't, and had to fiddle with settings and struggle
to try to make it robust.  This should make things a lot nicer for
people out of the box, but it won't if it's disabled out of the box.

On another note, there doesn't appear to be any need to change the
return value of WaitLatchOrSocket().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company