Re: Keepalive for max_standby_delay - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Keepalive for max_standby_delay
Msg-id 1275390288.21465.279.camel@ebony
In response to Re: Keepalive for max_standby_delay  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Thanks for the review.

On Tue, 2010-06-01 at 13:36 +0300, Heikki Linnakangas wrote:

> If we really want to try to salvage max_standby_delay with a meaning 
> similar to what it has now, I think we should go with the idea some 
> people bashed around earlier and define the grace period as the 
> difference between a WAL record becoming available to the standby for 
> replay, and between replaying it. An approximation of that is to do 
> "lastIdle = gettimeofday()" in XLogPageRead() whenever it needs to wait 
> for new WAL to arrive, whether that's via streaming replication or by a 
> success return code from restore_command, and compare the difference of 
> that with current timestamp in WaitExceedsMaxStandbyDelay().

That wouldn't cope with a continuous stream of records arriving, unless
you also include the second half of the patch.
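
To make the proposal quoted above concrete, here is a minimal sketch in C. It is not PostgreSQL source: the struct and function names are hypothetical stand-ins for the "lastIdle" bookkeeping in XLogPageRead() and the check in WaitExceedsMaxStandbyDelay(), and timestamps are passed in as milliseconds rather than read with gettimeofday() so the logic can be followed in isolation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the standby's replay state (not PostgreSQL code). */
typedef struct
{
    int64_t last_idle_ms;   /* last time replay had to wait for new WAL */
} StandbyClock;

/*
 * Assumed hook: called at the point where the proposal would do
 * "lastIdle = gettimeofday()", i.e. when replay has run out of WAL
 * and must wait for more to arrive.
 */
static void
note_wait_for_wal(StandbyClock *clock, int64_t now_ms)
{
    clock->last_idle_ms = now_ms;
}

/*
 * Analogue of the check in WaitExceedsMaxStandbyDelay(): the grace
 * period is measured from the last idle point using only the
 * standby's own clock.
 */
static bool
wait_exceeds_max_standby_delay(const StandbyClock *clock,
                               int64_t now_ms,
                               int64_t max_standby_delay_ms)
{
    return (now_ms - clock->last_idle_ms) > max_standby_delay_ms;
}
```

In this sketch, if records arrive continuously, note_wait_for_wal() is never called, so last_idle_ms goes stale and the check eventually returns true no matter how promptly each record is replayed. That appears to be the case the objection above is about.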

> That's very simple, doesn't require synchronized clocks, and works the 
> same with file- and stream-based setups.

Nor does it provide a mechanism for monitoring of SR. standby_delay is
explicitly defined in terms of the gap between two servers, so is a
useful real world concept. apply_delay is somewhat less interesting.

I'm sure most people would rather have monitoring and therefore the
requirement for synchronised-ish clocks, than no monitoring. If you
think no monitoring is OK, I don't, but there are other ways, so it's not
a point to fight about.

> This certainly alleviates some of the problems. You still need to ensure
> that master and standby have synchronized clocks, and you still get zero 
> grace time after a long period of inactivity when not using streaming 
> replication, however.

The second issue can be addressed once we approve the rest of this, if you like.

> Sending a keep-alive message every 100ms seems overly aggressive to me.

It's sent every wal_sender_delay. Why is that a negative?

-- Simon Riggs           www.2ndQuadrant.com


