Re: Keepalive for max_standby_delay - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Keepalive for max_standby_delay
Date
Msg-id 1275585504.21465.3023.camel@ebony
Whole thread Raw
In response to Re: Keepalive for max_standby_delay  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Keepalive for max_standby_delay  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Keepalive for max_standby_delay  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, 2010-06-03 at 12:47 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote:
> >> This patch seems to me to be going in fundamentally the wrong direction.
> >> It's adding complexity and overhead (far more than is needed), and it's
> >> failing utterly to resolve the objections that I raised to start with.
> 
> > Having read your proposal, it seems changing from time-on-sender to
> > time-on-receiver is a one line change to the patch.
> 
> > What else are you thinking of removing, if anything?
> 
> Basically, we need to get rid of everything that feeds timestamps from
> the WAL content into the kill-delay logic.

I understand your wish to do this, though it isn't always accurate to do
that in the case where there is a backlog.

The patch already does that *mostly* for the streaming case. The only
time it does use the WAL content timestamp is when the WAL content
timestamp is later than the oldest receive time, so in that case it is
used as a correction. (see code comments and comments below also)

> >> In particular, Simon seems to be basically refusing to do anything about
> >> the complaint that the code fails unless master and standby clocks are
> >> in close sync.  I do not believe that this is acceptable, and since he
> >> won't fix it, I guess I'll have to.
> 
> > Syncing two servers in replication is common practice, as has been
> > explained here; I'm still surprised people think otherwise.
> 
> Doesn't affect the complaint in the least: I do not find it acceptable
> to have that be *mandated* in order for our code to work sensibly.

OK, accepted.

> I would be OK with having something approaching what you want as a
> non-default optional behavior (with a clearly-documented dependency
> on having synchronized clocks).  

Yes, that's what I'd been thinking also. So that gives us a way
forwards.

standby_delay_base = apply (default) | send

determines whether the standby_delay is calculated solely with reference
to the standby server (apply) or whether times from the sending server
are used (send). Use of send implies that the clocks on primary and
standby are synchronised to within a useful accuracy, in which case it
is usual to enforce that with time synchronisation software such as ntp.

> But in any case the current behavior is
> still quite broken as regards reading stale timestamps from WAL.

Agreed. That wasn't the objective of this patch or a priority.

If you're reading from an archive, you need to set max_standby_delay
appropriately, current recommendation is -1.

We can address that concern once the main streaming case is covered, or
we can add that now.

Are you planning to work on these things now as you said? How about I
apply my patch now, then you do another set of changes afterwards to add
the other items you mentioned, since that is now 2 additional parameters
and related code?

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Did we really want to force an initdb in beta2?
Next
From: Marc Munro
Date:
Subject: Re: [pgsql-hackers] Daily digest v1.10705 (13 messages)