Re: Keepalive for max_standby_delay - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Keepalive for max_standby_delay
Date
Msg-id 4C06B3FD.4090709@enterprisedb.com
Whole thread Raw
In response to Re: Keepalive for max_standby_delay  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Keepalive for max_standby_delay
Re: Keepalive for max_standby_delay
List pgsql-hackers
On 02/06/10 20:14, Tom Lane wrote:
>  For realistic values of max_standby_delay ...

Hang on right there. What do you consider a realistic value for 
max_standby_delay? Because I'm not sure I have a grip on that myself. 5 
seconds? 5 minutes? 5 hours? I can see use cases for all of those...

> What I think might be a realistic compromise is this:
>
> 1. Separate max_standby_delay into two GUCs, say "max_streaming_delay"
> and "max_archive_delay".
>
> 2. When applying WAL that came across SR, use max_streaming_delay and
> let the time measurement be current time minus time of receipt of the
> current WAL send chunk.
>
> 3. When applying WAL that came from archive, use max_archive_delay and
> let the time measurement be current time minus time of acquisition of
> the current WAL segment from the archive.
>
> The current code's behavior in the latter case could effectively be
> modeled by setting max_archive_delay to zero, but that isn't the only
> plausible setting.  More likely DBAs would set max_archive_delay to
> something smaller than max_streaming_delay, but still positive so as to
> not kill conflicting queries instantly.

The problem with defining max_archive_delay that way is again that you 
can fall behind indefinitely. If you set it to 5 minutes, it means that 
you'll wait a maximum of 5 minutes *per WAL segment*, even if WAL is 
being generated faster.

I don't understand why you want to use a different delay when you're 
restoring from archive vs. when you're streaming (what about existing 
WAL files found in pg_xlog, BTW?). The source of WAL shouldn't make a 
difference. If it's because you assume that restoring from archive is a 
sign that you've fallen behind a lot, surely you've exceeded 
max_standby_delay then and I still don't see a need for a separate GUC.

I stand by my suggestion from yesterday: Let's define max_standby_delay 
as the difference between a piece of WAL becoming available in the 
standby, and applying it. To approximate "piece of WAL becoming 
available" for SR, we can use the mechanism with send/applyChunks from 
Simon's latest patch, or go with the simpler scheme of just resetting a 
"last caughtup timestamp" to current time whenever we have to wait for 
new WAL to arrive. When restoring from archive, likewise reset "last 
caughtup timestamp" whenever restore_command returns non-0, i.e we have 
to wait for the next WAL file to arrive.

That works the same for both SR and file-based log shipping, there's 
only one knob to set, is simple to implement and doesn't require 
synchronized clocks.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: "caught_up" status in walsender
Next
From: Peter Eisentraut
Date:
Subject: Re: How to pass around collation information