Re: Keepalive for max_standby_delay - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Keepalive for max_standby_delay |
Date | |
Msg-id | 4C06B3FD.4090709@enterprisedb.com |
In response to | Re: Keepalive for max_standby_delay (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Keepalive for max_standby_delay
Re: Keepalive for max_standby_delay |
List | pgsql-hackers |
On 02/06/10 20:14, Tom Lane wrote:
> For realistic values of max_standby_delay ...

Hang on right there. What do you consider a realistic value for max_standby_delay? Because I'm not sure I have a grip on that myself. 5 seconds? 5 minutes? 5 hours? I can see use cases for all of those...

> What I think might be a realistic compromise is this:
>
> 1. Separate max_standby_delay into two GUCs, say "max_streaming_delay"
> and "max_archive_delay".
>
> 2. When applying WAL that came across SR, use max_streaming_delay and
> let the time measurement be current time minus time of receipt of the
> current WAL send chunk.
>
> 3. When applying WAL that came from archive, use max_archive_delay and
> let the time measurement be current time minus time of acquisition of
> the current WAL segment from the archive.
>
> The current code's behavior in the latter case could effectively be
> modeled by setting max_archive_delay to zero, but that isn't the only
> plausible setting. More likely DBAs would set max_archive_delay to
> something smaller than max_streaming_delay, but still positive so as to
> not kill conflicting queries instantly.

The problem with defining max_archive_delay that way is again that you can fall behind indefinitely. If you set it to 5 minutes, it means that you'll wait a maximum of 5 minutes *per WAL segment*, even if WAL is being generated faster.

I don't understand why you want to use a different delay when you're restoring from archive vs. when you're streaming (what about existing WAL files found in pg_xlog, BTW?). The source of WAL shouldn't make a difference. If it's because you assume that restoring from archive is a sign that you've fallen behind a lot, surely you've exceeded max_standby_delay then and I still don't see a need for a separate GUC.

I stand by my suggestion from yesterday: Let's define max_standby_delay as the difference between a piece of WAL becoming available in the standby, and applying it.
To approximate "piece of WAL becoming available" for SR, we can use the mechanism with send/applyChunks from Simon's latest patch, or go with the simpler scheme of just resetting a "last caughtup timestamp" to current time whenever we have to wait for new WAL to arrive. When restoring from archive, likewise reset "last caughtup timestamp" whenever restore_command returns non-0, i.e. we have to wait for the next WAL file to arrive.

That works the same for both SR and file-based log shipping, has only one knob to set, is simple to implement, and doesn't require synchronized clocks.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
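
The "last caughtup timestamp" scheme proposed in the message above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the class and method names (`StandbyDelayTracker`, `note_waiting_for_wal`, `grace_remaining`, `must_cancel_conflicts`) are hypothetical, and it assumes that a conflicting query may be tolerated until `max_standby_delay` seconds after the standby last had to wait for WAL.

```python
from dataclasses import dataclass


@dataclass
class StandbyDelayTracker:
    """Hypothetical model of the single-knob scheme; not actual PostgreSQL code."""

    max_standby_delay: float      # seconds; the only setting
    last_caughtup: float = 0.0    # local clock time when replay last ran out of WAL

    def note_waiting_for_wal(self, now: float) -> None:
        # Called whenever replay has nothing to apply: streaming has gone idle,
        # or restore_command returned non-zero for the next segment.
        self.last_caughtup = now

    def grace_remaining(self, now: float) -> float:
        # Time a conflicting query may still hold up replay. While WAL keeps
        # arriving, last_caughtup is NOT reset, so the total lag stays bounded
        # (unlike a timer that restarts for every WAL segment).
        return max(0.0, self.last_caughtup + self.max_standby_delay - now)

    def must_cancel_conflicts(self, now: float) -> bool:
        return now - self.last_caughtup >= self.max_standby_delay


tracker = StandbyDelayTracker(max_standby_delay=300.0)
tracker.note_waiting_for_wal(now=1000.0)        # standby was caught up at t=1000
print(tracker.grace_remaining(now=1100.0))      # conflict 100 s into a WAL burst
print(tracker.must_cancel_conflicts(now=1300.0))
```

Only the standby's own clock is read, which matches the point that the scheme does not need synchronized clocks. The same tracker would serve both streaming and archive recovery, since the only input is "did replay have to wait for WAL".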