Greg Smith wrote:
> > I assumed they would set max_standby_delay = -1 and be happy.
> >
>
> The admin in this situation might be happy until the first time the
> primary fails and a failover is forced, at which point there is an
> unbounded amount of recovery data to apply that was stuck waiting behind
> whatever long-running queries were active. I don't know if you've ever
> watched what happens to a pre-8.2 cold standby when you start it up with
> hundreds or thousands of backed up WAL files to process before the
> server can start, but it's not a fast process. I watched a production
> 8.1 standby get >4000 files behind once due to an archive_command bug,
> and it's not something I'd like to ever chew my nails off to again. If
> your goal was HA and you're trying to bring up the standby, the server
> is down the whole time that's going on.
>
> This is why no admin who prioritizes HA would consider
> 'max_standby_delay = -1' a reasonable setting, and those are the sort of
> users Joachim's example was discussing. Only takes one rogue query that
> runs for a long time to make the standby so far behind it's useless for
> HA purposes. And you also have to ask yourself "if recovery is halted
> while waiting for this query to run, how stale is the data on the
> standby getting?". That's true for any large setting for this
> parameter, but using -1 for the unlimited setting also gives the maximum
> possible potential for such staleness.
>
> 'max_standby_delay = -1' is really only a reasonable idea if you are
> absolutely certain all queries are going to be short, which we can't
> dismiss as an unfounded use case so it has value. I would expect you
> have to also combine it with a matching reasonable statement_timeout to
> enforce that expectation to make that situation safer.

Well, as you stated in your blog, you are going to have one of these
downsides:

	o  master bloat
	o  delayed recovery
	o  cancelled queries

Right now you can't choose "master bloat", but you can choose the other
two. I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.

Our documentation should probably just come out and state that clearly.
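
As a rough sketch of the two choices available today, a standby's
postgresql.conf might look something like the fragment below. The
numeric values are illustrative assumptions rather than recommendations,
max_standby_delay is the parameter as it is named in this thread, and
the statement_timeout line is Greg's suggestion from above:

```
# Choice 1: delayed recovery. Standby queries are never cancelled,
# but WAL replay can fall arbitrarily far behind.
max_standby_delay = -1
# Bound query length so a rogue query cannot stall replay forever.
statement_timeout = '60s'      # illustrative value

# Choice 2: cancelled queries. Replay waits only a bounded time
# before conflicting standby queries are cancelled.
#max_standby_delay = 30        # illustrative value; check the unit in your build
```

Only one of the two max_standby_delay lines would be active at a time.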

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do