On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> >> A related issue is that currently the recovery PANICs if it runs out of
> >> recovery procs. I think that's not acceptable, regardless of whether we
> >> use slotids or some other method to avoid it in normal operation,
> >> because it means you can't recover at all if you set max_connections too
> >> low in the standby (or in the primary, and you have to recover from
> >> crash), or you run out of recovery procs because of an abort failed in
> >> the primary like discussed on that thread.
> >
> >> The standby should just
> >> fast-forward to the next running-xacts record in that case.
> >
> > What do you mean by "fast forward"?
>
> I mean the standby should stop trying to track the in progress
> transactions in recovery procs, and apply the WAL records like it does
> before the consistent state is reached.

If you say something is not acceptable you need to say what behaviour
you do find acceptable in its place. It's good to come up with new
ideas, but it's not good to ignore the problems the new ideas have. This
is a general point that applies both here and to your proposals with
synch rep. It's much harder to say how it should work in a way that
covers all the requirements and the points others have made; otherwise
it's just handwaving.

So, if we don't PANIC, how should we behave?

Without full information on running-xacts we would be unable to take a
snapshot, so should:

* backends be forcibly disconnected?
* backends hang waiting for snapshot info to be re-available again in X
  minutes worth of WAL time?
* backends throw an ERROR: unable to provide snapshot at this time,
  DETAIL: retry your statement later.
* ...other alternatives

and possibly prevent new connections.

If max_connections is higher on the primary than on the standby, then
the standby will *never* be available for querying. Should we have
multiple ERRORs depending upon whether the situation is
hopefully-temporary or looks-permanent?
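
To make the temporary-versus-permanent distinction concrete, here is a rough sketch of the ERROR option. It is illustration only, not existing PostgreSQL code: the names SnapshotUnavailable and snapshot_failure are hypothetical, the DETAIL text for the permanent case is invented, and it assumes the standby somehow knows the primary's max_connections.

```python
class SnapshotUnavailable(Exception):
    """Hypothetical error for a backend that cannot be given a snapshot."""

    def __init__(self, message, detail, permanent):
        super().__init__(message)
        self.message = message
        self.detail = detail
        self.permanent = permanent


def snapshot_failure(standby_max_connections, primary_max_connections):
    """Build the error to report while running-xacts info is incomplete.

    Assumption: the condition looks permanent when the primary allows more
    connections than the standby, and hopefully temporary otherwise.
    """
    permanent = primary_max_connections > standby_max_connections
    if permanent:
        # Invented wording for the looks-permanent case.
        detail = "max_connections is lower on the standby than on the primary"
    else:
        # Wording taken from the ERROR alternative listed above.
        detail = "retry your statement later"
    return SnapshotUnavailable(
        "unable to provide snapshot at this time", detail, permanent
    )
```

With something like this a client could tell "retry later" apart from "fix the configuration", which is the reason to consider more than one ERROR.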

Don't assume I want the PANIC. That clearly needs to be revisited if we
change slotids. I just want to make a balanced judgement based upon a
full consideration of the options.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support