On Fri, 2010-10-08 at 10:11 -0400, Tom Lane wrote:
> 1. a unique identifier for each standby (not just role names that
> multiple standbys might share);
That is difficult because each standby is identical. If a standby goes
down, people can regenerate a new standby by taking a copy from another
standby. What number do we give this new standby?...
> 2. state on the master associated with each possible standby -- not just
> the ones currently connected.
>
> Both of those are perhaps possible, but the sense I have of the
> discussion is that people want to avoid them.
Yes, I really want to avoid such issues and the likely complexities we
would get into trying to solve them. In reality they should not be common,
because they only arise if the sysadmin has not configured a sufficient
number of redundant standbys.
My proposed design is that the timeout does not cause the standby to be
"marked as degraded". It is up to the user to decide whether they wait,
or whether they progress without sync rep. Or the sysadmin can release the
waiters via a function call.
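As a rough illustration of that behaviour, here is a minimal sketch in Python. It is not PostgreSQL code: the class `SyncRepWaitQueue`, its methods, and the "commit position" integers are all hypothetical names chosen for the example. The point it tries to show is that the master keeps no per-standby state, a timeout changes nothing except what the waiting user chooses to do, and a single admin call can release everyone who is waiting.

```python
import threading


class SyncRepWaitQueue:
    """Hypothetical model of commits waiting for a standby acknowledgement.

    No standby identifiers are stored: any connected standby may ack.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._acked_upto = 0    # highest commit position acked by any standby
        self._release_gen = 0   # bumped each time the sysadmin releases waiters

    def standby_ack(self, position):
        """Called when any standby confirms it has received `position`."""
        with self._cond:
            self._acked_upto = max(self._acked_upto, position)
            self._cond.notify_all()

    def release_waiters(self):
        """Sysadmin escape hatch: let all current waiters proceed."""
        with self._cond:
            self._release_gen += 1
            self._cond.notify_all()

    def wait_for_sync(self, position, timeout=None, on_timeout="wait"):
        """Block until acked or released; returns "acked", "released" or "timed_out".

        The timeout never marks a standby as degraded. It only hands the
        decision back to the caller: on_timeout="wait" keeps waiting,
        anything else proceeds without sync rep.
        """
        with self._cond:
            gen = self._release_gen

            def done():
                return self._acked_upto >= position or self._release_gen != gen

            satisfied = self._cond.wait_for(done, timeout)
            if not satisfied and on_timeout == "wait":
                self._cond.wait_for(done)  # keep waiting, no deadline
            if self._acked_upto >= position:
                return "acked"
            if self._release_gen != gen:
                return "released"
            return "timed_out"
```

In this sketch a caller that prefers availability would pass something like `on_timeout="proceed"`, while one that prefers durability keeps the default and relies on either a standby ack or the sysadmin calling `release_waiters()`.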
If the cluster does become degraded, the sysadmin just generates a new
standby and plugs it back into the cluster and away we go. Simple, no
state to be recorded and no state to get screwed up either. I don't
think we should be spending too much time trying to help people that say
they want additional durability guarantees but do not match that with
sufficient hardware resources to make it happen smoothly.
If we do try to tackle those problems, who will be able to validate that
our code actually works?
-- Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services