On Tue, 2010-10-05 at 11:30 -0400, Steve Singer wrote:
> Also on the topic of failover, how do we want to deal with the master
> failing over? Say M->{S1,S2} and M fails and we promote S1 to M1. Can
> M1->S2? What if S2 was further along in processing than S1 when M
> failed? I don't think we want to take on this complexity for 9.1 but
> this means that after M fails you won't have a synchronous replica until
> you rebuild or somehow reset S2.

Those are problems that can be resolved, but that is the current state.
The trick, I guess, is to promote the correct standby.

Those are generic issues, not related to any specific patch. Thanks for
keeping those issues in the limelight.

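To make "promote the correct standby" a little more concrete, here is a minimal sketch in Python. It assumes each surviving standby can report the last WAL location it received as a text value of the form "hi/lo" in hex (for example, what pg_last_xlog_receive_location() returns), and it simply picks the standby that is furthest along. The standby names, the sample locations and the helper names are hypothetical, and this is not part of any patch.

```python
def parse_wal_location(text):
    """Parse a 'hi/lo' hex WAL location into a comparable (hi, lo) tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def most_advanced_standby(locations):
    """Return the name of the standby reporting the highest WAL location.

    `locations` maps a standby name to its reported WAL location string.
    """
    if not locations:
        raise ValueError("no standbys reported a WAL location")
    return max(locations, key=lambda name: parse_wal_location(locations[name]))


# Hypothetical values collected from each standby after M has failed.
reported = {"S1": "0/3000148", "S2": "0/30001A0"}
print(most_advanced_standby(reported))
```

Comparing the two halves as a tuple avoids any assumption about how a location maps onto a single byte offset. With the sample values above S2 is ahead of S1, so promoting S2 would avoid the case where the remaining standby has processed more than the new master.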
> > == Path Minimization ==
> >
> > We want to be able to minimize and control the path of data transfer,
> > * so that the current master doesn't have to initiate transfer to all
> > dependent nodes, thereby reducing overhead on the master
> > * so that if the path from current master to descendant is expensive we
> > would minimize network costs.
> >
> > This requirement is commonly known as "relaying".
> >
> > In its most simply stated form, we want one standby to be able to get
> > WAL data from another standby. e.g. M -> S -> S. Stating the problem in
> > that way misses out on the actual requirement, since people would like
> > the arrangement to be robust in case of failures of M or any S. If we
> > specify the exact arrangement of paths then we need to respecify the
> > arrangement of paths if a server goes down.
>
> Are we going to allow these paths to be reconfigured on a live cluster?
> If we have M->S1->S2 and we want to reconfigure S2 to read from M then
> S2 needs to get the data that has already been committed on S1 from
> somewhere (either S1 or M). This has solutions but it adds to the
> complexity. Maybe not for 9.1.

If you switch from M -> S1 -> S2 to M -> (S1, S2) it should work fine.
At the moment that needs a shutdown/restart, but that could easily be
done with a disconnect/reconnect following a file reload.

The problem is how much WAL is stored on (any) node. Currently that is
wal_keep_segments, which doesn't work very well, but I've seen no better
ideas that cover all important cases.

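As a rough illustration of what wal_keep_segments buys you, the sketch below estimates how much WAL a node retains and how long a downstream standby could stay disconnected before the WAL it needs may have been recycled. It assumes the default 16MB segment size and a steady WAL generation rate; the rate and the function names are hypothetical, and the node may in practice keep more than this minimum.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumes the default 16MB segment size


def retained_wal_bytes(wal_keep_segments):
    """Minimum amount of past WAL kept on a node, in bytes."""
    return wal_keep_segments * WAL_SEGMENT_BYTES


def max_disconnect_seconds(wal_keep_segments, wal_bytes_per_second):
    """Rough time a standby can be away before needed WAL may be gone.

    Assumes WAL is generated at a constant rate, which is rarely true.
    """
    if wal_bytes_per_second <= 0:
        raise ValueError("wal_bytes_per_second must be positive")
    return retained_wal_bytes(wal_keep_segments) / wal_bytes_per_second


# Hypothetical example: 64 segments kept, 1 MiB/s of WAL being generated.
print(retained_wal_bytes(64))
print(max_disconnect_seconds(64, 1024 * 1024))
```

The weakness is visible in the arithmetic: the window depends on the write rate, so a fixed segment count protects a reconnecting standby for a long time on a quiet system and for a very short time during a bulk load.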
--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services