Re: Sync Rep at Oct 5 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Sync Rep at Oct 5
Msg-id 1286348806.2304.9.camel@ebony
In response to Re: Sync Rep at Oct 5  (Steve Singer <ssinger@ca.afilias.info>)
Responses Re: Sync Rep at Oct 5
List pgsql-hackers
On Tue, 2010-10-05 at 11:30 -0400, Steve Singer wrote:

> Also on the topic of failover, how do we want to deal with the master
> failing over? Say M->{S1,S2} and M fails and we promote S1 to M1. Can
> M1->S2? What if S2 was further along in processing than S1 when M
> failed? I don't think we want to take on this complexity for 9.1, but
> this means that after M fails you won't have a synchronous replica until
> you rebuild or somehow reset S2.

Those are problems that can be resolved, but that is the current state.
The trick, I guess, is to promote the correct standby.
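
As a rough illustration of "promote the correct standby", here is a minimal
sketch in Python. It assumes each standby's last received WAL location is
already known as the usual hex "X/Y" string (for example as reported by
pg_last_xlog_receive_location()). The function names and the sample
locations are hypothetical, and the check shown is a necessary condition
only, not a complete failover procedure.

```python
from typing import Dict, Tuple


def parse_wal_location(loc: str) -> Tuple[int, int]:
    """Split an 'X/Y' WAL location into two integers.

    Comparing the integers (not the raw strings) gives WAL order.
    """
    high, low = loc.split("/")
    return int(high, 16), int(low, 16)


def choose_standby_to_promote(received: Dict[str, str]) -> str:
    """Return the name of the standby that has received the most WAL."""
    return max(received, key=lambda name: parse_wal_location(received[name]))


def can_follow(new_master: str, standby: str, received: Dict[str, str]) -> bool:
    """A standby that is ahead of the promoted node cannot simply follow it."""
    return (parse_wal_location(received[standby])
            <= parse_wal_location(received[new_master]))


# Hypothetical state at the moment M fails: S2 is further along than S1.
received = {"S1": "0/3000060", "S2": "0/3000148"}
promoted = choose_standby_to_promote(received)
print("promote:", promoted)
print("S1 can follow S2:", can_follow("S2", "S1", received))
print("S2 can follow S1:", can_follow("S1", "S2", received))
```

In Steve's example, promoting S1 while S2 is further along is exactly the
case where can_follow() returns False and S2 has to be rebuilt or reset.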

Those are generic issues, not related to any specific patch. Thanks for
keeping those issues in the limelight.

> > == Path Minimization ==
> >
> > We want to be able to minimize and control the path of data transfer,
> > * so that the current master doesn't have to initiate transfer to all
> > dependent nodes, thereby reducing overhead on the master
> > * so that if the path from current master to descendant is expensive we
> > would minimize network costs.
> >
> > This requirement is commonly known as "relaying".
> >
> > In its most simply stated form, we want one standby to be able to get
> > WAL data from another standby, e.g. M -> S -> S. Stating the problem in
> > that way misses out on the actual requirement, since people would like
> > the arrangement to be robust in case of failures of M or any S. If we
> > specify the exact arrangement of paths then we need to respecify the
> > arrangement of paths if a server goes down.
> 
> Are we going to allow these paths to be reconfigured on a live cluster?
> If we have M->S1->S2 and we want to reconfigure S2 to read from M, then
> S2 needs to get the data that has already been committed on S1 from
> somewhere (either S1 or M). This has solutions, but it adds to the
> complexity. Maybe not for 9.1.

If you switch from M -> S1 -> S2 to M -> (S1, S2) it should work fine.
At the moment that needs a shutdown/restart, but that could easily be
done with a disconnect/reconnect following a configuration file reload.

The problem is how much WAL is stored on (any) node. Currently that is
controlled by wal_keep_segments, which doesn't work very well, but I've
seen no better ideas that cover all important cases.
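
To make the wal_keep_segments limitation concrete, here is a small
back-of-the-envelope sketch in Python. It assumes the default 16 MB WAL
segment size; the helper names and the example numbers are hypothetical.
The setting is only a floor on the number of retained segments, so the
check is conservative: a node may happen to hold more WAL than this.

```python
# Default WAL segment size; assumed not to have been changed at build time.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments: int) -> int:
    """Minimum amount of past WAL a node keeps around for standbys."""
    return wal_keep_segments * WAL_SEGMENT_BYTES


def guaranteed_to_catch_up(lag_bytes: int, wal_keep_segments: int) -> bool:
    """True if a standby lagging by lag_bytes is covered by the setting alone."""
    return lag_bytes <= retained_wal_bytes(wal_keep_segments)


# Hypothetical example: 32 segments retained, standby reconnects 600 MB behind.
keep = 32
lag = 600 * 1024 * 1024
print("retained bytes:", retained_wal_bytes(keep))
print("covered:", guaranteed_to_catch_up(lag, keep))
```

A fixed segment count says nothing about how far behind a reconnecting
standby actually is, which is why a standby repointed to a new upstream
may find the WAL it needs already gone.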

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services



