Re: Sync Rep at Oct 5 - Mailing list pgsql-hackers

From Steve Singer
Subject Re: Sync Rep at Oct 5
Msg-id 4CAB4496.4080408@ca.afilias.info
In response to Sync Rep at Oct 5  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep at Oct 5  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Sync Rep at Oct 5  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 10-10-05 04:32 AM, Simon Riggs wrote:
>
> This is an attempt to compile everybody's stated viewpoints and come to
> an understanding about where we are and where we want to go. The idea
> from here is that we discuss what we are trying to achieve
> (requirements) and then later come back to how (design).

Great start on summarizing the discussions.  Getting a summary of the 
requirements in one place will help people who haven't been diligent in 
following all the sync-rep email threads stay involved.

<snip>

> == Failover Configuration Minimisation ==
>
> An important aspect of robustness is the ability to specify a
> configuration that will remain in place even though 1 or more servers
> have gone down.
>
> It is desirable to specify sync rep requirements such that we do not
> refer to individual servers, if possible. Each such rule necessarily
> requires an "else" condition, possibly multiple else conditions.
>
> It is desirable to avoid both of these
> * the need to have different configuration files on each node
> * the need to have configurations that only become active in case of
> failure. These are known to be hard to test and very likely to be
> misconfigured in the event of failover [I know a bank that was down for
> a whole week when standby server's config was wrong and had never been
> fully tested. The error was simple and obvious, but the fault showed
> itself as a sporadic error that was difficult to trace]
>

Also on the topic of failover, how do we want to deal with the master
failing over?  Say M->{S1,S2}, M fails, and we promote S1 to M1.  Can
M1->S2?  What if S2 was further along in processing than S1 when M
failed?  I don't think we want to take on this complexity for 9.1, but
that means that after M fails you won't have a synchronous replica until
you rebuild or somehow reset S2.
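
To make the S1/S2 question concrete, here is a minimal sketch (Python, not
PostgreSQL code) of comparing how far each standby got before M failed.  It
assumes each standby can report a WAL location in the usual textual hi/lo
hex form; the helper names parse_lsn, can_follow and
pick_promotion_candidate are hypothetical, and the "at or behind can
follow" rule is a simplifying assumption for illustration only.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/3000060' into a single integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_follow(promoted_lsn, other_lsn):
    """Return True if the other standby can simply follow the promoted one.

    Assumption for this sketch: a standby that is at or behind the promoted
    node's position holds no WAL the new master lacks, so it can keep
    streaming; one that is ahead must be rebuilt or reset.
    """
    return parse_lsn(other_lsn) <= parse_lsn(promoted_lsn)


def pick_promotion_candidate(standbys):
    """Pick the standby that is furthest along (standbys: name -> LSN text)."""
    return max(standbys, key=lambda name: parse_lsn(standbys[name]))
```

Under these assumptions, promoting whichever standby is furthest along
avoids the "S2 ahead of S1" case, at the cost of having to ask every
standby for its position before choosing.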




> == Sync Rep Performance ==
>
> Sync Rep is a potential performance hit, and that hit is known to
> increase as geographical distance increases.
>
> We want to be able to specify the performance of some nodes so that we
> have 4 levels of robustness:
> async - doesn't wait for sync
> recv - syncs when messages received by standby
> fsync - syncs when messages written to disk by standby
> apply - sync when messages applied to standby

Will read-only queries running on a slave hold up transactions from
being applied on that slave?  I suspect most people running with 'apply'
would want the answer to be 'no'.  Are we going to revisit the standby
query cancellation discussion?
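
The interaction between 'apply' and standby queries can be illustrated with
a toy model.  This is a sketch only: StandbyProgress and commit_released are
made-up names, and positions are plain integers rather than real WAL
locations.  The master releases a waiting commit once the standby's position
for the configured level has reached the commit record.

```python
from dataclasses import dataclass


@dataclass
class StandbyProgress:
    received: int  # last WAL position received by the standby
    fsynced: int   # last WAL position written to disk by the standby
    applied: int   # last WAL position replayed on the standby


def commit_released(level, commit_pos, standby):
    """Return True if a commit at commit_pos no longer has to wait."""
    if level == "async":
        return True  # never waits for the standby
    position = {
        "recv": standby.received,
        "fsync": standby.fsynced,
        "apply": standby.applied,
    }[level]
    return position >= commit_pos


# A long read-only query that blocks replay leaves `applied` behind.
progress = StandbyProgress(received=120, fsynced=120, applied=90)
print(commit_released("fsync", 100, progress))  # True
print(commit_released("apply", 100, progress))  # False
```

In this model a commit at position 100 is released under 'recv' and 'fsync'
but keeps waiting under 'apply' for as long as the query holds up replay,
which is the case the question above is about.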



> == Path Minimization ==
>
> We want to be able to minimize and control the path of data transfer,
> * so that the current master doesn't have to initiate transfer to all
> dependent nodes, thereby reducing overhead on master
> * so that if the path from current master to descendent is expensive we
> would minimize network costs.
>
> This requirement is commonly known as "relaying".
>
> In its most simply stated form, we want one standby to be able to get
> WAL data from another standby. e.g. M ->  S ->  S. Stating the problem in
> that way misses out on the actual requirement, since people would like
> the arrangement to be robust in case of failures of M or any S. If we
> specify the exact arrangement of paths then we need to respecify the
> arrangement of paths if a server goes down.

Are we going to allow these paths to be reconfigured on a live cluster?
If we have M->S1->S2 and we want to reconfigure S2 to read from M, then
S2 needs to get the data that has already been committed on S1 from
somewhere (either S1 or M).  This has solutions, but it adds to the
complexity.  Maybe not for 9.1.
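
One way to picture the reconfiguration problem is as a check on whether the
new upstream still holds the WAL that S2 needs next.  The sketch below is
only an illustration under that assumption; Node, oldest_kept and
can_repoint are hypothetical names, and positions are plain integers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    oldest_kept: int  # oldest WAL position this node can still send
    current: int      # newest WAL position this node has


def can_repoint(standby, new_upstream):
    """Return True if standby can switch to new_upstream without a rebuild.

    The standby resumes from its own current position, so the new upstream
    must still retain WAL from that point and must not be behind it.
    """
    return new_upstream.oldest_kept <= standby.current <= new_upstream.current
```

If the check fails because M has already discarded the WAL S2 needs, S2
would have to fetch it from S1 first or be rebuilt, which is where the extra
complexity comes in.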





