Re: Proposal: Commit timestamp - Mailing list pgsql-hackers

From Theo Schlossnagle
Subject Re: Proposal: Commit timestamp
Msg-id E481BDD9-11A3-406B-8B3C-E04DE26A3350@omniti.com
In response to Re: Proposal: Commit timestamp  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: Proposal: Commit timestamp  (Markus Schiltknecht <markus@bluegap.ch>)
List pgsql-hackers
On Feb 4, 2007, at 1:36 PM, Jan Wieck wrote:

> On 2/4/2007 10:53 AM, Theo Schlossnagle wrote:
>> As the clock must be incremented clusterwide, the need for it to  
>> be  insync with the system clock (on any or all of the systems)  
>> is  obviated.  In fact, as you can't guarantee the synchronicity  
>> means  that it can be confusing -- one expects a time-based clock  
>> to be  accurate to the time.  A counter-based clock has no such  
>> expectations.
>
> For the fourth time, the clock is in the mix to allow to continue  
> during a network outage. All your arguments seem to assume 100%  
> network uptime. There will be no clusterwide clock or clusterwide  
> increment when you lose connection. How does your idea cope with that?

That's exactly what a quorum algorithm is for.

> Obviously the counters will immediately drift apart based on the  
> transaction load of the nodes as soon as the network goes down. And  
> in order to avoid this "clock" confusion and wrong expectation,  
> you'd rather have a system with such a simple, non-clock based  
> counter and accept that it starts behaving totally wonky when the  
> cluster reconnects after a network outage? I rather confuse a few  
> people than having a last update wins conflict resolution that  
> basically rolls dice to determine "last".

If your cluster partitions, you have hours of independent action, and
upon merge you apply a conflict resolution algorithm that has the
enormous effect of undoing portions of the last several hours of work
on the nodes, wouldn't you call that "wonky"?

For sane disconnected (or more generally, partitioned) operation in
multi-master environments, a quorum for the dataset must be
established. Now, one can consider the "database" to be the
dataset. So, on network partitions, those in "the" quorum are allowed
to progress with data modification and the others may only read.
However, there is no reason why the dataset _must_ be the database, or
why multiple datasets _must_ share the same quorum algorithm. You
could easily classify certain tables, schemas, or partitions into a
specific dataset and apply a suitable quorum algorithm to that, and a
different quorum algorithm to other, disjoint datasets.
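
To make the per-dataset idea concrete, here is a rough Python sketch.
It is only an illustration under my own assumptions, not a proposal for
how this would be implemented: the dataset names ("orders",
"sessions"), the node names, the vote weights, and the helper functions
are all hypothetical. Each dataset uses a strict-majority rule over
(optionally weighted) votes, so at most one side of a partition can
keep writing to it.

```python
# Hypothetical sketch: a separate quorum rule per dataset.
# All names and weights here are illustrative assumptions.

def has_quorum(members, reachable, weights=None):
    """Return True if the reachable nodes hold a strict majority of votes."""
    votes = weights or {node: 1 for node in members}
    total = sum(votes[node] for node in members)
    held = sum(votes[node] for node in members if node in reachable)
    # Strict majority: two disjoint sides can never both satisfy this.
    return 2 * held > total


# Two disjoint datasets on the same three nodes, with different rules.
DATASETS = {
    # Plain majority: one vote per node.
    "orders": {"members": {"a", "b", "c"}, "weights": None},
    # Weighted majority: node "a" alone outvotes "b" and "c" together.
    "sessions": {"members": {"a", "b", "c"},
                 "weights": {"a": 3, "b": 1, "c": 1}},
}


def access_mode(dataset, reachable):
    """Nodes inside the dataset's quorum may write; the rest only read."""
    cfg = DATASETS[dataset]
    if has_quorum(cfg["members"], reachable, cfg["weights"]):
        return "read-write"
    return "read-only"


if __name__ == "__main__":
    # Network splits into {a} and {b, c}.
    for side in ({"a"}, {"b", "c"}):
        for name in DATASETS:
            print(sorted(side), name, access_mode(name, side))
```

With the network split into {a} and {b, c}, the side holding b and c
keeps writing to "orders" while the side holding a keeps writing to
"sessions"; each dataset stays writable on exactly one side, so there
is nothing to resolve by timestamp for that dataset at merge time.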


// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/



