From Jan Wieck
Subject Re: Proposal: Commit timestamp
Msg-id 45CA7926.6070905@Yahoo.com
In response to Re: Proposal: Commit timestamp  (Markus Schiltknecht <markus@bluegap.ch>)
List pgsql-hackers
On 2/7/2007 12:54 PM, Markus Schiltknecht wrote:
> Hi,
> 
> Jan Wieck wrote:
>> Are we still discussing if the Postgres backend may provide support for
>> a commit timestamp, that follows the rules for Lamport timestamps in a
>> multi-node cluster?
> 
> No. And I think you know my opinion about that by now. ;-)

Then let me give you a little puzzle just for the fun of it.

A database containing customer contact information (among other things) 
runs on a two-node multimaster system. One node serves the customer web 
portal, the other is used by the company staff, including the call 
center. At 13:45 the two servers lose connectivity to each other, yet 
the internal staff can still access the internal server while the web 
portal remains accessible from the outside. At 13:50 customer A updates 
their credit card information through the web portal, while customer B 
does the same through the call center. At 13:55 both customers change 
their minds and switch to yet another credit card; this time customer A 
phones the call center while customer B does it via the internet.

At 14:00 the two servers reconnect and go through conflict resolution. 
How do you intend to resolve both conflicts without using any "clock"? 
That seems to be a stopword causing instant rejection of whatever gets 
proposed. Needless to say, both customers will be dissatisfied if you 
charge the "wrong" credit card during the next billing cycle.
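
Just to make concrete what I mean by a commit timestamp that follows the 
Lamport rules, here is a little sketch (plain Python, all names invented, 
nothing to do with actual backend code). Resolution simply keeps the row 
version carrying the higher timestamp, so the 13:55 changes survive on 
both nodes once they reconnect:

# Toy sketch of last-update-wins resolution on a Lamport-style
# commit timestamp (counter, node_id). Purely illustrative.

from dataclasses import dataclass

@dataclass
class RowVersion:
    key: str           # e.g. the customer id
    cc_number: str     # the credit card currently on file
    commit_ts: tuple   # (lamport_counter, node_id)

def lamport_tick(local_counter, seen_counter):
    # Lamport rule: whenever a remote timestamp is seen, jump past the
    # highest counter known so far before stamping the next local commit.
    return max(local_counter, seen_counter) + 1

def resolve(local, remote):
    # Last update wins; ties are broken deterministically by node id.
    return local if local.commit_ts >= remote.commit_ts else remote

# Customer A: 13:50 update on the web node, 13:55 update via the call center.
web_row      = RowVersion("cust_A", "card-1", (50, "web"))
internal_row = RowVersion("cust_A", "card-2", (55, "internal"))
assert resolve(web_row, internal_row).cc_number == "card-2"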


> 
>> It seems more like we are drifting into what type of
>> replication system I should design to please most people.
> 
> Nobody is telling you what you should do. You're free to do whatever you 
> want to.
> 
> I'm only trying to get a discussion going, because a) I'm interested in 
> how you plan to solve these problems and b) in the past, most people 
> were complaining that all the different replication efforts didn't try 
> to work together. I'm slowly trying to open up and discuss what I'm 
> doing with Postgres-R on the lists.

Which is a good discussion, because one of the reasons I stopped 
looking into Postgres-R is the fact that it is based on the idea of 
pushing all replication information through a system that generates a 
global serialized message queue. That by itself isn't the problem; the 
problem is that implementing a global serialized message queue has 
serious throughput issues that are (among other details) linked to the 
speed of light.

I am trying to start with a system that doesn't rely on such a 
mechanism for everything. I do intend to add an option later that 
allows declaring a UNIQUE NOT NULL constraint to be synchronous. What 
that means is that any INSERT, UPDATE, DELETE or SELECT FOR UPDATE 
touching it will require the node to currently be a member of the 
(quorum- or priority-defined) majority of the cluster. An advisory lock 
system, based on total order group communication, will grant locks on 
the unique key values on a first come, first served basis. Every node 
in the cluster will keep those keys "locked" until the asynchronous 
replication stream reports the locking transaction as ended. If another 
remote transaction requires updating such a key in the meantime, the 
incoming stream from that node will be put on hold until the lock is 
cleared. This is to protect against node B replicating a transaction 
from node A and a later update from node B arriving at C before C got 
the first event from A. A node that got disconnected from the cluster 
must rebuild the current advisory lock list upon reconnecting.
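
Roughly, the bookkeeping each node would do for this looks like the 
following - a sketch only, in Python for brevity; class and method 
names are made up, and the total-order delivery itself is assumed to 
be handled by the group communication layer:

# Sketch of the per-node advisory key-lock bookkeeping (illustrative only).

class KeyLockTable:
    def __init__(self):
        self.locks = {}          # (relation, key_value) -> (origin_node, xid)
        self.held_streams = {}   # origin_node -> (lock waited on, queued changes)

    def grant(self, relation, key_value, node, xid):
        # Called in total-order delivery sequence: first come, first served.
        lock = (relation, key_value)
        if lock in self.locks:
            return False         # another transaction already holds this key
        self.locks[lock] = (node, xid)
        return True

    def transaction_ended(self, node, xid):
        # The asynchronous replication stream reported the locking transaction
        # as ended: drop its locks and release any streams waiting on them.
        cleared = [l for l, owner in self.locks.items() if owner == (node, xid)]
        for l in cleared:
            del self.locks[l]
        released = []
        for origin, (waiting_on, changes) in list(self.held_streams.items()):
            if waiting_on in cleared:
                released.extend(changes)
                del self.held_streams[origin]
        return released          # caller re-applies these, re-checking locks

    def incoming_change(self, origin, relation, key_value, change):
        # A change from another node touches a key that is still locked by a
        # transaction we have not seen end yet: hold that node's stream, so a
        # later update from B cannot overtake the original event from A.
        if origin in self.held_streams:
            self.held_streams[origin][1].append(change)
            return "held"
        lock = (relation, key_value)
        if lock in self.locks and self.locks[lock][0] != origin:
            self.held_streams[origin] = (lock, [change])
            return "held"
        return "apply"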

I think that this will be a way to overcome Postgres-R's communication 
bottleneck, as well as to allow limited update activity even while a 
node is completely disconnected. Synchronous or group communication 
messages are reduced to the cases where the application cannot be 
implemented in a conflict-free way, like allocating a natural primary 
key. There is absolutely no need to synchronize, for example, creating 
a sales order. An application can use globally unique IDs for the order 
number, and everything possibly referenced by an order (items, 
customers, ...) is stored in a way that the references are never 
updated. Deletes of those possibly referenced objects are implemented 
in a two-step process: they are first marked obsolete, and later on 
things that have been marked obsolete for longer than X are deleted. A 
REPLICA TRIGGER on inserting an order will simply reset the obsolete 
flag of the referenced objects. If a node is disconnected for longer 
than X, you have a problem - hunt down the guy who defined X.
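
To make the sales order example a bit more concrete, here is the same 
pattern as a little Python sketch - all names invented, and on a real 
replica the last two functions would of course live in the database 
(the REPLICA TRIGGER and a cleanup job), not in application code:

# Sketch of the conflict-free order pattern: globally unique order ids,
# two-step deletes, and un-obsoleting whatever an incoming order references.

import time
import uuid

OBSOLETE_GRACE = 30 * 24 * 3600                  # "X": how long obsolete rows linger

items  = {"widget": {"obsolete_since": None}}    # item_id -> row
orders = {}                                      # order_id -> referenced item_ids

def new_order(item_ids):
    # Creating an order never needs synchronization: a globally unique id
    # cannot collide with an order entered on another node.
    order_id = str(uuid.uuid4())
    orders[order_id] = list(item_ids)
    return order_id

def mark_item_obsolete(item_id):
    # Step one of the two-step delete: only flag the row, never remove it.
    items[item_id]["obsolete_since"] = time.time()

def replica_insert_order(order_id, item_ids):
    # What the replica-side trigger would do: an incoming order simply
    # resets the obsolete flag of everything it references.
    orders[order_id] = list(item_ids)
    for item_id in item_ids:
        items[item_id]["obsolete_since"] = None

def purge_obsolete(now=None):
    # Step two: rows that stayed obsolete for longer than X finally go away.
    now = time.time() if now is None else now
    for item_id, row in list(items.items()):
        since = row["obsolete_since"]
        if since is not None and now - since > OBSOLETE_GRACE:
            del items[item_id]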

> Just yesterday at the SFPUG meeting, I've experienced how confusing it 
> is for the users to have such a broad variety of (existing and upcoming) 
> replication solutions. And I'm all for working together and probably 
> even for merging different replication solutions.

Merging certain ideas to come up with an async/sync hybrid? Seems to me 
we have similar enough ideas to need conflict resolution, because we had 
them simultaneously but communicated them asynchronously.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

