Re: Proposal: Commit timestamp - Mailing list pgsql-hackers
From | Markus Schiltknecht
---|---
Subject | Re: Proposal: Commit timestamp
Msg-id | 45CA8AA8.9070805@bluegap.ch
In response to | Re: Proposal: Commit timestamp (Jan Wieck <JanWieck@Yahoo.com>)
Responses | Re: Proposal: Commit timestamp
List | pgsql-hackers
Hi,

Jan Wieck wrote:

> Then let me give you a little puzzle just for the fun of it.
>
> A database containing customer contact information (among other things)
> is a two node multimaster system. One is serving the customer web
> portal, the other is used by the company staff including the call
> center. At 13:45 the two servers lose connectivity to each other, yet
> the internal staff can access the internal server while the web portal
> is accessible from the outside. At 13:50 customer A updates their credit
> card information through the web portal, while customer B does the same
> through the call center. At 13:55 both customers change their mind and
> use yet another credit card; now customer A phones the call center while
> customer B does it via the internet.

Phew, a mind twister... one customer would already be enough to trigger
that sort of conflict...

> At 14:00 the two servers reconnect and go through the conflict
> resolution. How do you intend to solve both conflicts without using any
> "clock", because that seems to be a stopword causing instant rejection
> of whatever you propose. Needless to say, both customers will be
> dissatisfied if you charge the "wrong" credit card during your next
> billing cycle.

Correct. But do these cases justify storing timestamps for each and every
transaction you do? That's what I doubt, not the usefulness of time-based
conflict resolution for certain cases. You can always add time-based
conflict resolution by adding a timestamp column and deciding upon that
one (a small sketch of what I mean follows below). I'd guess that the
overall costs are lower that way. But you've withdrawn that proposal
already, so...

> Which is a good discussion because one of the reasons why I stopped
> looking into Postgres-R is the fact that it is based on the idea to push
> all the replication information through a system that generates a global
> serialized message queue. That by itself isn't the problem, but the fact
> that implementing a global serialized message queue has serious
> throughput issues that are (among other details) linked to the speed of
> light.

Agreed. Nevertheless, there are use cases for such systems, because they
place fewer limitations on the application. One could even argue that
your above example is one ;-)

> I am trying to start with a system that doesn't rely on such a
> mechanism for everything. I do intend to add an option later that
> allows declaring a UNIQUE NOT NULL constraint to be synchronous. What
> that means is that any INSERT, UPDATE, DELETE and SELECT FOR UPDATE
> will require the node to currently be a member of the (quorum or
> priority defined) majority of the cluster.

Sounds reasonable.

> An advisory lock system,
> based on a total order group communication, will grant the lock to the
> unique key values on a first come, first served basis. Every node in the
> cluster will keep those keys as "locked" until the asynchronous
> replication stream reports the locking transaction as ended. If another
> remote transaction in the meantime requires updating such a key, the
> incoming stream from that node will be on hold until the lock is
> cleared. This is to protect against node B replicating a transaction from
> node A and a later update on node B arriving on C before C got the first
> event from A. A node that got disconnected from the cluster must rebuild
> the current advisory lock list upon reconnecting to the cluster.

Yeah, this is a convenient way to replicate sequences via a GCS.
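Coming back to the timestamp column: a minimal sketch of what I have in
mind. Table, column and statement names are made up, and the simple
last-update-wins rule is just one conceivable resolution policy:

```sql
-- Hypothetical table; the application (or a trigger) maintains last_changed.
CREATE TABLE customer_cc (
    customer_id  integer PRIMARY KEY,
    cc_number    text NOT NULL,
    last_changed timestamptz NOT NULL DEFAULT now()
);

-- On the receiving node, apply a remote change only if it is newer than
-- what we already have (last-update-wins). $1..$3 are the key, the new
-- value and the timestamp shipped with the replicated change.
PREPARE apply_cc_change(integer, text, timestamptz) AS
    UPDATE customer_cc
       SET cc_number    = $2,
           last_changed = $3
     WHERE customer_id  = $1
       AND last_changed < $3;
```

The point being that only the tables which actually need time-based
resolution pay for storing and comparing a timestamp, instead of every
transaction in the system.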
> I think that this will be a way to overcome Postgres-R's communication
> bottleneck, as well as allowing limited update activity even during a
> completely disconnected state of a node. Synchronous or group
> communication messages are reduced to the cases where the application
> cannot be implemented in a conflict free way, like allocating a natural
> primary key. There is absolutely no need to synchronize, for example,
> creating a sales order.

Agreed, such cases can easily be optimized. But you have to be aware of
the limitations these optimizations cause. Postgres-R is much more
targeted at very general use cases.

> An application can use globally unique IDs for
> the order number. And everything possibly referenced by an order (items,
> customers, ...) is stored in a way that the references are never
> updated. Deletes to those possibly referenced objects are implemented in
> a two step process, where they are first marked obsolete, and later on
> things that have been marked obsolete for X long are deleted. A REPLICA
> TRIGGER on inserting an order will simply reset the obsolete flag of
> referenced objects. If a node is disconnected longer than X, you have a
> problem - hunt down the guy who defined X.

Yeah, that's another very nice optimization (see my attempt at a sketch
in the P.S. below). Again, as long as you know the limitations, that's
all well and fine.

> Merging certain ideas to come up with an async/sync hybrid? Seems to me
> we have similar enough ideas to need conflict resolution, because we had
> them simultaneously but communicate them asynchronously.

Huh? Sorry, I didn't get what you're trying to say here.

Regards

Markus
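P.S.: just to check whether I got the obsolete-flag pattern right, here
is a rough sketch of how I picture it. All names are invented, X is
arbitrarily set to 7 days, and the trigger would of course have to be
one that also fires for the replication session (your REPLICA TRIGGER):

```sql
-- Possibly referenced object, carrying the obsolete flag.
CREATE TABLE item (
    item_id        bigint PRIMARY KEY,
    name           text NOT NULL,
    obsolete_since timestamptz          -- NULL means "in use"
);

-- Orders reference items only logically; the reference is never updated.
CREATE TABLE sales_order (
    order_id bigint PRIMARY KEY,        -- globally unique ID
    item_id  bigint NOT NULL            -- logical reference to item
);

-- An incoming order simply un-obsoletes the item it references.
CREATE FUNCTION order_refreshes_item() RETURNS trigger AS $$
BEGIN
    UPDATE item SET obsolete_since = NULL WHERE item_id = NEW.item_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_refreshes_item
    BEFORE INSERT ON sales_order
    FOR EACH ROW EXECUTE PROCEDURE order_refreshes_item();

-- Periodic cleanup: delete only what has been obsolete longer than X.
DELETE FROM item
 WHERE obsolete_since IS NOT NULL
   AND obsolete_since < now() - interval '7 days';
```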