Re: Proposal: Commit timestamp - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Proposal: Commit timestamp
Date
Msg-id 45C5F673.2070307@Yahoo.com
In response to Re: Proposal: Commit timestamp  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Proposal: Commit timestamp  (Theo Schlossnagle <jesus@omniti.com>)
List pgsql-hackers
On 2/4/2007 3:16 AM, Peter Eisentraut wrote:
> Jan Wieck wrote:
>> This is all that is needed for last update wins resolution. And as
>> said before, the only reason the clock is involved in this is so that
>> nodes can continue autonomously when they lose connection without
>> conflict resolution going crazy later on, which it would do if they
>> were simple counters. It doesn't require microsecond synchronized
>> clocks and the system clock isn't just used as a Lamport timestamp.
> 
> Earlier you said that "one assumption is that all servers in the 
> multimaster cluster are ntp synchronized", which already rung the alarm 
> bells in me.  Now that I read this you appear to require 
> synchronization not on the microsecond level but on some level.  I 
> think that would be pretty hard to manage for an administrator, seeing 
> that NTP typically cannot provide such guarantees.

Some degree of synchronization is wanted to avoid totally unexpected 
behavior. The conflict resolution algorithm itself can live perfectly 
well with counters, but I guess you wouldn't want the result. Suppose 
you update a record on one node, then 10 minutes later you update the 
same record on another node. Unfortunately, the nodes had no 
communication, and because the first node is much busier, its counter is 
far ahead ... this would mean the update made 10 minutes later gets 
lost in conflict resolution when the nodes reestablish 
communication. They would have the same data at the end, just not what 
any sane person would expect.
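
To make the scenario concrete, here is a minimal sketch of last update wins driven by plain per-node counters. The `Version` record, the `last_update_wins` function, the node-name tie-breaker, and the counter values are all made up for illustration; this is not code from the proposal.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    stamp: int  # hypothetical per-node counter value, not a wall-clock time
    node: str


def last_update_wins(a: Version, b: Version) -> Version:
    # The higher stamp wins; the node name breaks ties deterministically
    # (the tie-breaker is an assumption of this sketch).
    return max(a, b, key=lambda v: (v.stamp, v.node))


# Node A is busy, so its counter is already at 50000 when the row is updated.
first = Version("written first on A", stamp=50_000, node="A")
# Node B is mostly idle; 10 minutes later its counter has only reached 120.
later = Version("written 10 minutes later on B", stamp=120, node="B")

winner = last_update_wins(first, later)
print(winner.value)  # prints "written first on A"
```

Both nodes pick the same winner, so they converge, but the row that survives is the one written 10 minutes earlier.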

This behavior will kick in whenever conflicting cross-node updates 
happen close enough together that the time difference between the clocks 
can affect the outcome. So if you update the same logical row on two 
nodes within a tenth of a second, and the clocks are more than that 
apart, conflict resolution can result in the older row surviving. Clock 
synchronization is simply used to minimize this.
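
The window can be worked through with a few numbers. The sketch below is only an illustration: the `survivor` function, the millisecond units, and the rule that ties go to node B are assumptions, not part of the proposal.

```python
def survivor(true_ms_a: int, true_ms_b: int, offset_ms_a: int, offset_ms_b: int) -> str:
    """Return which node's row survives last update wins.

    true_ms_*   : real time of each update, in milliseconds
    offset_ms_* : how far each node's clock is ahead of real time
    """
    stamp_a = true_ms_a + offset_ms_a
    stamp_b = true_ms_b + offset_ms_b
    # Ties go to B here; a real system needs some deterministic tie-breaker.
    return "A" if stamp_a > stamp_b else "B"


# Node A updates the row at t = 0 ms, node B updates it 100 ms later.
well_synced = survivor(0, 100, offset_ms_a=20, offset_ms_b=0)
badly_synced = survivor(0, 100, offset_ms_a=250, offset_ms_b=0)
print(well_synced, badly_synced)  # prints "B A"
```

With clocks 20 ms apart the newer update on B survives. With A's clock 250 ms ahead, which is more than the 100 ms between the two updates, the older row on A survives.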

The system clock is used only to keep the counters somewhat synchronized 
during a connection loss, so that "last update" retains some degree of 
meaning. Without that, continuing autonomously during a network outage 
is just not practical.
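
One way to picture a counter that is kept in step by the system clock is sketched below. This is an assumption-laden illustration, not the proposed implementation: the class name `ClockDrivenCounter`, the microsecond resolution, and the `observe` method for stamps seen from other nodes are all hypothetical.

```python
import time


class ClockDrivenCounter:
    """A counter that never goes backwards and is pulled forward by the wall clock."""

    def __init__(self, now=time.time):
        self._now = now  # injectable clock, so the sketch can be tested
        self._last = 0

    def next_stamp(self) -> int:
        wall = int(self._now() * 1_000_000)  # wall clock in microseconds
        # Always strictly increasing, and at least the current wall-clock time.
        self._last = max(self._last + 1, wall)
        return self._last

    def observe(self, remote_stamp: int) -> None:
        # Assumption of this sketch: a stamp seen from another node pushes
        # the local counter forward so later local updates sort after it.
        self._last = max(self._last, remote_stamp)


counter = ClockDrivenCounter()
a = counter.next_stamp()
b = counter.next_stamp()
print(a < b)  # prints "True"
```

While a node is cut off, an idle node's counter still advances with its clock, so it cannot fall arbitrarily far behind a busy node the way a plain counter would.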


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

