Thread: terms for database replication: synchronous vs eager

terms for database replication: synchronous vs eager

From: Markus Schiltknecht
Date: 07 September 2007, 12:02:33

Hi,

I'm asking for advice and hints regarding terms in database replication,
especially WRT Postgres-R. (Sorry for cross-posting, but I feared I
would not reach enough people on the Postgres-R ML alone.)

I'm struggling with how to classify the Postgres-R algorithm. Until
recently, most people thought of it as synchronous replication, but it's
not synchronous in the strong (and very common) sense. That is, after a
node confirms that it has committed a transaction, the other nodes have
not necessarily committed it yet. (They only promise that they *will*
commit it without conflicts.)

This violates the common understanding of synchrony, because you can't
commit on a node A and then query another node B and expect it to be
coherent immediately.

Nonetheless, Postgres-R is eager (or pessimistic?) in the sense that
it replicates *before* committing, so as to avoid divergence. In [1]
I've tried to make that distinction clear, and I'm currently advocating
using 'synchronous' only in the very strong (and commonly used) sense.
I've chosen the word 'eager' to mean 'replicates before committing'.

According to these definitions, Postgres-R is async but eager.

Do these definitions violate any common meaning? Maybe in other areas 
like distributed storage or lock managers?

Regards

Markus

[1]: Terms and Definitions of Database Replication
http://www.postgres-r.org/documentation/terms

Re: terms for database replication: synchronous vs eager

From: Jan Wieck
Date: 13 September 2007, 21:55:19

On 9/7/2007 11:01 AM, Markus Schiltknecht wrote:
> Hi,
> 
> I'm asking for advice and hints regarding terms in database replication,
> especially WRT Postgres-R. (Sorry for cross-posting, but I feared I
> would not reach enough people on the Postgres-R ML alone.)
> 
> I'm struggling with how to classify the Postgres-R algorithm. Until
> recently, most people thought of it as synchronous replication, but it's
> not synchronous in the strong (and very common) sense. That is, after a
> node confirms that it has committed a transaction, the other nodes have
> not necessarily committed it yet. (They only promise that they *will*
> commit it without conflicts.)
> 
> This violates the common understanding of synchrony, because you can't
> commit on a node A and then query another node B and expect it to be
> coherent immediately.

That's right. And there is no guarantee about the lag at all. So you can 
find "old" data on node B long after you committed a change to node A.

> Nonetheless, Postgres-R is eager (or pessimistic?) in the sense that
> it replicates *before* committing, so as to avoid divergence. In [1]
> I've tried to make that distinction clear, and I'm currently advocating
> using 'synchronous' only in the very strong (and commonly used) sense.
> I've chosen the word 'eager' to mean 'replicates before committing'.
> 
> According to these definitions, Postgres-R is async but eager.

Postgres-R is an asynchronous replication system by all means. It only 
makes sure that the workset data (that's what Postgres-R calls the 
replication log for one transaction) has been received by a group 
communication system supporting total order and that the group 
communication system decided it to be the transaction that (logically) 
happened before any possibly conflicting concurrent transaction.

This is the wonderful idea through which Postgres-R will have a failsafe
conflict resolution mechanism in an asynchronous system.
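
The total-order idea above can be illustrated with a small simulation. This is a hypothetical sketch, not Postgres-R code: `WriteSet`, `Replica` and `deliver_in_total_order` are invented names, the GCS is modelled as a plain loop that hands every replica the same sequence, and conflicts are detected by comparing each writeset's keys against a per-key "last writer" position, which is one common way to certify writesets but not necessarily the check Postgres-R performs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    """Hypothetical writeset: the keys a transaction wrote, plus the last
    total-order position that was visible when its snapshot was taken."""
    txid: str
    snapshot_pos: int
    keys: frozenset


@dataclass
class Replica:
    """One node. Every replica runs the same deterministic check over the
    same totally ordered stream, so all replicas reach the same verdicts
    without exchanging votes."""
    name: str
    last_writer_pos: dict = field(default_factory=dict)  # key -> order position
    committed: list = field(default_factory=list)
    aborted: list = field(default_factory=list)

    def deliver(self, pos: int, ws: WriteSet) -> bool:
        # A conflict means some key was already written by a writeset ordered
        # after this transaction's snapshot, i.e. by a concurrent transaction
        # the GCS placed earlier in the total order. The earlier one wins.
        if any(self.last_writer_pos.get(k, -1) > ws.snapshot_pos for k in ws.keys):
            self.aborted.append(ws.txid)
            return False
        for k in ws.keys:
            self.last_writer_pos[k] = pos
        self.committed.append(ws.txid)
        return True


def deliver_in_total_order(replicas, writesets):
    """Stand-in for the GCS: hands every replica the same sequence."""
    for pos, ws in enumerate(writesets):
        for r in replicas:
            r.deliver(pos, ws)
```

For example, if T1 and T2 both start from an empty snapshot (`snapshot_pos=-1`) and both write `x`, whichever one the GCS orders first commits and the other aborts on every replica; a later T3 that has seen T1 (`snapshot_pos=0`) can then write `x` without conflict. Nothing in the sketch says when the replicas actually apply the writesets, which is why this alone does not make the system synchronous.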

I don't know what you associate with the word "eager". All I see is that 
Postgres-R makes sure that some other process, which might still reside 
on the same hardware as the DB, is now in charge of delivery. Nobody 
said that the GC implementation cannot have made the decision about the 
total order of two workset messages and already reported that to the 
local client application before those messages ever got transmitted over 
the wire.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: terms for database replication: synchronous vs eager

From: Markus Schiltknecht
Date: 14 September 2007, 06:39:37

Hello Jan,

thank you for your feedback.

Jan Wieck wrote:
> On 9/7/2007 11:01 AM, Markus Schiltknecht wrote:
>> This violates the common understanding of synchrony, because you can't
>> commit on a node A and then query another node B and expect it to be
>> coherent immediately.
> 
> That's right. And there is no guarantee about the lag at all. So you can 
> find "old" data on node B long after you committed a change to node A.

I doubt the "long after" part. In practice, you'll mostly have
nodes which perform about equally fast. And as the origin node has to do
more processing than a node which only replays a transaction, it's
trivial to balance the load.

Additionally, a node which lags behind is unable to commit any
(conflicting) local transactions before having caught up (due to the GCS
total ordering). So this is even somewhat self-regulating.
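
The self-regulation argument can be made concrete with a toy model. The sketch below is hypothetical (`ReplayQueue` and its methods are not Postgres-R APIs) and assumes conflicts are detected purely by overlapping key sets: a local transaction ordered at some position waits only for earlier-ordered remote writesets that touch the same keys, so a node that falls behind is slowed down on its conflicting local work while non-conflicting work proceeds.

```python
from collections import deque


class ReplayQueue:
    """Hypothetical model of one replica's backlog of remote writesets that
    the GCS has already delivered (in total order) but that have not yet
    been replayed locally."""

    def __init__(self):
        self.pending = deque()  # (order_pos, keys), oldest first

    def on_deliver(self, pos, keys):
        self.pending.append((pos, frozenset(keys)))

    def replay_one(self):
        # Replaying is the (possibly slow) work a lagging node has to do.
        self.pending.popleft()

    def may_commit_local(self, local_pos, local_keys):
        # A local transaction ordered at local_pos must wait for every
        # earlier-ordered remote writeset that touches the same keys;
        # a non-conflicting backlog does not block it.
        local_keys = frozenset(local_keys)
        return not any(pos < local_pos and keys & local_keys
                       for pos, keys in self.pending)
```

With writesets on `x` (position 0) and `y` (position 1) still pending, a local transaction at position 2 touching `x` must wait, while one touching only `z` may commit; after the first writeset is replayed, the `x` transaction is unblocked.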

> Postgres-R is an asynchronous replication system by all means. It only 
> makes sure that the workset data (that's what Postgres-R calls the 
> replication log for one transaction)

It's most often referred to as the "writeset".

> has been received by a group 
> communication system supporting total order and that the group 
> communication system decided it to be the transaction that (logically) 
> happened before any possibly conflicting concurrent transaction.

Correct. That's as far as the Postgres-R algorithm goes.

I should have been more precise about what I'm talking about, as I'm
continuing to develop Postgres-R (the software). That might be another
area where a new name should be introduced, to differentiate between
Postgres-R, the original algorithm, and my ongoing work on the
software implementing the algorithm.

> This is the wonderful idea through which Postgres-R will have a failsafe
> conflict resolution mechanism in an asynchronous system.
> 
> I don't know what you associate with the word "eager".

I'm speaking of the property that a transaction is replicated before
commit, so as to avoid later conflicts. IMO, this is the only real
requirement people have when requesting synchronous replication: most
people don't need synchrony, but they do need reliable commit guarantees.

I've noticed that you simply speak of a "failsafe conflict
resolution mechanism". I dislike that description, because it does not
say anything about *when* the conflict resolution happens WRT commit.
And there may well be lazy failsafe conflict resolution mechanisms
(e.g. for a counter) which reconcile after commit.
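
To illustrate what a lazy mechanism for a counter might look like, here is a grow-only counter in the state-based CRDT style. This is a swapped-in textbook technique, not something from Postgres-R, and `LazyCounter` is a hypothetical name: each node commits increments locally without consulting anyone, and nodes converge only afterwards by merging their per-node totals.

```python
class LazyCounter:
    """Illustrative lazily reconciled counter (a grow-only, state-based
    counter in the CRDT style). Increments commit locally at once; nodes
    converge later by merging state, with no cross-node check before
    commit."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.per_node = {}  # node_id -> total increments made on that node

    def increment(self, n=1):
        # "Commit" immediately, without consulting any other node.
        # n must be non-negative for the max-based merge to be correct.
        self.per_node[self.node_id] = self.per_node.get(self.node_id, 0) + n

    def merge(self, other):
        # Reconciliation after commit: taking per-node maxima makes merging
        # idempotent and independent of the order in which merges happen.
        for node, count in other.per_node.items():
            self.per_node[node] = max(self.per_node.get(node, 0), count)

    def value(self):
        return sum(self.per_node.values())
```

If node A adds 2 and node B adds 3, each first sees only its own total; after they merge in both directions, both report 5, and repeating a merge changes nothing. Because increments commute, no increment ever has to be aborted, which is exactly what lets this kind of system confirm the commit first.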

I'd like to have a simple term, so that we could say: you probably don't 
need fully synchronous replication, but eager replication may already 
serve you well.

> All I see is that 
> Postgres-R makes sure that some other process, which might still reside 
> on the same hardware as the DB, is now in charge of delivery. 

..and Postgres-R waits until that other process confirms the delivery, 
whatever exactly that means. See below.

This delay before commit is important. It is what makes Postgres-R
eager, according to my definition of it. I'm open to better terms.

> Nobody 
> said that the GC implementation cannot have made the decision about the 
> total order of two workset messages and already reported that to the 
> local client application before those messages ever got transmitted over 
> the wire.

While this is certainly true in theory, it does not make sense in
practice. It would mean letting the GCS decide on a message ordering
without having delivered the messages to be ordered. That would be
troublesome for the GCS, because it could lose an already ordered
message. Most GCSs start their ordering algorithm by sending out the
message to be ordered.
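
A fixed-sequencer total-order broadcast shows why ordering starts with disseminating the message. This is one textbook technique chosen only as an illustration; the GCS used with Postgres-R may order messages quite differently, and the `Member` and `Sequencer` classes are hypothetical. The sequencer can only number a message it has received, and members deliver strictly in sequence order, holding back any position whose payload has not arrived yet.

```python
class Member:
    """Hypothetical group member in a fixed-sequencer total-order
    broadcast. A message is delivered only once both its payload and its
    sequence number have arrived, and only in sequence order."""

    def __init__(self):
        self.payloads = {}  # msg_id -> payload received from the sender
        self.order = {}     # seq -> msg_id announced by the sequencer
        self.next_seq = 0
        self.delivered = []

    def on_payload(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._try_deliver()

    def on_order(self, seq, msg_id):
        self.order[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # Deliver as many consecutive positions as we hold payloads for.
        while self.next_seq in self.order and self.order[self.next_seq] in self.payloads:
            self.delivered.append(self.payloads[self.order[self.next_seq]])
            self.next_seq += 1


class Sequencer:
    """Assigns sequence numbers, but only to messages it has received."""

    def __init__(self, members):
        self.members = members
        self.next_seq = 0

    def on_payload(self, msg_id):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        for m in self.members:
            m.on_order(seq, msg_id)
```

A member that learns a message's position before receiving its payload simply waits at that position. If the sequencer numbered a message that no surviving member held, every member would stall there for good, which is the "lost already ordered message" problem mentioned in the paragraph above.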

Anyway, as I've described on -hackers before, I intend to decouple
replication from log writing, thus not requiring the GCS to provide any
delivery guarantees at all (GCSs are complicated enough already!). That
would allow the user to separate transaction processing nodes from log
writing nodes, which have different I/O requirements anyway. And what
would more than two or three replicas of the transaction log be good
for? Think of them as an efficient backup: you won't need that backup
until your complete cluster goes down.

Regards

Markus

Re: terms for database replication: synchronous vs eager

From: Chris Browne
Date: 14 September 2007, 12:04:05

JanWieck@Yahoo.com (Jan Wieck) writes:
> On 9/7/2007 11:01 AM, Markus Schiltknecht wrote:
>> Nonetheless, Postgres-R is eager (or pessimistic?) in the sense
>> that it replicates *before* committing, so as to avoid
>> divergence. In [1] I've tried to make that distinction clear, and
>> I'm currently advocating using 'synchronous' only in the very
>> strong (and commonly used) sense. I've chosen the word 'eager' to
>> mean 'replicates before committing'.
>>
>> According to these definitions, Postgres-R is async but eager.
>
> Postgres-R is an asynchronous replication system by all means. It only
> makes sure that the workset data (that's what Postgres-R calls the
> replication log for one transaction) has been received by a group
> communication system supporting total order and that the group
> communication system decided it to be the transaction that (logically)
> happened before any possibly conflicting concurrent transaction.
>
> This is the wonderful idea through which Postgres-R will have a
> failsafe conflict resolution mechanism in an asynchronous system.
>
> I don't know what you associate with the word "eager". All I see is
> that Postgres-R makes sure that some other process, which might still
> reside on the same hardware as the DB, is now in charge of
> delivery. Nobody said that the GC implementation cannot have made the
> decision about the total order of two workset messages and already
> reported that to the local client application before those messages
> ever got transmitted over the wire.

The approach that was going to be taken, in Slony-II, to apply locks
as early as possible so as to find conflicts as soon as possible,
rather than waiting, seems "eager" to me.

But I'm not sure to what extent that notion has been drawn into the
Postgres-R work...
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www3.sympatico.ca/cbbrowne/slony.html
Rules of the Evil Overlord #37. "If my trusted lieutenant tells me my
Legions of Terror are losing a  battle, I will believe him. After all,
he's my trusted lieutenant." <http://www.eviloverlord.com/>

Re: terms for database replication: synchronous vs eager

From: Markus Schiltknecht
Date: 14 September 2007, 12:59:32

Hi,

Chris Browne wrote:
> The approach that was going to be taken, in Slony-II, to apply locks
> as early as possible so as to find conflicts as soon as possible,
> rather than waiting, seems "eager" to me.

Agreed. WRT locking, one might also call it "pessimistic", but that 
sounds so... negative.

I find the "as soon as possible" bit rather weak; instead, it's exactly
"before the origin node confirms commit". Of course, only conflicts which
could possibly lead to an abort of the transaction in question are taken
into account. A possible definition may be:
  "Eager replication systems only confirm the commit of a transaction
   after they have checked for cross-node conflicts which could require
   the transaction to abort. (Lazy systems, by contrast, may confirm the
   commit before that.)"

Note how much less restrictive that definition is than that of a fully
synchronous system.
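
The proposed definition turns on where the cross-node conflict check sits relative to the commit confirmation. The two functions below are a hypothetical illustration: `certify` stands in for whatever conflict check a system performs (for Postgres-R, the total-order certification), and `reconcile` for an application-specific repair step; neither name comes from an existing API.

```python
from typing import Callable, List, Set


def eager_commit(tx_keys: Set[str], certify: Callable[[Set[str]], bool],
                 log: List[str]) -> bool:
    """Eager: the cross-node conflict check happens *before* the client
    hears 'committed', so a failed check becomes a visible abort."""
    if not certify(tx_keys):
        log.append("abort reported to client")
        return False
    log.append("commit confirmed")
    return True


def lazy_commit(tx_keys: Set[str], certify: Callable[[Set[str]], bool],
                log: List[str], reconcile: Callable[[Set[str]], None]) -> bool:
    """Lazy: confirm first, check afterwards. A conflict found later can no
    longer abort the transaction; it has to be repaired by reconciliation."""
    log.append("commit confirmed")
    if not certify(tx_keys):
        reconcile(tx_keys)
        log.append("conflict reconciled after commit")
    return True

# Neither function waits for other replicas to *apply* the transaction;
# also waiting for that is what a fully synchronous system would add.
```

Given a conflicting transaction, the eager variant reports an abort and never confirms, while the lazy variant confirms the commit and then has to fix up the conflict.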

> But I'm not sure to what extent that notion has been drawn into the
> Postgres-R work...

My current variant of Postgres-R goes the very same path, using MVCC 
instead of locking wherever possible (with the very same effect, but 
allowing more concurrency :-) ).

Regards

Markus