Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers

From Markus Wanner
Subject Re: Sync Rep: First Thoughts on Code
Date
Msg-id 4943ED9D.3020005@bluegap.ch
Whole thread Raw
In response to Re: Sync Rep: First Thoughts on Code  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep: First Thoughts on Code  ("Robert Haas" <robertmhaas@gmail.com>)
Re: Sync Rep: First Thoughts on Code  (Aidan Van Dyk <aidan@highrise.ca>)
List pgsql-hackers
Hi,

Simon Riggs wrote:
> On Sat, 2008-12-13 at 14:07 +0100, Markus Wanner wrote:
>> Speaking of a "synchronous commit"
>> is utterly misleading, because the commit itself is exactly the thing
>> that's *not* synchronous.
> 
> Not really sure where you're going here.

I'm pointing to a potential misunderstanding, trying to help to prevent
you from running into the same issues and discussions as I did.

I've learned the hard way, that the Postgres-R algorithm is not fully
synchronous (in the strict sense). This caused confusion for people who
take the word "synchronous" by its original meaning. The algorithm
proposed here seems similar enough to potentially cause the same confusion.

As I see it now, I think it's well worth to point out the difference,
from both, the technical as well as from the marketing perspective. The
former for better understanding, the later to prevent users from
thinking it must be slow per definition. Arguing that your approach is
not fully synchronous definitely helps defending that concern.

However, I'm just now realizing, that the difference is only relevant as
soon as you begin to allow read-only access on the slave. AFAIK that's
among the goals of this effort, no?

> "synchronous replication" is
> used exactly as described in the Wikipedia entry here:
> http://en.wikipedia.org/wiki/Database_replication

That article describes pretty much all variants of replication, what
exactly are you referring to?

Under "Database Replication > Multi-Master replication" it describes
eager vs lazy variants, which is IMO a more appropriate and useful
distinction than sync vs async. (But that's admittedly a sentence I've
contributed myself, IIRC).

Under "Storage Replication > Synchronous Replication" one can read:
"Write is not considered complete until acknowledgement by both local
and remote storage." For the proposed approach this might hold true for
WAL writing. However, the user certainly doesn't care how synchronous
the log is shipped nor written, is as long as she doesn't see the
changes on the slave.

That's the difference between fully synchronous and eager (or virtually
or approximately synchronous) algorithms. You seem to refer to both as
"synchronous". Phrases like "synchronous commit" or "synchronous data
transfer" do not help me to understand what exactly you are talking about.

Explaining that the slave commits (and therefore makes the transactions
visible) asynchronously would help. And it would prevent disappointment
for users who expect changes to be immediately visible on the slave.

> No two word phrase is going to accurately sum up the complexity and
> potential for data loss in these situations. DRBD saw that too and just
> called them A, B and C and then describe them more accurately.

Agreed. I've chosen lazy, eager and sync, so far. I'm open for better
terms, and I leave it up to you to call your variants whatever you like.
But to understand what you are talking about, I'd prefer to get to know
these distinctions crisp and clear.

> But I don't think we should say "PostgreSQL just implemented algorithm
> B" which is just unhelpful. I don't think its "marketing" to refer to it
> by the phrase most commonly used for the technology we are building.

I certainly agree to using such terms. Unfortunately, in my experience,
synchronous replication is commonly used to mean that transactions are
guaranteed to be immediately visible on remote nodes after the client
got commit acknowledgment. That's the cause for confusion I'm envisioning.


I'm hoping to be somewhat helpful to this effort of getting a log
shipping replication variant into Postgres. It can only be beneficial
for Postgres-R in that we gain field experience with ..uhm.. this
special kind of replication, however we name it.

I'm already on xmas vacation, so I won't bother you any further on this
issue. Have fun coding and make sure to enjoy this time of the year.

All the best.

Markus Wanner



pgsql-hackers by date:

Previous
From: Grzegorz Jaskiewicz
Date:
Subject: Re: WIP: default values for function parameters
Next
From: "Robert Haas"
Date:
Subject: Re: WIP: default values for function parameters