Re: Bigtime scaling of Postgresql (cluster and stuff I suppose) - Mailing list pgsql-general

From Markus Schiltknecht
Subject Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)
Msg-id 46D4196C.90401@bluegap.ch
In response to Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)  (Bill Moran <wmoran@potentialtech.com>)
Responses Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)
List pgsql-general
Hi,

Bill Moran wrote:
> First off, "clustering" is a word that is too vague to be useful, so
> I'll stop using it.  There's multi-master replication, where every
> database is read-write, then there's master-slave replication, where
> only one server is read-write and the rest are read-only.  You can
> add failover capabilities to master-slave replication.  Then there's
> synchronous replication, where all servers are guaranteed to get
> updates at the same time.  And asynchronous replication, where other
> servers may take a while to get updates.  These descriptions aren't
> really specific to PostgreSQL -- every database replication system
> has to make design decisions about which approaches to support.

Good explanation!

> Synchronous replication is only
> really used when two servers are right next to each other with a
> high-speed link (probably gigabit) between them.

Why is that so? There is certainly very valuable data that would benefit
from an inter-continental database system. For money transfers, for
example, I'd rather wait half a second for a round trip around the
world to make sure the RDBMS does not 'lose' my money.

> PostgreSQL-R is in development, and targeted to allow multi-master,
> asynchronous replication without rewriting your application.  As
> far as I know, it works, but it's still beta.

Sorry, this is nitpicking, but for some reason (see current naming
discussion on -advocacy :-) ), it's "Postgres-R".

Additionally, Postgres-R is considered to be a *synchronous* replication
system, because once you get your commit confirmation, your transaction
is guaranteed to be deliverable and *committable* on all running nodes
(i.e. it's durable and consistent). Or, to put it another way:
asynchronous systems have to deal with conflicting, but already
committed, transactions. Postgres-R does not.
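
To make the distinction concrete, here is a toy model in Python. It is a
sketch only, not Postgres-R's actual implementation: it assumes that
every node receives the same writesets in the same total order and runs
the same deterministic conflict check, and all names (Txn, Node,
certify_and_apply, deliver_in_total_order) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    txn_id: str
    start_version: int  # how many writesets were applied when the txn took its snapshot
    writes: dict        # key -> new value


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)
    last_writer: dict = field(default_factory=dict)  # key -> version of last write
    version: int = 0

    def certify_and_apply(self, txn):
        """Abort if any key the txn writes was changed after its snapshot."""
        for key in txn.writes:
            if self.last_writer.get(key, 0) > txn.start_version:
                return False
        self.version += 1
        for key, value in txn.writes.items():
            self.data[key] = value
            self.last_writer[key] = self.version
        return True


def deliver_in_total_order(nodes, ordered_txns):
    """Each node sees the same sequence, so each node reaches the same decision."""
    outcome = {}
    for txn in ordered_txns:
        decisions = {node.certify_and_apply(txn) for node in nodes}
        assert len(decisions) == 1  # no node disagrees about commit vs. abort
        outcome[txn.txn_id] = decisions.pop()
    return outcome


nodes = [Node("a"), Node("b"), Node("c")]
t1 = Txn("t1", 0, {"balance": 100})
t2 = Txn("t2", 0, {"balance": 50})  # concurrent with t1, writes the same key
result = deliver_in_total_order(nodes, [t1, t2])
print(result)
```

In this model the conflicting transaction t2 is rejected identically on
every node before any client is told it committed, so there is never a
pair of already-committed transactions to reconcile afterwards.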

Certainly, this is slightly less restrictive than saying that a
transaction needs to be *committed* on all nodes before confirming the
commit to the client. But as long as a database session is tied to a
node, this optimization does not alter any transactional semantics. And
despite that limitation, which holds in most real setups anyway, I
still consider this to be synchronous replication.

[ To get a strictly synchronous system with Postgres-R, you'd have to
delay read-only transactions on a node which hasn't applied all remote
transactions yet. In most cases, that's unwanted. Instead, a consistent
snapshot is enough, just as if the transaction started *before* the
remote ones which still need to be applied. ]
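
The bracketed point can be illustrated with a small Python sketch. It is
a simplified model under stated assumptions, not how Postgres-R is
implemented: writesets are plain dicts, they arrive in a single agreed
order, and the names ReplicaNode, receive, apply_next, snapshot and
strict_read_must_wait are hypothetical.

```python
class ReplicaNode:
    def __init__(self):
        self.applied = []  # writesets already applied locally, in delivery order
        self.pending = []  # remote writesets delivered but not yet applied

    def receive(self, writeset):
        """A remote writeset arrives; it is queued, not yet visible."""
        self.pending.append(writeset)

    def apply_next(self):
        """Apply the oldest pending writeset."""
        self.applied.append(self.pending.pop(0))

    def snapshot(self):
        """Consistent view: exactly the writesets applied so far, in order."""
        state = {}
        for writeset in self.applied:
            state.update(writeset)
        return state

    def strict_read_must_wait(self):
        """A strictly synchronous read would block while anything is pending."""
        return bool(self.pending)


node = ReplicaNode()
node.receive({"x": 1})
node.apply_next()
node.receive({"x": 2})                  # delivered, but not applied yet

early_view = node.snapshot()            # sees x as of before the remote txn
must_wait = node.strict_read_must_wait()

node.apply_next()
late_view = node.snapshot()
print(early_view, must_wait, late_view)
```

The early read returns a state that really existed, just as if the
reader had started before the pending remote transaction; only a
strictly synchronous mode would have to make it wait.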

> BTW: does anyone know of a link that describes these high-level concepts?
> If not, I think I'll write this up formally and post it.

Hm.. some time before 8.3 was released, we had lots of discussions on
-docs about the "high availability and replication" section of the
PostgreSQL documentation. I'd have liked to add these fundamental
concepts, but Bruce - rightly - wanted to keep the focus on existing
solutions. And unfortunately, most existing solutions are async,
single-master. So explaining all these wonderful theoretical concepts
only to state that there are no real solutions would have been silly.

Regards

Markus

