Re: Bigtime scaling of Postgresql (cluster and stuff I suppose) - Mailing list pgsql-general
From | Markus Schiltknecht |
---|---|
Subject | Re: Bigtime scaling of Postgresql (cluster and stuff I suppose) |
Date | |
Msg-id | 46D4196C.90401@bluegap.ch |
In response to | Re: Bigtime scaling of Postgresql (cluster and stuff I suppose) (Bill Moran <wmoran@potentialtech.com>) |
Responses | Re: Bigtime scaling of Postgresql (cluster and stuff I suppose) |
List | pgsql-general |

Hi,

Bill Moran wrote:
> First off, "clustering" is a word that is too vague to be useful, so
> I'll stop using it. There's multi-master replication, where every
> database is read-write, then there's master-slave replication, where
> only one server is read-write and the rest are read-only. You can
> add failover capabilities to master-slave replication. Then there's
> synchronous replication, where all servers are guaranteed to get
> updates at the same time. And asynchronous replication, where other
> servers may take a while to get updates. These descriptions aren't
> really specific to PostgreSQL -- every database replication system
> has to make design decisions about which approaches to support.

Good explanation!

> Synchronous replication is only
> really used when two servers are right next to each other with a
> high-speed link (probably gigabit) between them.

Why is that so? There's certainly very valuable data which would gain
from an inter-continental database system. For money transfers, for
example, I'd rather wait half a second for a round trip around the
world, to make sure the RDBMS does not 'lose' my money.

> PostgreSQL-R is in development, and targeted to allow multi-master,
> asynchronous replication without rewriting your application. As
> far as I know, it works, but it's still beta.

Sorry, this is nitpicking, but for some reason (see current naming
discussion on -advocacy :-) ), it's "Postgres-R".

Additionally, Postgres-R is considered to be a *synchronous*
replication system, because once you get your commit confirmation, your
transaction is guaranteed to be deliverable and *committable* on all
running nodes (i.e. it's durable and consistent). Or, put another way:
asynchronous systems have to deal with conflicting, but already
committed transactions - Postgres-R does not.

Certainly, this is slightly less restrictive than saying that a
transaction needs to be *committed* on all nodes before confirming the
commit to the client. But as long as a database session is tied to a
node, this optimization does not alter any transactional semantics. And
despite that restriction (sessions tied to a node), which mostly holds
in practice anyway, I still consider this to be synchronous
replication.

[ To get a strictly synchronous system with Postgres-R, you'd have to
delay read-only transactions on a node which hasn't yet applied all
remote transactions. In most cases, that's unwanted. Instead, a
consistent snapshot is enough, just as if the transaction started
*before* the remote ones which still need to be applied. ]

> BTW: does anyone know of a link that describes these high-level concepts?
> If not, I think I'll write this up formally and post it.

Hm.. sometime before 8.3 was released, we had lots of discussions on
-docs about the "high availability and replication" section of the
PostgreSQL documentation. I'd have liked to add these fundamental
concepts, but Bruce - rightly - wanted to keep focused on existing
solutions. And unfortunately, most existing solutions are async,
single-master. So explaining all these wonderful theoretical concepts
only to state that there are no real solutions would have been silly.

Regards

Markus
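
As a rough illustration of the distinction drawn above, here is a toy Python sketch. It is not Postgres-R's actual algorithm, and every name in it (`Node`, `async_commit`, `drain`, `agreed_commit`) is invented for this example. It only contrasts two behaviours: an asynchronous scheme that confirms a commit locally and can discover conflicts afterwards, and a scheme that confirms only after every node has agreed the change is committable. For simplicity the second variant applies the change on all nodes at once, which is stricter than the "guaranteed committable" condition described in the message.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory replica: one value and one version per key."""
    name: str
    data: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)

    def apply(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1


def async_commit(origin, others, key, value, pending):
    """Asynchronous: commit locally, confirm, and ship the change later."""
    base = origin.versions.get(key, 0)  # version the writer saw
    origin.apply(key, value)
    for node in others:
        pending.append((node, key, value, base))
    return True  # the client is told "committed" before other nodes know


def drain(pending):
    """Deliver queued changes; collect those that hit an already-changed key."""
    conflicts = []
    for node, key, value, base in pending:
        if node.versions.get(key, 0) != base:
            # Both sides were already confirmed: this must be reconciled.
            conflicts.append((node.name, key, value))
        else:
            node.apply(key, value)
    pending.clear()
    return conflicts


def agreed_commit(nodes, key, value, base):
    """Confirm only if every node agrees the change is committable."""
    if any(n.versions.get(key, 0) != base for n in nodes):
        return False  # aborted before confirmation, nothing to reconcile
    for n in nodes:
        n.apply(key, value)
    return True
```

With two nodes that each accept a write to the same key, `async_commit` returns `True` on both and `drain` later reports two conflicts between transactions that were already confirmed. With `agreed_commit`, the first write succeeds and the second, based on the stale version, is rejected before the client ever receives a confirmation.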