Re: Replication - Mailing list pgsql-general

From Craig Ringer
Subject Re: Replication
Msg-id 1245719379.32535.22.camel@tillium.localnet
In response to Re: Replication  (Gerry Reno <greno@verizon.net>)
Responses Re: Replication
List pgsql-general
On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

> > Anyway, you seem to be unaware that built-in replication for
> > PostgreSQL already is moving along, with an implementation that's just
> > not quite production quality yet, and might make it into the next version
> > after 8.4 if things go well.

> No, I'm aware of this basic builtin replication. It was rather
> disappointing to see it moved out of the 8.4 release. We need something
> more than just basic master-slave replication which is all this simple
> builtin replication will provide. We need a real replication solution
> that can handle statement-based and row-based replication. Multi-master
> replication. Full cyclic replication chain setups. Simple master-slave
> just doesn't cut it.

Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned in the MySQL replication docs.

Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence.
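
To make the sequence problem concrete, here is a toy sketch in Python. It
is not how PostgreSQL or MySQL implements anything; the Node class, the
names, and the replay order are all made up for illustration. It only
shows that replaying the same statements in a different order hands out
different sequence values, and that shipping the allocated values
alongside the statements (the "side channel") avoids that.

```python
import itertools


class Node:
    """Toy database node with one sequence and one table (name -> id)."""

    def __init__(self):
        self.seq = itertools.count(1)  # stands in for a sequence
        self.rows = {}

    def insert(self, name):
        # Stands in for: INSERT INTO t VALUES (nextval('seq'), name)
        self.rows[name] = next(self.seq)

    def insert_with_id(self, name, value):
        # Replay using the value the master actually allocated.
        self.rows[name] = value


# Order in which the two concurrent statements hit the sequence on the master.
master = Node()
for name in ["alice", "bob"]:
    master.insert(name)

# Pure statement replay: same statements, applied in a different order.
slave = Node()
for name in ["bob", "alice"]:
    slave.insert(name)
diverged = slave.rows != master.rows

# Side channel: ship the allocated value along with each statement.
log = list(master.rows.items())  # (name, allocated id) pairs
slave_fixed = Node()
for name, value in reversed(log):
    slave_fixed.insert_with_id(name, value)
consistent = slave_fixed.rows == master.rows
```

With plain replay the two nodes disagree about which row got which id;
once the allocated values travel with the statements, the replay order
stops mattering for this case.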

My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads?

- Client-transparent fault-tolerance?

- ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

- Client must connect to master for writes, otherwise master or slave,
  so must be more aware of connection management

- Client drivers have no way to transparently discover active master,
  must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads? Or do you expect throughput gains in write-heavy
workloads too?
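
A back-of-envelope version of that argument, as a Python sketch. The
model is an assumption of mine, not a measurement: every node has the
same fixed capacity, a read costs the same as a write, reads are spread
evenly over the nodes, and every node has to apply every write. The
function name is hypothetical.

```python
def cluster_speedup(n_nodes, write_fraction):
    """Ideal throughput gain of n fully replicated masters over one node.

    Per unit of client load, each node serves 1/n of the reads but has to
    apply all of the writes, so its share of the work is
    (1 - write_fraction) / n_nodes + write_fraction.
    """
    per_node_work = (1.0 - write_fraction) / n_nodes + write_fraction
    return 1.0 / per_node_work


read_mostly = cluster_speedup(4, 0.05)   # 5% writes, 4 masters
write_heavy = cluster_speedup(4, 0.90)   # 90% writes, 4 masters
```

Under these assumptions four masters give roughly a 3.5x gain at 5%
writes but well under 1.1x at 90% writes, and that is before counting
any conflict-resolution or coordination overhead.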

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.

> > That's probably why it's not on the survey--everybody knows that's
> > important and it's already being worked on actively.
> Ok, I just felt it should still be there. But, I hope development
> understands just how important good replication really is.

"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.

--
Craig Ringer

