HA and Replication - how to choose among all the available solutions - Mailing list pgsql-general

From Pascal Cohen
Subject HA and Replication - how to choose among all the available solutions
Date
Msg-id 47508107.5090900@wimba.com
Whole thread Raw
Responses Re: HA and Replication - how to choose among all the available solutions  (Pascal Cohen <pcohen@wimba.com>)
Re: HA and Replication - how to choose among all the available solutions  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-general
Hello

I am facing a probably very common problem. I made a search in the
recent archives and could find many posts related to my issue. But I did
not get exactly "the answer" to my question.
How can I chose among the different existing tools to enable High
Availability and Load-balancing/Replication.
In fact our situation is common. We have a DB that is growing and we
need to ensure High-availability.
That needs some standby solution.
On the other hand we have few updates and many reads and when we are
trying some simulations, the number of requests may become too high for
our server (I mean requests fast - less than 4ms with explain run-
starts to be logged with a threshold to 200ms) so we have thought to
replication to balance the request load among several machines. We can
afford some delay among the DB (which means sync replication is welcome
but not mandatory and should probably be avoided if we want to improve
the perfs).
I had mainly a look at several solutions like:
PGCluster - but I could not get some fresh stuff
Slony-I - seems mature and used
PgPool-II - active and interesting project
PostgreSQL-R - was not able to test it.
C-JDBC/Sequoia.

I mainly focused on PgPool-II and Slony which seemed to be active projects.
I also looked at the Log-Shipping section in PostgreSQL documentation to
ensure high-availability.

Concerning the high-availability issue, I can in fact imagine using
log-shipping, or Slony-I or PgPool-II in multi-master mode with only one
server available.
But I don't know how to chose a solution. I don't have much experience
and can't see precisely the advantages and the drawbacks of each solution.

Concerning Scalability, I think I would need PgPool-II connected in
front to at least load balance select to the different servers. But I
can use the "multi-master" solution or rely on Slony to replicate data
while PgPool balances the select and only sends updates to the Master.
In fact that last solution makes me more comfortable because I fear that
multi-master can lead to complex conflict resolution while a single
master avoids such concern.
With a single master I can't have any data mismatch case I was able to
simulate with PgPool-II (inserting manually in a DB a duplicate value or
adding a trigger to simulate an exception).
What is the experience concerning that kind of question?

Mixing both Scalability and High-availability: with PgPool-II running in
"multi-master" solution, no real issue. With single master it is less
obvious :(
I had imagined a mixed solution using a Master relying on a warm standby
using log-shipping and Linux-HB while Replicated slaves would be
replicated using Slony and load balancing would be managed by PgPool-II.
But perhaps it is stupid and using only PgPool-II is safe enough.
And why should I chose Slony instead of Log-shipping to update the slaves ?

The hardest in the situation is to make the (or at least a) good choice
among all the available solutions and to chose a realistic one

That are many questions in this quite long post.
Thanks for any help or advice you could give me.

Pascal


pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: PL/pgSQL and SETOF
Next
From: "Pavel Stehule"
Date:
Subject: Re: PL/pgSQL and SETOF