Thread: pgpool2 vs sequoia

pgpool2 vs sequoia

From

mljv@planwerk6.de

Date:

02 August 2007, 09:54:23

Hi,

i would like to use a statement replication for postgresql

i have found the following solutions:
- pgpool
- pgpool2
- sequoia (jdbc, formerly known as c-jdbc)

pgpool has only two nodes, so this is not an option.

I will never change my underlying database, of course :-)
So i dont have any advantage of the sequoia jdbc abstraction. i'd rather
sometimes use other tools than jdbc to access the database with some perl
scripts, so this is an advantage for pgpool2. but sequoia looks very nice at
first glance.

What are other advantages/disadvantages in respect of
- performance/scalability
- stability
- support / ease-of-use / etc.

i would be very happy if someone can share his/her expierence with such a
solution.

kind regards,
janning

Re: pgpool2 vs sequoia

From

"Andy Dale"

Date:

02 August 2007, 10:04:16

Hi,

You might also want to check out HA-JDBC at http://ha-jdbc.sourceforge.net

Cheers,

Andy

On 02/08/07, mljv@planwerk6.de <mljv@planwerk6.de> wrote:

Hi,

i would like to use a statement replication for postgresql

i have found the following solutions:
- pgpool
- pgpool2
- sequoia (jdbc, formerly known as c-jdbc)

pgpool has only two nodes, so this is not an option.

I will never change my underlying database, of course :-)
So i dont have any advantage of the sequoia jdbc abstraction. i'd rather
sometimes use other tools than jdbc to access the database with some perl
scripts, so this is an advantage for pgpool2. but sequoia looks very nice at
first glance.

What are other advantages/disadvantages in respect of
- performance/scalability
- stability
- support / ease-of-use / etc.

i would be very happy if someone can share his/her expierence with such a
solution.

kind regards,
janning

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: pgpool2 vs sequoia

From

mljv@planwerk6.de

Date:

02 August 2007, 11:31:08

Am Donnerstag, 2. August 2007 12:04 schrieb Andy Dale:
> Hi,
>
> You might also want to check out HA-JDBC at http://ha-jdbc.sourceforge.net

thanks for this suggestion, so i have three options to choose from:

- pgpool2
- sequoia
- ha-jdbc

Can someone share his experience on these?

kind regards
janning

Re: pgpool2 vs sequoia

From

"Andy Dale"

Date:

02 August 2007, 12:34:46

Hi,

I have some experience with HA-JDBC and on the whole it is pretty good (very easy to setup), with it's only sight weakness being it synchronisation (renders the DB read only) but this is to be improved in the future. I have tried to setup sequioa but it is pretty complex (and more heavyweight than ha-jdbc), and i never quite managed to get it working exactly as i wanted it to.

Cheers,

Andy

On 02/08/07, mljv@planwerk6.de <mljv@planwerk6.de> wrote:

Am Donnerstag, 2. August 2007 12:04 schrieb Andy Dale:
> Hi,
>
> You might also want to check out HA-JDBC at http://ha-jdbc.sourceforge.net

thanks for this suggestion, so i have three options to choose from:

- pgpool2
- sequoia
- ha-jdbc

Can someone share his experience on these?

kind regards
janning

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Re: pgpool2 vs sequoia

From

mljv@planwerk6.de

Date:

03 August 2007, 07:21:21

Am Donnerstag, 2. August 2007 22:37 schrieben Sie:
> On Thu, Aug 02, 2007 at 11:58:40AM +0200, mljv@planwerk6.de wrote:
> > Hi,
> >
> > i would like to use a statement replication for postgresql
>
> Why?

i have read
http://www.postgresql.org/docs/8.2/interactive/high-availability.html

i want 4 synchronous databases with load balancing, so my application has high
performance and high availability.

I am using "Data Partitioning" at this time. Each db has the same schema but
different data at the moment. But i am not very happy with it as few tables
must be kept synchronous and i don't have any failover or load balancing.
"statement replication" is my the only way, to have a synchronous replication
with load balancing, isn't it?. At least the one which i understand best.

kind regards,
Janning

Re: pgpool2 vs sequoia

From

David Fetter

Date:

03 August 2007, 15:50:24

On Fri, Aug 03, 2007 at 09:25:41AM +0200, mljv@planwerk6.de wrote:
> Am Donnerstag, 2. August 2007 22:37 schrieben Sie:
> > On Thu, Aug 02, 2007 at 11:58:40AM +0200, mljv@planwerk6.de wrote:
> > > Hi,
> > >
> > > i would like to use a statement replication for postgresql
> >
> > Why?
>
> i have read
> http://www.postgresql.org/docs/8.2/interactive/high-availability.html
>
> i want 4 synchronous databases with load balancing, so my
> application has high performance and high availability.

Very few people actually need synchronous replication, and those who
do buy Oracle's RAC (and curse it) or use DB2's offering (and also
curse it ;).  For most purposes, fast asynchronous replication is good
enough.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

Re: pgpool2 vs sequoia

From

"Joshua D. Drake"

Date:

03 August 2007, 16:02:58

David Fetter wrote:
> On Fri, Aug 03, 2007 at 09:25:41AM +0200, mljv@planwerk6.de wrote:
>> Am Donnerstag, 2. August 2007 22:37 schrieben Sie:
>>> On Thu, Aug 02, 2007 at 11:58:40AM +0200, mljv@planwerk6.de wrote:
>>>> Hi,
>>>>
>>>> i would like to use a statement replication for postgresql
>>> Why?
>> i have read
>> http://www.postgresql.org/docs/8.2/interactive/high-availability.html
>>
>> i want 4 synchronous databases with load balancing, so my
>> application has high performance and high availability.

synchronous replication <> high performance without lots of $$$$$$

Use a caching server or something like plproxy.

>
> Very few people actually need synchronous replication, and those who
> do buy Oracle's RAC (and curse it) or use DB2's offering (and also
> curse it ;).  For most purposes, fast asynchronous replication is good
> enough.
>
> Cheers,
> David.


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

Re: pgpool2 vs sequoia

From

Markus Schiltknecht

Date:

06 August 2007, 12:16:12

Hi,

David Fetter wrote:
> Very few people actually need synchronous replication, and those who
> do buy Oracle's RAC (and curse it) or use DB2's offering (and also
> curse it ;).  For most purposes, fast asynchronous replication is good
> enough.

While this is certainly true, please keep in mind that async replication
always brings up the potential of conflicts, per definition - no matter
how fast it is.

IMO, it would often be a lot simpler and less expensive to use sync
replication and bite the bullet of a small commit delay (depending on
the interconnect) - but not having to deal with conflicts.

OTOH, of course there's no real (at least no OSS) solution to sync
replication, so this is just theory. I'm trying to change that with
Postgres-R [1].

As a second note, I might add that all of this really only applies to
writing transactions. Read-only transactions are, of course, not
affected by replication and can be balanced across multiple servers with
both types of replication. Only sync replication guarantees consistent
snapshots, though. Which is the reason for conflicts...

But again, this is just gray theory. And practically speaking, I'm
giving you the same general advice: prefer async replication, because
there are solutions, which are mature and used in production.

Regards

Markus

[1]: For more information, see: www.postgres-r.org

Re: pgpool2 vs sequoia

From

mljv@planwerk6.de

Date:

06 August 2007, 17:12:48

Thank you guys! But now i am clueless as before. please, enlighten me:

i need about 200 concurrent db connections at peak time and - at the moment -
i only have cheap hardware (2-4 GB Ram, Dual Opteron CPU, SATA Disks, RAID 1)
My database has a size of 11 GigaByte, largest table has 100.000.000 records
All queries are already optimized, i do regular reindex and clusetring of
tables and indizes.

the last few years we ran with horizontal partitioning. i always ran into
problems with horizontal partioning because few tables must be shared across
the databases and sometimes things are moving and i got lot of trouble with
my primary keys being the same on different nodes.

We have only few write operations and almost no write operations at peak time
We have lots of read operations and a peak time with even more read
operations.  We are now using Hibernate/JDBC to connect to the database
(before we used PHP)

At the moment i see the following solutions:
1 synchronous replication: pgpool2 (or sequoia)
2 horizontal partitioning
3 better hardware
4 asynchronous replication: slony

option 1 looks so easy and smart so i decided to go with solution 1. But you
told me not to do so.

option 2 was already used and had lots of problems. And it's not an easy task
to use horizontal partioning with hibernate (it works but it's not
trustworthy IMHO)

option 3 would be nice but my hosting provider has only cheap hardware and as
we have not much experience with enterprise hardware we fear to buy something
really expensive.

So your advice is option 4 and use "Master-Slave Replication" with Slony-I,
right? But i fear slony. i don't know why. (BTW: Is there a PDF version of
slony documentation anywhere? )

Or is pgpool2 still a reasonable solution in my setup? It looks so smart and
so easy to setup.

kind regards,
janning

Am Montag, 6. August 2007 14:16 schrieb Markus Schiltknecht:
> Hi,
>
> David Fetter wrote:
> > Very few people actually need synchronous replication, and those who
> > do buy Oracle's RAC (and curse it) or use DB2's offering (and also
> > curse it ;).  For most purposes, fast asynchronous replication is good
> > enough.
>
> While this is certainly true, please keep in mind that async replication
> always brings up the potential of conflicts, per definition - no matter
> how fast it is.
>
> IMO, it would often be a lot simpler and less expensive to use sync
> replication and bite the bullet of a small commit delay (depending on
> the interconnect) - but not having to deal with conflicts.
>
> OTOH, of course there's no real (at least no OSS) solution to sync
> replication, so this is just theory. I'm trying to change that with
> Postgres-R [1].
>
> As a second note, I might add that all of this really only applies to
> writing transactions. Read-only transactions are, of course, not
> affected by replication and can be balanced across multiple servers with
> both types of replication. Only sync replication guarantees consistent
> snapshots, though. Which is the reason for conflicts...
>
> But again, this is just gray theory. And practically speaking, I'm
> giving you the same general advice: prefer async replication, because
> there are solutions, which are mature and used in production.
>
> Regards
>
> Markus
>
> [1]: For more information, see: www.postgres-r.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: pgpool2 vs sequoia

From

"Alexander Staubo"

Date:

06 August 2007, 18:06:18

On 8/6/07, mljv@planwerk6.de <mljv@planwerk6.de> wrote:
> the last few years we ran with horizontal partitioning. i always ran into
> problems with horizontal partioning because few tables must be shared across
> the databases and sometimes things are moving and i got lot of trouble with
> my primary keys being the same on different nodes.

Note that pgpool2 can be used to implement transparent horizontal
partitioning. Have you looked at it?

> At the moment i see the following solutions:
> 1 synchronous replication: pgpool2 (or sequoia)
> 2 horizontal partitioning
> 3 better hardware
> 4 asynchronous replication: slony

For #1 there's also PGCluster (which, incidentally, is not the same as
PGCluster-II, a shared-disk solution), which does synchronous
multimaster replication.

The project has historically looked a bit dead, but they just released
a new version and moved to a Trac-based web site at
http://www.pgcluster.org/.

One major downside to PGCluster is that it uses a modified version of
PostgreSQL, and it usually lags a few releases behind.

Alexander.