Thread: postgresql roadmap for horizontal scalability?

postgresql roadmap for horizontal scalability?

From

"Timasmith"

Date:

03 December 2006, 18:24:18

Hi,

This database has exceeded all expectations and perhaps I am getting
ahead of myself but is there anything in the roadmap to match Oracles
ability to cluster databases across multiple servers for concurrent
access?

The reason I ask is for the architecture of an open source database
project I am working on.  It is tempting to ignore caching altogether
and push the onus on to the database to perform more database queries.
This is makes life easier and allows the focus to be on the business
logic.

However if Postgresql is the datababase and the only means to scale is
to buy a bigger single server then that could become a problem of cost
to the community.  It would be cheaper to scale across multiple smaller
servers (I think).

The application I am working on is OLTP with say 90% reads to writes
ratio.

Re: postgresql roadmap for horizontal scalability?

From

Josh Berkus

Date:

03 December 2006, 18:47:26

Timasmith,

> This database has exceeded all expectations and perhaps I am getting
> ahead of myself but is there anything in the roadmap to match Oracles
> ability to cluster databases across multiple servers for concurrent
> access?

There are several efforts to do *better than* RAC, which is actually not very 
effective (RAC scales very poorly, and not at all for some applications).  
These include pgPool, Skytools, pgCluster, Replicator, Postgres-R and 
Sequoia.

> The reason I ask is for the architecture of an open source database
> project I am working on.  It is tempting to ignore caching altogether
> and push the onus on to the database to perform more database queries.
> This is makes life easier and allows the focus to be on the business
> logic.

Yes, except even on a clustered system proper use of caching will *always* at 
least double database peformance.  There's simply no way to make an ACID 
RDBMS anywhere near as fast as a simple thing like a cache.

> However if Postgresql is the datababase and the only means to scale is
> to buy a bigger single server then that could become a problem of cost
> to the community.  It would be cheaper to scale across multiple smaller
> servers (I think).

Yes, it would.  However, scalable OLTP clustering is a problem not yet solved 
by anyone in the industry.  It may not be solvable at all.

> The application I am working on is OLTP with say 90% reads to writes
> ratio.

Hmmm ... that makes it more like web than OLTP.  You may be able to use a 
query replication system like pgPool or Sequoia.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: postgresql roadmap for horizontal scalability?

From

"Mike Rylander"

Date:

03 December 2006, 19:40:17

On 1 Dec 2006 09:43:57 -0800, Timasmith <timasmith@hotmail.com> wrote:
> Hi,
>
> This database has exceeded all expectations and perhaps I am getting
> ahead of myself but is there anything in the roadmap to match Oracles
> ability to cluster databases across multiple servers for concurrent
> access?
>

My preferred solution is Slony-1 + pgPool (more below).

> The reason I ask is for the architecture of an open source database
> project I am working on.  It is tempting to ignore caching altogether
> and push the onus on to the database to perform more database queries.
> This is makes life easier and allows the focus to be on the business
> logic.
>
> However if Postgresql is the datababase and the only means to scale is
> to buy a bigger single server then that could become a problem of cost
> to the community.  It would be cheaper to scale across multiple smaller
> servers (I think).

It surely is, depending on your sync speed requirements.

>
> The application I am working on is OLTP with say 90% reads to writes
> ratio.
>

That's very similar to the read/write ratio of my app.  Here's what we do:

We've set up 1 master read/write DB 2 (soon to be 3) slave read-only DBs cascading pgPool setup in "external
replication"mode

We spread the load over all machines using the built in pgPool
weighting algorithm, giving the master machine about 1/6 of the read
only traffic (it also gets all in-transaction traffic per the pgPool
config) and spread the rest of the load over the other machines.
Behind the cluster, taking care of keeping everyone in sync, is a
simple Slony-1 setup.  It's fairly simple to get going once you wrap
your head around it, and we've yet to see a single problem that we
didn't cause by not reading the documentation.

One thing to watch out for, though, is the replication lag.  We see
anywhere from 0.5 to 3 seconds of lag on our setup, and we expect to
start seeing a little more when we add the third slave machine.  The
one sure way to counteract this is to always go to the master whenever
you're building a page that changes any data.  Those should be few and
far between, realistically, but will cause more than the expected load
that the pgPool config would suggest.  Giving the vast majority of the
read-only load to the slaves has mitigated the load issue for us, and
I expect would do the same for you.

Hope that helps. :)

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

-- 
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: postgresql roadmap for horizontal scalability?

From

"Timasmith"

Date:

04 December 2006, 14:37:47

"Mike Rylander" wrote:
> On 1 Dec 2006 09:43:57 -0800, Timasmith <timasmith@hotmail.com> wrote:
> > Hi,
> >
> > This database has exceeded all expectations and perhaps I am getting
> > ahead of myself but is there anything in the roadmap to match Oracles
> > ability to cluster databases across multiple servers for concurrent
> > access?
> >
>
> My preferred solution is Slony-1 + pgPool (more below).
>
> > The reason I ask is for the architecture of an open source database
> > project I am working on.  It is tempting to ignore caching altogether
> > and push the onus on to the database to perform more database queries.
> > This is makes life easier and allows the focus to be on the business
> > logic.
> >
> > However if Postgresql is the datababase and the only means to scale is
> > to buy a bigger single server then that could become a problem of cost
> > to the community.  It would be cheaper to scale across multiple smaller
> > servers (I think).
>
> It surely is, depending on your sync speed requirements.
>
> >
> > The application I am working on is OLTP with say 90% reads to writes
> > ratio.
> >
>
> That's very similar to the read/write ratio of my app.  Here's what we do:
>
> We've set up
>   1 master read/write DB
>   2 (soon to be 3) slave read-only DBs
>   cascading pgPool setup in "external replication" mode
>
> We spread the load over all machines using the built in pgPool
> weighting algorithm, giving the master machine about 1/6 of the read
> only traffic (it also gets all in-transaction traffic per the pgPool
> config) and spread the rest of the load over the other machines.
> Behind the cluster, taking care of keeping everyone in sync, is a
> simple Slony-1 setup.  It's fairly simple to get going once you wrap
> your head around it, and we've yet to see a single problem that we
> didn't cause by not reading the documentation.
>
> One thing to watch out for, though, is the replication lag.  We see
> anywhere from 0.5 to 3 seconds of lag on our setup, and we expect to
> start seeing a little more when we add the third slave machine.  The
> one sure way to counteract this is to always go to the master whenever
> you're building a page that changes any data.  Those should be few and
> far between, realistically, but will cause more than the expected load
> that the pgPool config would suggest.  Giving the vast majority of the
> read-only load to the slaves has mitigated the load issue for us, and
> I expect would do the same for you.
>
> Hope that helps. :)
>
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
> >
>
>
> --
> Mike Rylander
> mrylander@gmail.com
> GPLS -- PINES Development
> Database Developer
> http://open-ils.org
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

That is awesome.  I am not worried about the time to replicate - this
is supposed to be static, very infrequently changing data (like form
controls, security etc.)  so there is no expectation changes are real
time.

I am envisioning a powerful,.highly available central server and then a
slew of cheap servers.

It is not so much that I would not using caching at all but it does
open the door to focus on 'fat client caching' only - that reduces
introducing synchronized code (that may introduce errors) on the
application server.

Re: postgresql roadmap for horizontal scalability?

From

"Andrew Hammond"

Date:

04 December 2006, 15:57:10

Timasmith wrote:
>
> That is awesome.  I am not worried about the time to replicate - this
> is supposed to be static, very infrequently changing data (like form
> controls, security etc.)  so there is no expectation changes are real
> time.

Be very sure that this is covered in the specifications or you may end
up with PHB types asking you why the changes aren't propagated in real
time.

> I am envisioning a powerful,.highly available central server and then a
> slew of cheap servers.

If you intend to use the slony based approach above, then I suggest
that your cheap servers need to be reliable. In the event of a single
node's failure, cruft will begin accumulating on all the nodes until
that node is either dropped or brought back online and catches back up.

Drew

Re: postgresql roadmap for horizontal scalability?

From

Bruce Momjian

Date:

04 December 2006, 21:03:07

Is there any of this that is not already in the 8.2 docs?
http://developer.postgresql.org/pgdocs/postgres/high-availability.html

---------------------------------------------------------------------------

Timasmith wrote:
> "Mike Rylander" wrote:
> > On 1 Dec 2006 09:43:57 -0800, Timasmith <timasmith@hotmail.com> wrote:
> > > Hi,
> > >
> > > This database has exceeded all expectations and perhaps I am getting
> > > ahead of myself but is there anything in the roadmap to match Oracles
> > > ability to cluster databases across multiple servers for concurrent
> > > access?
> > >
> >
> > My preferred solution is Slony-1 + pgPool (more below).
> >
> > > The reason I ask is for the architecture of an open source database
> > > project I am working on.  It is tempting to ignore caching altogether
> > > and push the onus on to the database to perform more database queries.
> > > This is makes life easier and allows the focus to be on the business
> > > logic.
> > >
> > > However if Postgresql is the datababase and the only means to scale is
> > > to buy a bigger single server then that could become a problem of cost
> > > to the community.  It would be cheaper to scale across multiple smaller
> > > servers (I think).
> >
> > It surely is, depending on your sync speed requirements.
> >
> > >
> > > The application I am working on is OLTP with say 90% reads to writes
> > > ratio.
> > >
> >
> > That's very similar to the read/write ratio of my app.  Here's what we do:
> >
> > We've set up
> >   1 master read/write DB
> >   2 (soon to be 3) slave read-only DBs
> >   cascading pgPool setup in "external replication" mode
> >
> > We spread the load over all machines using the built in pgPool
> > weighting algorithm, giving the master machine about 1/6 of the read
> > only traffic (it also gets all in-transaction traffic per the pgPool
> > config) and spread the rest of the load over the other machines.
> > Behind the cluster, taking care of keeping everyone in sync, is a
> > simple Slony-1 setup.  It's fairly simple to get going once you wrap
> > your head around it, and we've yet to see a single problem that we
> > didn't cause by not reading the documentation.
> >
> > One thing to watch out for, though, is the replication lag.  We see
> > anywhere from 0.5 to 3 seconds of lag on our setup, and we expect to
> > start seeing a little more when we add the third slave machine.  The
> > one sure way to counteract this is to always go to the master whenever
> > you're building a page that changes any data.  Those should be few and
> > far between, realistically, but will cause more than the expected load
> > that the pgPool config would suggest.  Giving the vast majority of the
> > read-only load to the slaves has mitigated the load issue for us, and
> > I expect would do the same for you.
> >
> > Hope that helps. :)
> >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 4: Have you searched our list archives?
> > >
> > >                http://archives.postgresql.org
> > >
> >
> >
> > --
> > Mike Rylander
> > mrylander@gmail.com
> > GPLS -- PINES Development
> > Database Developer
> > http://open-ils.org
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> 
> That is awesome.  I am not worried about the time to replicate - this
> is supposed to be static, very infrequently changing data (like form
> controls, security etc.)  so there is no expectation changes are real
> time.
> 
> I am envisioning a powerful,.highly available central server and then a
> slew of cheap servers.
> 
> It is not so much that I would not using caching at all but it does
> open the door to focus on 'fat client caching' only - that reduces
> introducing synchronized code (that may introduce errors) on the
> application server.
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
> 
>                 http://www.postgresql.org/about/donate

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: postgresql roadmap for horizontal scalability?

From

Peter Eisentraut

Date:

06 December 2006, 14:00:47

Am Freitag, 1. Dezember 2006 18:43 schrieb Timasmith:
> This database has exceeded all expectations and perhaps I am getting
> ahead of myself but is there anything in the roadmap to match Oracles
> ability to cluster databases across multiple servers for concurrent
> access?

I have got some ideas on how to get there, but it would likely be a multi-year 
effort to complete.  In the meantime, there are other solutions available 
which may work better or worse than what Oracle has.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: postgresql roadmap for horizontal scalability?

From

"Simon Riggs"

Date:

10 December 2006, 22:20:19

On Fri, 2006-12-01 at 09:43 -0800, Timasmith wrote:

> This database has exceeded all expectations 
> and perhaps I am getting
> ahead of myself but is there anything in the roadmap to match Oracles
> ability to cluster databases across multiple servers for concurrent
> access?

My experience of Oracle clustering is that database design is incredibly
difficult and requires very careful segregation of transactions to make
it work well. RAC is much better than Oracle Parallel Server (OPS) which
had such a (deservedly) bad reputation they had to change the name. Some
applications work well with it, others work much worse than a
make-one-large-SMP approach, but its fairly hard to be certain which it
will be for any particular app.

So, my take on clustering is that it will be a huge drain on development
resources to get it to work reasonably well (Oracle took 10 years and 3
major versions, note) and that other explicit data
duplication/replication techniques are almost as effective. They are
available now.

I think its clustering is do-able and will come, but not for a while
yet, methinks.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com