Thread: RFC: Very large scale postgres support

RFC: Very large scale postgres support

From
"Alex J. Avriette"
Date:
Recently I was tasked with creating a "distribution system" for
postgres nodes here at work. This would allow us to simply bring up a
new box, push postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can
do with postgres on one machine. Our platform presently is the HP
DL380. It is a reasonably fast machine, but in order to eke more
performance out of postgres, we are going to have to upgrade the
hardware substantially.

So the subject came up, wouldn't it be nice if, with replication and
proxies, we could create postgres clusters? When we need more
throughput, to just put a new box in the cluster, dist a psotgres
instance to it, and tell the proxy about it. This is a very attractive
idea for us, from a scalability standpoint. It means that we don't have
to buy $300,000 servers when we max out our 2- or 4- cpu machines (in
the past, I would have suggested a Sun V880 for this database, but we
are using Linux on x86).

We are left with one last option, and that is re-engineering our
application to distribute load across several instances of postgres
which are operating without any real knowledge of eachother. I worry,
though, that as our needs increase further, these application redesigns
will become asymptotic.

I find myself wondering what other people are doing with postgres that
this doesn't seem to have come up. When one searches for postgres
clustering on google, they will find lots of HA products. However,
nobody seems to be attempting to create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this
subject was done. In the future, people will hopefully begin using
postgres for more intense applications. We are looking at perhaps many
tens of billions of transactions per day within the next year or two.
To simply buy a "bigger box" each time we outgrow the one we're on is
not effective nor efficient. I simply don't believe we're the only ones
pushing postgres this hard.

I understand there are many applications out there trying to achieve
replication. Some of them seem fairly promising. However, it seems to
me that if we want to see a true clustered database environment, there
would have to be actual native support in the postmaster (inter
postmaster communication if you will) for replication and
cross-instance locking.

This is obviously a complicated problem, and probably not very many of
us are doing anything near as large-scale as this. However, I am sure
most of us can see the benefit of being able to provide support for
these sorts of applications.

I've just submitted this RFC in the hopes that we can discuss both the
best way to support very large scale databases, as well as how to
handle them presently.

Thanks again for your time.
alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Systems Masseur
"I ... remain against the death penalty because I feel that eternal boredom with no hope of parole is a much worse
punishmentthan just ending it all mercifully with that quiet needle." - Rachel Mills, NC Libertarian Gubernatorial
Candidate


Re: RFC: Very large scale postgres support

From
Andreas Pflug
Date:
Alex J. Avriette wrote:

>
>I feel that it would be a very good thing if some thinking on this
>subject was done. In the future, people will hopefully begin using
>postgres for more intense applications. We are looking at perhaps many
>tens of billions of transactions per day within the next year or two.
>  
>

tens of billions =10e10 per day? This is probably a typo, because this 
would mean > 100,000 requests per second? Do you want to feed a monitor 
with pixel data right from the database, using individual queries for 
each pixel? Or record each irc user's keyclick in the world concurrently 
online in a single database?

Regards,
Andreas





Re: RFC: Very large scale postgres support

From
"Alex J. Avriette"
Date:
On Sun, Feb 08, 2004 at 08:07:14PM +0100, Andreas Pflug wrote:

> >I feel that it would be a very good thing if some thinking on this
> >subject was done. In the future, people will hopefully begin using
> >postgres for more intense applications. We are looking at perhaps many
> >tens of billions of transactions per day within the next year or two.
> > 
> >
> 
> tens of billions =10e10 per day? This is probably a typo, because this 
> would mean > 100,000 requests per second? Do you want to feed a monitor 

That's what I said, and what I meant. Ten billion transactions equates
to 115,740 transactions per second.

> with pixel data right from the database, using individual queries for 
> each pixel? Or record each irc user's keyclick in the world concurrently 
> online in a single database?

Just because you don't think there is a valid use for that sort of
traffic doesn't mean there isn't one. Imagine, if you will, a hundred
thousand agents making four to five requests a second.  Now, imagine
these requests are all going to the same database.

I'll leave the rest of this exercise up to you.

The fact is, there are situations in which such extreme traffic is
warranted. My concern is that I am not able to use postgres in such
situations because it cannot scale to that level. I feel that it would
be possible to reach that level with support in the postmaster for
replication. 

With software load balancing (eg rotors or similar) and updates between
postmasters, it would be (it seems to me) possible to drastically
increase the available capacity of a database installation through the
addition of more nodes. This has the added benefit of allowing us to
distribute network resources.

Alex

--
alex@posixnap.net
Alex J. Avriette, Unix Systems Gladiator
The Emperor Wears No Clothes.
http://www.drugsense.org/wodclock.htm


Re: RFC: Very large scale postgres support

From
Rod Taylor
Date:
> The fact is, there are situations in which such extreme traffic is
> warranted. My concern is that I am not able to use postgres in such
> situations because it cannot scale to that level. I feel that it would
> be possible to reach that level with support in the postmaster for
> replication. 

Replication won't help if those are all mostly write transactions. If a
small percentage, even 1% would be challenging, is INSERTS, UPDATES or
DELETES, master / slave replication might get you somewhere.

Otherwise you're going to need to partition the data up into smaller,
easily managed sizes -- that of course requires an ability to
horizontally partition the data.

Anyway, if you want a sane answer we need more information about the
data (is it partitionable?), schema type, queries producing the load
(simple or complex), acceptable data delays (does a new insert need to
be immediately visible?), etc.

Dealing with a hundred thousand queries/second isn't just challenging to
PostgreSQL, you will be hard pressed to find the hardware that will push
that much data around even with the overhead of the database itself.



Re: RFC: Very large scale postgres support

From
"Alex J. Avriette"
Date:
On Sun, Feb 08, 2004 at 08:01:38PM -0500, Rod Taylor wrote:

> Replication won't help if those are all mostly write transactions. If a
> small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> DELETES, master / slave replication might get you somewhere.

There is no way on earth we could be doing writes at that rate. I think
that's a given.

> Otherwise you're going to need to partition the data up into smaller,
> easily managed sizes -- that of course requires an ability to
> horizontally partition the data.

Obviously, this is the route we have taken.

> Anyway, if you want a sane answer we need more information about the
> data (is it partitionable?), schema type, queries producing the load
> (simple or complex), acceptable data delays (does a new insert need to
> be immediately visible?), etc.

We've considered a lot of this. Like I said, I think a lot of our need
for distributing the database can be helped along with native
replication. Am I hearing that nobody believes scalability is a
concern?  I think many of us would like to see features that would
allow large scale installations to be more practical. I also think most
of us would agree that the current "graft-on" replication methods are
sub-ideal.

alex

--
alex@posixnap.net
Alex J. Avriette, Unix Systems Gladiator
The Emperor Wears No Clothes.
http://www.drugsense.org/wodclock.htm


Re: RFC: Very large scale postgres support

From
Rod Taylor
Date:
> replication. Am I hearing that nobody believes scalability is a
> concern?  I think many of us would like to see features that would
> allow large scale installations to be more practical. I also think most
> of us would agree that the current "graft-on" replication methods are
> sub-ideal.

You really haven't told us which approach to scaling you require. Many
are available and PostgreSQL is really only good at a few of them.

Anyway, current replication is a PITA mostly due to it's inability to
easily start from an empty live database and catch up. You might want to
throw a developer at helping SLONY along if you're feel master/slave is
the right direction, since presumably you will have several slaves.



Re: RFC: Very large scale postgres support

From
Rod Taylor
Date:
On Sun, 2004-02-08 at 21:01, Alex J. Avriette wrote:
> On Sun, Feb 08, 2004 at 08:01:38PM -0500, Rod Taylor wrote:
> 
> > Replication won't help if those are all mostly write transactions. If a
> > small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> > DELETES, master / slave replication might get you somewhere.
> 
> There is no way on earth we could be doing writes at that rate. I think
> that's a given.

Sure you can, if you can horizontally partition the data so clients A
are on machine A, clients B are on machine B, ...



Re: RFC: Very large scale postgres support

From
"Alex J. Avriette"
Date:
On Sun, Feb 08, 2004 at 09:20:07PM -0500, Rod Taylor wrote:
> On Sun, 2004-02-08 at 21:01, Alex J. Avriette wrote:
> > On Sun, Feb 08, 2004 at 08:01:38PM -0500, Rod Taylor wrote:
> > 
> > > Replication won't help if those are all mostly write transactions. If a
> > > small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> > > DELETES, master / slave replication might get you somewhere.
> > 
> > There is no way on earth we could be doing writes at that rate. I think
> > that's a given.
> 
> Sure you can, if you can horizontally partition the data so clients A
> are on machine A, clients B are on machine B, ...

I think you were assuming inserts here. The problem actually comes from
updates here. The problem is, if I update here, how long before the
rest of my "cloud" of postgres nodes understand that record is
updated?  With an insert, the transaction and propagation are fairly
clear. With an update, the overall cost is higher, and the cost
per-node is higher.

Alex

--
alex@posixnap.net
Alex J. Avriette, Unix Systems Gladiator
"You cannot invade the mainland United States. There would be a rifle behind each blade of grass." - Admiral Isoroku
Yamamoto
 


Re: RFC: Very large scale postgres support

From
Josh Berkus
Date:
Alex,

> I find myself wondering what other people are doing with postgres that
> this doesn't seem to have come up. When one searches for postgres
> clustering on google, they will find lots of HA products. However,
> nobody seems to be attempting to create very high throughput clusters.

Have you checked out "Clusgres" from Linux Labs?   One of my clients will be 
testing this application soon.

While we haven't run into transaction throughput limits, a couple of my 
client's seem to be running in x86's limits on very large queries, 
particularly on vendor hardware.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: RFC: Very large scale postgres support

From
Chris
Date:
> That's what I said, and what I meant. Ten billion transactions equates
> to 115,740 transactions per second.

Have you tried to look at the scientific comunity? CERN has setups
that produce such large amounts of data - try searching google for
http://www.google.com/search?q=cern+event+database+postgresql

or even
http://www.google.com/search?q=cern+event+database

You might find some inspiration there!


Bye, Chris.




Re: RFC: Very large scale postgres support

From
Rod Taylor
Date:
On Mon, 2004-02-09 at 01:54, Alex J. Avriette wrote:
> On Sun, Feb 08, 2004 at 09:20:07PM -0500, Rod Taylor wrote:
> > On Sun, 2004-02-08 at 21:01, Alex J. Avriette wrote:
> > > On Sun, Feb 08, 2004 at 08:01:38PM -0500, Rod Taylor wrote:
> > > 
> > > > Replication won't help if those are all mostly write transactions. If a
> > > > small percentage, even 1% would be challenging, is INSERTS, UPDATES or
> > > > DELETES, master / slave replication might get you somewhere.
> > > 
> > > There is no way on earth we could be doing writes at that rate. I think
> > > that's a given.
> > 
> > Sure you can, if you can horizontally partition the data so clients A
> > are on machine A, clients B are on machine B, ...
> 
> I think you were assuming inserts here. The problem actually comes from
> updates here. The problem is, if I update here, how long before the

No, with the above I'm assuming that you have several completely
independent systems.

Look, if you're really doing that many select style queries, I presume
it is because you simply have too much data or it changes too frequently
to present statically as otherwise you wouldn't be using the database
for those queries.

To me, this implies either every one of your mirrors is going to need
some kind of enterprise storage solution, OR you segregate the clients
onto different databases so there is not any cross talk or data
propagation at all.

Anyway, you've yet to tell us anything of substance to even allow
guessing at what your solution should be. Simple master/slave mirroring
is only one of many options.



Re: RFC: Very large scale postgres support

From
"Keith Bottner"
Date:
Alex,

I agree that this is something that is worth spending time on. This
resembles the Oracle RAC (Real Application Cluster). While other people may
feel that the amount of data is unreasonable I have a similar problem that
will only be solved using such a solution.

In regards to how your database is designed? Who cares? This is an RFC for a
general discussion on how to design this level of functionality into
Postgres. Ultimately any solution would work without regard to the insert,
updates, or deletes being executed. Alex, I think as a first step we should
start coming up with a feature list of what would be necessary to support
this level of functionality. From that point we could then identify efforts
that are currently ongoing on Postgres development that we could help out on
as well as those items that would need to be handled directly.

I am very interested in going forth with this discussion and believe that I
would be able to have the company I work for put forward resources (i.e.
people or money) on developing the solution if we can come up with a
workable plan.

Josh, thanks for the heads up on Clusgres, I will take a look and see how
that fits.

Thanks,

Keith

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alex J. Avriette
Sent: Saturday, February 07, 2004 12:29 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] RFC: Very large scale postgres support


Recently I was tasked with creating a "distribution system" for postgres
nodes here at work. This would allow us to simply bring up a new box, push
postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can do
with postgres on one machine. Our platform presently is the HP DL380. It is
a reasonably fast machine, but in order to eke more performance out of
postgres, we are going to have to upgrade the hardware substantially.

So the subject came up, wouldn't it be nice if, with replication and
proxies, we could create postgres clusters? When we need more throughput, to
just put a new box in the cluster, dist a psotgres instance to it, and tell
the proxy about it. This is a very attractive idea for us, from a
scalability standpoint. It means that we don't have to buy $300,000 servers
when we max out our 2- or 4- cpu machines (in the past, I would have
suggested a Sun V880 for this database, but we are using Linux on x86).

We are left with one last option, and that is re-engineering our application
to distribute load across several instances of postgres which are operating
without any real knowledge of eachother. I worry, though, that as our needs
increase further, these application redesigns will become asymptotic.

I find myself wondering what other people are doing with postgres that this
doesn't seem to have come up. When one searches for postgres clustering on
google, they will find lots of HA products. However, nobody seems to be
attempting to create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this subject
was done. In the future, people will hopefully begin using postgres for more
intense applications. We are looking at perhaps many tens of billions of
transactions per day within the next year or two. To simply buy a "bigger
box" each time we outgrow the one we're on is not effective nor efficient. I
simply don't believe we're the only ones pushing postgres this hard.

I understand there are many applications out there trying to achieve
replication. Some of them seem fairly promising. However, it seems to me
that if we want to see a true clustered database environment, there would
have to be actual native support in the postmaster (inter postmaster
communication if you will) for replication and cross-instance locking.

This is obviously a complicated problem, and probably not very many of us
are doing anything near as large-scale as this. However, I am sure most of
us can see the benefit of being able to provide support for these sorts of
applications.

I've just submitted this RFC in the hopes that we can discuss both the best
way to support very large scale databases, as well as how to handle them
presently.

Thanks again for your time.
alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Systems Masseur
"I ... remain against the death penalty because I feel that eternal boredom
with no hope of parole is a much worse punishment than just ending it all
mercifully with that quiet needle." - Rachel Mills, NC Libertarian
Gubernatorial Candidate

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
              http://archives.postgresql.org



Re: RFC: Very large scale postgres support

From
Andreas Pflug
Date:
Keith Bottner wrote:

>Alex,
>
>I agree that this is something that is worth spending time on. This
>resembles the Oracle RAC (Real Application Cluster). While other people may
>feel that the amount of data is unreasonable I have a similar problem that
>will only be solved using such a solution.
>
>In regards to how your database is designed? Who cares? This is an RFC for a
>general discussion on how to design this level of functionality into
>Postgres.
>

IMHO a general discussion isn't too helpful, you might be discussing 
stuff that's never needed for PostgreSQL. Different database systems 
give different solutions to the same problem, as you might see from e.g. 
table partition discussions, which where initiated by Oracle-originating 
people.
There still might be weaknesses in pgsql, but to identify them, *real* 
issues need to be discussed. This is necessary to avoid major 
hardware/software dbms efforts that might well be replaced by 
organizational/app level tools.

Regards,
Andreas




Re: RFC: Very large scale postgres support

From
"Keith Bottner"
Date:
I always enjoy how everyone wants to talk about using different solutions
prior to understanding the complete problem. I would say that a *real* issue
is any perceived issue whether a current solution exists or not. If current
solutions are applicable and would work then great we have all gained;
however, if in fact it is necessary to add the additional functionality to
Postgres then so be it. But none of this can be decided until the complete
problem and hence the requirements are understood. My impression of the
Postgres project has always been that of a high end database system that is
endeavoring to become a component of critical enterprise systems. If this is
not true or as a group we are going to keep placing these scalability issues
aside then this will never be achieved and those of us who want Postgres to
play a more important role for our corporate systems will have no choice but
to go somewhere else.

I understand your position Andreas and respect your opinion; maybe what I
have identified as requirements is what you are specifying as *real* issues.
I hope so, because I to would like to avoid unnecessary dbms efforts. But
from what I understand of Alex's problem and more specifically mine, adding
another layer at the organizational/app level will not provide the level of
functionality that is required.

Regards,

Keith

-----Original Message-----
From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
Sent: Monday, February 09, 2004 9:54 AM
To: Keith Bottner
Cc: 'Alex J. Avriette'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] RFC: Very large scale postgres support


Keith Bottner wrote:

>Alex,
>
>I agree that this is something that is worth spending time on. This
>resembles the Oracle RAC (Real Application Cluster). While other people
>may feel that the amount of data is unreasonable I have a similar
>problem that will only be solved using such a solution.
>
>In regards to how your database is designed? Who cares? This is an RFC
>for a general discussion on how to design this level of functionality
>into Postgres.
>

IMHO a general discussion isn't too helpful, you might be discussing
stuff that's never needed for PostgreSQL. Different database systems
give different solutions to the same problem, as you might see from e.g.
table partition discussions, which where initiated by Oracle-originating
people.
There still might be weaknesses in pgsql, but to identify them, *real*
issues need to be discussed. This is necessary to avoid major
hardware/software dbms efforts that might well be replaced by
organizational/app level tools.

Regards,
Andreas




Re: RFC: Very large scale postgres support

From
Andreas Pflug
Date:
Keith Bottner wrote:

>
>I understand your position Andreas and respect your opinion; maybe what I
>have identified as requirements is what you are specifying as *real* issues.
>I hope so, because I to would like to avoid unnecessary dbms efforts.
>

You got me very right. I didn't mean to declare high volume databasing 
as minor issue, I meant real world requirements.

Regards,
Andreas



Re: RFC: Very large scale postgres support

From
eddy.kalem@edusoft.com (ElPeddy)
Date:
Alex,

In our quest to see if we can get better performance out of PostgreSQL
by throwing more HW at it, I would have recommended a V880 also. I'm
curious to find out why you would have: "(in the past, I would have
suggested a Sun V880 for this database, but we are using Linux on
x86)" too.

Cheers,

Eddy

kbottner@comcast.net ("Keith Bottner") wrote in message news:<007f01c3ef1c$6a230ab0$7d00a8c0@juxtapose>...
> Alex,
> 
> I agree that this is something that is worth spending time on. This
> resembles the Oracle RAC (Real Application Cluster). While other people may
> feel that the amount of data is unreasonable I have a similar problem that
> will only be solved using such a solution.
> 
> In regards to how your database is designed? Who cares? This is an RFC for a
> general discussion on how to design this level of functionality into
> Postgres. Ultimately any solution would work without regard to the insert,
> updates, or deletes being executed. Alex, I think as a first step we should
> start coming up with a feature list of what would be necessary to support
> this level of functionality. From that point we could then identify efforts
> that are currently ongoing on Postgres development that we could help out on
> as well as those items that would need to be handled directly.
> 
> I am very interested in going forth with this discussion and believe that I
> would be able to have the company I work for put forward resources (i.e.
> people or money) on developing the solution if we can come up with a
> workable plan.
> 
> Josh, thanks for the heads up on Clusgres, I will take a look and see how
> that fits.
> 
> Thanks,
> 
> Keith
> 
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alex J. Avriette
> Sent: Saturday, February 07, 2004 12:29 PM
> To: pgsql-hackers@postgresql.org
> Subject: [HACKERS] RFC: Very large scale postgres support
> 
> 
> Recently I was tasked with creating a "distribution system" for postgres
> nodes here at work. This would allow us to simply bring up a new box, push
> postgres to it, and have a new database.
> 
> At the same time, we have started to approach the limits of what we can do
> with postgres on one machine. Our platform presently is the HP DL380. It is
> a reasonably fast machine, but in order to eke more performance out of
> postgres, we are going to have to upgrade the hardware substantially.
> 
> So the subject came up, wouldn't it be nice if, with replication and
> proxies, we could create postgres clusters? When we need more throughput, to
> just put a new box in the cluster, dist a psotgres instance to it, and tell
> the proxy about it. This is a very attractive idea for us, from a
> scalability standpoint. It means that we don't have to buy $300,000 servers
> when we max out our 2- or 4- cpu machines (in the past, I would have
> suggested a Sun V880 for this database, but we are using Linux on x86).
> 
> We are left with one last option, and that is re-engineering our application
> to distribute load across several instances of postgres which are operating
> without any real knowledge of eachother. I worry, though, that as our needs
> increase further, these application redesigns will become asymptotic.
> 
> I find myself wondering what other people are doing with postgres that this
> doesn't seem to have come up. When one searches for postgres clustering on
> google, they will find lots of HA products. However, nobody seems to be
> attempting to create very high throughput clusters.
> 
> I feel that it would be a very good thing if some thinking on this subject
> was done. In the future, people will hopefully begin using postgres for more
> intense applications. We are looking at perhaps many tens of billions of
> transactions per day within the next year or two. To simply buy a "bigger
> box" each time we outgrow the one we're on is not effective nor efficient. I
> simply don't believe we're the only ones pushing postgres this hard.
> 
> I understand there are many applications out there trying to achieve
> replication. Some of them seem fairly promising. However, it seems to me
> that if we want to see a true clustered database environment, there would
> have to be actual native support in the postmaster (inter postmaster
> communication if you will) for replication and cross-instance locking.
> 
> This is obviously a complicated problem, and probably not very many of us
> are doing anything near as large-scale as this. However, I am sure most of
> us can see the benefit of being able to provide support for these sorts of
> applications.
> 
> I've just submitted this RFC in the hopes that we can discuss both the best
> way to support very large scale databases, as well as how to handle them
> presently.
> 
> Thanks again for your time.
> alex
> 
> --
> alex@posixnap.net
> Alex J. Avriette, Solaris Systems Masseur
> "I ... remain against the death penalty because I feel that eternal boredom
> with no hope of parole is a much worse punishment than just ending it all
> mercifully with that quiet needle." - Rachel Mills, NC Libertarian
> Gubernatorial Candidate
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faqs/FAQ.html