Thread: RFC: Very large scale postgres support
Recently I was tasked with creating a "distribution system" for postgres nodes here at work. This would allow us to simply bring up a new box, push postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can do with postgres on one machine. Our platform presently is the HP DL380. It is a reasonably fast machine, but in order to eke more performance out of postgres, we are going to have to upgrade the hardware substantially.

So the subject came up: wouldn't it be nice if, with replication and proxies, we could create postgres clusters? When we need more throughput, we would just put a new box in the cluster, dist a postgres instance to it, and tell the proxy about it. This is a very attractive idea for us from a scalability standpoint. It means that we don't have to buy $300,000 servers when we max out our 2- or 4-CPU machines (in the past, I would have suggested a Sun V880 for this database, but we are using Linux on x86).

We are left with one last option, and that is re-engineering our application to distribute load across several instances of postgres which operate without any real knowledge of each other. I worry, though, that as our needs increase further, these application redesigns will yield diminishing returns.

I find myself wondering what other people are doing with postgres, since this doesn't seem to have come up. A Google search for postgres clustering turns up lots of HA products, but nobody seems to be attempting to create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this subject were done. In the future, people will hopefully begin using postgres for more intense applications. We are looking at perhaps many tens of billions of transactions per day within the next year or two. Simply buying a "bigger box" each time we outgrow the one we're on is neither effective nor efficient. I simply don't believe we're the only ones pushing postgres this hard.

I understand there are many applications out there trying to achieve replication, and some of them seem fairly promising. However, it seems to me that if we want to see a true clustered database environment, there would have to be actual native support in the postmaster (inter-postmaster communication, if you will) for replication and cross-instance locking.

This is obviously a complicated problem, and probably not very many of us are doing anything near this large-scale. However, I am sure most of us can see the benefit of being able to support these sorts of applications.

I've submitted this RFC in the hope that we can discuss both the best way to support very large scale databases and how to handle them presently.

Thanks again for your time.
alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Systems Masseur
"I ... remain against the death penalty because I feel that eternal boredom with no hope of parole is a much worse punishment than just ending it all mercifully with that quiet needle." - Rachel Mills, NC Libertarian Gubernatorial Candidate
Alex J. Avriette wrote:
> I feel that it would be a very good thing if some thinking on this
> subject were done. In the future, people will hopefully begin using
> postgres for more intense applications. We are looking at perhaps many
> tens of billions of transactions per day within the next year or two.

Tens of billions = 10^10 per day? This is probably a typo, because it would mean more than 100,000 requests per second. Do you want to feed a monitor with pixel data straight from the database, using an individual query for each pixel? Or record, in a single database, every keyclick of every irc user concurrently online in the world?

Regards,
Andreas
On Sun, Feb 08, 2004 at 08:07:14PM +0100, Andreas Pflug wrote:

> > I feel that it would be a very good thing if some thinking on this
> > subject were done. In the future, people will hopefully begin using
> > postgres for more intense applications. We are looking at perhaps many
> > tens of billions of transactions per day within the next year or two.
>
> Tens of billions = 10^10 per day? This is probably a typo, because it
> would mean more than 100,000 requests per second. Do you want to feed a monitor

That's what I said, and what I meant. Ten billion transactions per day equates to roughly 115,740 transactions per second.

> with pixel data straight from the database, using an individual query
> for each pixel? Or record, in a single database, every keyclick of
> every irc user concurrently online in the world?

Just because you don't think there is a valid use for that sort of traffic doesn't mean there isn't one. Imagine, if you will, a hundred thousand agents making four to five requests a second. Now, imagine these requests are all going to the same database. I'll leave the rest of this exercise up to you.

The fact is, there are situations in which such extreme traffic is warranted. My concern is that I am not able to use postgres in such situations because it cannot scale to that level. I feel that it would be possible to reach that level with support in the postmaster for replication. With software load balancing (e.g. rotors or similar) and updates between postmasters, it would be possible, it seems to me, to drastically increase the available capacity of a database installation through the addition of more nodes. This has the added benefit of allowing us to distribute network resources.

Alex

--
alex@posixnap.net
Alex J. Avriette, Unix Systems Gladiator
The Emperor Wears No Clothes. http://www.drugsense.org/wodclock.htm
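To make the arithmetic and the rotor idea concrete, here is a minimal sketch in Python, assuming a client-side round-robin rotation over a pool of read-mostly nodes. The Rotor class and the DSN strings are hypothetical illustrations, not anything that exists in postgres, and nothing here addresses how writes would propagate between postmasters:

    import threading

    class Rotor:
        """Minimal round-robin 'rotor' over a pool of postgres nodes.

        The DSN strings are hypothetical placeholders; a real setup
        would sit behind a proxy or connection pooler.
        """
        def __init__(self, dsns):
            self._dsns = list(dsns)
            self._i = 0
            self._lock = threading.Lock()

        def add_node(self, dsn):
            # "tell the proxy about it": new capacity is one append away
            with self._lock:
                self._dsns.append(dsn)

        def next_dsn(self):
            # hand out nodes in strict rotation
            with self._lock:
                dsn = self._dsns[self._i % len(self._dsns)]
                self._i += 1
                return dsn

    # Sanity check of the quoted figure: 1e10 transactions/day
    print(10_000_000_000 / 86_400)          # ~115740.74 tx/second

    rotor = Rotor(["host=pg1 dbname=app", "host=pg2 dbname=app"])
    rotor.add_node("host=pg3 dbname=app")
    print([rotor.next_dsn() for _ in range(4)])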
> The fact is, there are situations in which such extreme traffic is
> warranted. My concern is that I am not able to use postgres in such
> situations because it cannot scale to that level. I feel that it would
> be possible to reach that level with support in the postmaster for
> replication.

Replication won't help if those are mostly write transactions. If only a small percentage is INSERTs, UPDATEs, or DELETEs -- even 1% would be challenging -- master/slave replication might get you somewhere. Otherwise you're going to need to partition the data up into smaller, easily managed sizes, which of course requires an ability to horizontally partition the data.

Anyway, if you want a sane answer we need more information about the data (is it partitionable?), the schema, the queries producing the load (simple or complex), acceptable data delays (does a new insert need to be immediately visible?), etc.

Dealing with a hundred thousand queries per second isn't just challenging for PostgreSQL; you will be hard pressed to find hardware that can push that much data around at all, let alone with the overhead of the database itself on top.
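A rough, illustrative back-of-envelope for that last point, with the per-query sizes as pure assumptions rather than measurements:

    qps = 115_740               # queries/second, from the figure upthread
    write_fraction = 0.01       # the "even 1%" case
    bytes_per_result = 512      # assumed average result size
    bytes_per_wal_record = 256  # assumed average WAL volume per write

    writes_per_sec = qps * write_fraction
    read_bw = qps * (1 - write_fraction) * bytes_per_result
    wal_bw = writes_per_sec * bytes_per_wal_record

    print(f"writes/second: {writes_per_sec:,.0f}")      # ~1,157
    print(f"read traffic:  {read_bw / 1e6:,.1f} MB/s")  # ~58.7
    print(f"WAL traffic:   {wal_bw / 1e6:,.2f} MB/s")   # ~0.30

Even under these generous assumptions the raw bytes are manageable; at these rates it is the per-query overhead (parsing, planning, locking, network round trips) that is likely to dominate.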
On Sun, Feb 08, 2004 at 08:01:38PM -0500, Rod Taylor wrote:

> Replication won't help if those are mostly write transactions. If only
> a small percentage is INSERTs, UPDATEs, or DELETEs -- even 1% would be
> challenging -- master/slave replication might get you somewhere.

There is no way on earth we could be doing writes at that rate. I think that's a given.

> Otherwise you're going to need to partition the data up into smaller,
> easily managed sizes, which of course requires an ability to
> horizontally partition the data.

Obviously, this is the route we have taken.

> Anyway, if you want a sane answer we need more information about the
> data (is it partitionable?), the schema, the queries producing the load
> (simple or complex), acceptable data delays (does a new insert need to
> be immediately visible?), etc.

We've considered a lot of this. Like I said, I think a lot of our need for distributing the database can be helped along with native replication. Am I hearing that nobody believes scalability is a concern? I think many of us would like to see features that would allow large scale installations to be more practical. I also think most of us would agree that the current "graft-on" replication methods are sub-ideal.

alex

--
alex@posixnap.net
Alex J. Avriette, Unix Systems Gladiator
The Emperor Wears No Clothes. http://www.drugsense.org/wodclock.htm
> replication. Am I hearing that nobody believes scalability is a
> concern? I think many of us would like to see features that would
> allow large scale installations to be more practical. I also think most
> of us would agree that the current "graft-on" replication methods are
> sub-ideal.

You really haven't told us which approach to scaling you require. Many are available, and PostgreSQL is really only good at a few of them.

Anyway, current replication is a PITA mostly due to its inability to easily start from an empty live database and catch up. You might want to throw a developer at helping SLONY along if you feel master/slave is the right direction, since presumably you will have several slaves.
On Sun, 2004-02-08 at 21:01, Alex J. Avriette wrote:
> On Sun, Feb 08, 2004 at 08:01:38PM -0500, Rod Taylor wrote:
>
> > Replication won't help if those are mostly write transactions. If only
> > a small percentage is INSERTs, UPDATEs, or DELETEs -- even 1% would be
> > challenging -- master/slave replication might get you somewhere.
>
> There is no way on earth we could be doing writes at that rate. I think
> that's a given.

Sure you can, if you can horizontally partition the data so that clients A are on machine A, clients B are on machine B, and so on.
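A minimal sketch of that kind of client-level horizontal partitioning, assuming clients can be routed by a stable key; the shard map and DSNs are hypothetical:

    import hashlib

    # Hypothetical shard map: each machine owns a disjoint set of clients.
    SHARDS = [
        "host=pg-a dbname=app",   # machine A
        "host=pg-b dbname=app",   # machine B
        "host=pg-c dbname=app",   # machine C
    ]

    def shard_for(client_id: str) -> str:
        """Route a client to its home shard by hashing a stable key.

        All of a client's reads *and* writes land on one machine, so
        no cross-node replication or locking is needed -- as long as
        no query ever has to cross shards.
        """
        digest = hashlib.sha1(client_id.encode()).digest()
        return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

    print(shard_for("client-1234"))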
On Sun, Feb 08, 2004 at 09:20:07PM -0500, Rod Taylor wrote:

> Sure you can, if you can horizontally partition the data so that
> clients A are on machine A, clients B are on machine B, and so on.

I think you were assuming inserts here. The problem actually comes from updates. If I update a record here, how long before the rest of my "cloud" of postgres nodes understands that the record has changed? With an insert, the transaction and its propagation are fairly clear. With an update, the overall cost is higher, and so is the cost per node.

Alex

--
alex@posixnap.net
Alex J. Avriette, Unix Systems Gladiator
"You cannot invade the mainland United States. There would be a rifle behind each blade of grass." - Admiral Isoroku Yamamoto
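To illustrate why updates propagate less cleanly than inserts, here is a toy last-write-wins scheme; it is a sketch only, real multi-master systems need far more (conflict detection, ordering guarantees), and nothing here reflects how the postmaster actually works:

    import time

    class Node:
        """Toy replica: key -> (value, version timestamp)."""
        def __init__(self):
            self.rows = {}

        def apply(self, key, value, ts):
            # An insert of a brand-new key always applies cleanly; an
            # update must be compared against whatever version this
            # node already holds -- the extra per-node cost above.
            current = self.rows.get(key)
            if current is None or ts > current[1]:
                self.rows[key] = (value, ts)

    cloud = [Node(), Node(), Node()]

    def write(key, value):
        ts = time.monotonic()
        for node in cloud:               # until this loop completes,
            node.apply(key, value, ts)   # lagging nodes serve stale data

    write("acct:42", "balance=100")
    write("acct:42", "balance=90")
    print(cloud[0].rows["acct:42"])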
Alex,

> I find myself wondering what other people are doing with postgres,
> since this doesn't seem to have come up. A Google search for postgres
> clustering turns up lots of HA products, but nobody seems to be
> attempting to create very high throughput clusters.

Have you checked out "Clusgres" from Linux Labs? One of my clients will be testing this application soon.

While we haven't run into transaction throughput limits, a couple of my clients seem to be running into x86's limits on very large queries, particularly on vendor hardware.

--
Josh Berkus
Aglio Database Solutions
San Francisco
> That's what I said, and what I meant. Ten billion transactions per day
> equates to roughly 115,740 transactions per second.

Have you tried looking at the scientific community? CERN has setups that produce such large amounts of data. Try searching Google for http://www.google.com/search?q=cern+event+database+postgresql or even http://www.google.com/search?q=cern+event+database

You might find some inspiration there!

Bye, Chris.
On Mon, 2004-02-09 at 01:54, Alex J. Avriette wrote:
> I think you were assuming inserts here. The problem actually comes from
> updates.

No, with the above I'm assuming that you have several completely independent systems.

Look, if you're really doing that many select-style queries, I presume it is because you simply have too much data, or it changes too frequently to present statically; otherwise you wouldn't be using the database for those queries at all. To me, this implies that either every one of your mirrors is going to need some kind of enterprise storage solution, OR you segregate the clients onto different databases so there is no cross talk or data propagation at all.

Anyway, you've yet to tell us anything of substance that would even allow guessing at what your solution should be. Simple master/slave mirroring is only one of many options.
Alex,

I agree that this is something that is worth spending time on. This resembles Oracle RAC (Real Application Clusters). While other people may feel that the amount of data is unreasonable, I have a similar problem that will only be solved by such a solution.

As for how your database is designed: who cares? This is an RFC for a general discussion of how to design this level of functionality into Postgres. Ultimately any solution would have to work regardless of the inserts, updates, or deletes being executed. Alex, I think as a first step we should start putting together a feature list of what would be necessary to support this level of functionality. From that point we could identify ongoing Postgres development efforts we could help out on, as well as those items that would need to be handled directly.

I am very interested in moving forward with this discussion, and I believe I would be able to have the company I work for put resources (i.e. people or money) toward developing the solution if we can come up with a workable plan.

Josh, thanks for the heads up on Clusgres; I will take a look and see how it fits.

Thanks,

Keith

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alex J. Avriette
Sent: Saturday, February 07, 2004 12:29 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] RFC: Very large scale postgres support

[original message quoted in full; snipped]
Keith Bottner wrote:
> I agree that this is something that is worth spending time on. This
> resembles Oracle RAC (Real Application Clusters). While other people
> may feel that the amount of data is unreasonable, I have a similar
> problem that will only be solved by such a solution.
>
> As for how your database is designed: who cares? This is an RFC for a
> general discussion of how to design this level of functionality into
> Postgres.

IMHO a general discussion isn't too helpful; you might be discussing stuff that's never needed for PostgreSQL. Different database systems give different solutions to the same problem, as you can see from e.g. the table partitioning discussions, which were initiated by Oracle-originating people. There may still be weaknesses in pgsql, but to identify them, *real* issues need to be discussed. This is necessary to avoid major hardware/software dbms efforts that might well be replaced by organizational/app-level tools.

Regards,
Andreas
I always enjoy how everyone wants to talk about using different solutions prior to understanding the complete problem. I would say that a *real* issue is any perceived issue, whether a current solution exists or not. If current solutions are applicable and would work, then great, we have all gained; however, if in fact it is necessary to add the additional functionality to Postgres, then so be it. But none of this can be decided until the complete problem, and hence the requirements, are understood.

My impression of the Postgres project has always been that of a high-end database system that is endeavoring to become a component of critical enterprise systems. If this is not true, or if as a group we keep setting these scalability issues aside, then that will never be achieved, and those of us who want Postgres to play a more important role in our corporate systems will have no choice but to go somewhere else.

I understand your position, Andreas, and respect your opinion; maybe what I have identified as requirements is what you are calling *real* issues. I hope so, because I too would like to avoid unnecessary dbms efforts. But from what I understand of Alex's problem, and more specifically mine, adding another layer at the organizational/app level will not provide the level of functionality that is required.

Regards,

Keith

-----Original Message-----
From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
Sent: Monday, February 09, 2004 9:54 AM
To: Keith Bottner
Cc: 'Alex J. Avriette'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] RFC: Very large scale postgres support

[quoted message snipped]
Keith Bottner wrote:
> I understand your position, Andreas, and respect your opinion; maybe
> what I have identified as requirements is what you are calling *real*
> issues. I hope so, because I too would like to avoid unnecessary dbms
> efforts.

You understood me exactly right. I didn't mean to dismiss high-volume databasing as a minor issue; I meant real-world requirements.

Regards,
Andreas
Alex,

In our quest to see whether we can get better performance out of PostgreSQL by throwing more hardware at it, I would have recommended a V880 also. So I'm curious why you wrote "(in the past, I would have suggested a Sun V880 for this database, but we are using Linux on x86)".

Cheers,
Eddy

kbottner@comcast.net ("Keith Bottner") wrote in message news:<007f01c3ef1c$6a230ab0$7d00a8c0@juxtapose>...

[quoted message snipped]