RFC: Very large scale postgres support - Mailing list pgsql-hackers
From | Alex J. Avriette |
---|---|
Subject | RFC: Very large scale postgres support |
Date | |
Msg-id | 20040207182913.GJ7256@posixnap.net |
Responses |
Re: RFC: Very large scale postgres support
Re: RFC: Very large scale postgres support |
List | pgsql-hackers |
Recently I was tasked with creating a "distribution system" for postgres nodes here at work. This would allow us to simply bring up a new box, push postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can do with postgres on one machine. Our platform presently is the HP DL380. It is a reasonably fast machine, but in order to eke more performance out of postgres, we are going to have to upgrade the hardware substantially.

So the subject came up: wouldn't it be nice if, with replication and proxies, we could create postgres clusters? When we need more throughput, we could just put a new box in the cluster, dist a postgres instance to it, and tell the proxy about it. This is a very attractive idea for us from a scalability standpoint. It means that we don't have to buy $300,000 servers when we max out our 2- or 4-CPU machines (in the past, I would have suggested a Sun V880 for this database, but we are using Linux on x86).

We are left with one last option, and that is re-engineering our application to distribute load across several instances of postgres which are operating without any real knowledge of each other. I worry, though, that as our needs increase further, these application redesigns will become asymptotic.

I find myself wondering what other people are doing with postgres that this doesn't seem to have come up. When one searches for postgres clustering on Google, one finds lots of HA products. However, nobody seems to be attempting to create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this subject were done. In the future, people will hopefully begin using postgres for more intense applications. We are looking at perhaps many tens of billions of transactions per day within the next year or two. Simply buying a "bigger box" each time we outgrow the one we're on is neither effective nor efficient.
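To make the application-level option concrete (spreading load across instances that know nothing of each other), here is a minimal sketch of what the routing might look like. It is only an illustration under stated assumptions: the connection strings, the `pick_node` helper, and the choice of a hash-modulo scheme are all hypothetical and are not something we run today.

```python
import hashlib
from typing import Sequence

# Hypothetical connection strings; no real hosts are implied.
NODES = (
    "host=pg-node-0 dbname=app",
    "host=pg-node-1 dbname=app",
    "host=pg-node-2 dbname=app",
)


def pick_node(key: str, nodes: Sequence[str] = NODES) -> str:
    """Map a routing key (e.g. a customer id) to one postgres instance.

    Uses a stable hash so the same key always lands on the same node,
    regardless of which application process does the lookup.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    for customer in ("customer-1", "customer-2", "customer-3"):
        print(customer, "->", pick_node(customer))
```

Note that with a plain modulo scheme like this, adding a node changes the modulus, so many keys map to a different instance and their data has to be moved. That is one way in which each round of growth can force another round of application work.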
I simply don't believe we're the only ones pushing postgres this hard.

I understand there are many applications out there trying to achieve replication. Some of them seem fairly promising. However, it seems to me that if we want to see a true clustered database environment, there would have to be actual native support in the postmaster (inter-postmaster communication, if you will) for replication and cross-instance locking.

This is obviously a complicated problem, and probably not very many of us are doing anything near as large-scale as this. However, I am sure most of us can see the benefit of being able to provide support for these sorts of applications.

I've just submitted this RFC in the hopes that we can discuss both the best way to support very large scale databases and how to handle them presently.

Thanks again for your time.

alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Systems Masseur
"I ... remain against the death penalty because I feel that eternal boredom with no hope of parole is a much worse punishment than just ending it all mercifully with that quiet needle." - Rachel Mills, NC Libertarian Gubernatorial Candidate