RFC: Very large scale postgres support - Mailing list pgsql-hackers

From Alex J. Avriette
Subject RFC: Very large scale postgres support
Msg-id 20040207182913.GJ7256@posixnap.net
List pgsql-hackers
Recently I was tasked with creating a "distribution system" for
postgres nodes here at work. This would allow us to simply bring up a
new box, push postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can
do with postgres on one machine. Our platform presently is the HP
DL380. It is a reasonably fast machine, but in order to eke more
performance out of postgres, we are going to have to upgrade the
hardware substantially.

So the subject came up: wouldn't it be nice if, with replication and
proxies, we could create postgres clusters? When we need more
throughput, we could just put a new box in the cluster, dist a postgres
instance to it, and tell the proxy about it. This is a very attractive
idea for us, from a scalability standpoint. It means that we don't have
to buy $300,000 servers when we max out our 2- or 4- cpu machines (in
the past, I would have suggested a Sun V880 for this database, but we
are using Linux on x86).
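
To make the "tell the proxy about it" step concrete, here is a minimal
sketch of a node registry a proxy might keep. The ProxyPool class, its
method names, and the connection strings are hypothetical, not part of
postgres or any existing proxy, and round-robin dispatch only makes
sense if replication keeps every node's data identical.

```python
class ProxyPool:
    """Hypothetical registry a proxy could use to spread queries across nodes."""

    def __init__(self):
        self._nodes = []   # connection strings of known postgres instances
        self._next = 0     # counter used for round-robin selection

    def add_node(self, dsn):
        # "Tell the proxy about it": register a freshly provisioned instance.
        if dsn not in self._nodes:
            self._nodes.append(dsn)

    def pick(self):
        # Plain round-robin; a real proxy would also consider load and health.
        if not self._nodes:
            raise RuntimeError("no nodes registered")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node


pool = ProxyPool()
pool.add_node("host=db1 dbname=app")  # assumed example hosts
pool.add_node("host=db2 dbname=app")
```

Adding capacity is then a single add_node() call once the new box is up;
the hard part, keeping the nodes consistent with each other, is not
addressed by this sketch.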

We are left with one last option, and that is re-engineering our
application to distribute load across several instances of postgres
which operate without any real knowledge of each other. I worry,
though, that as our needs increase further, each of these application
redesigns will buy us less and less.
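
As an illustration of what such application-level distribution can look
like, here is a minimal sketch that routes each record to one of several
independent instances by hashing a key. The instance list and the
instance_for() helper are assumptions made up for this example, not an
existing API.

```python
import hashlib

# Assumed example connection strings for independent postgres instances.
INSTANCES = [
    "host=db1 dbname=app",
    "host=db2 dbname=app",
    "host=db3 dbname=app",
]


def instance_for(key, instances=INSTANCES):
    """Map a key (e.g. a customer id) to one instance, deterministically."""
    # Use a stable hash so every application process agrees on the mapping.
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]
```

Note that with this simple modulo scheme, changing the number of
instances remaps most keys, so data has to be moved whenever a box is
added. That is one reason this kind of redesign gets more painful as the
system grows.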

I find myself wondering what other people are doing with postgres that
this doesn't seem to have come up. Searching Google for postgres
clustering turns up lots of HA products. However,
nobody seems to be attempting to create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this
subject were done. In the future, people will hopefully begin using
postgres for more intense applications. We are looking at perhaps many
tens of billions of transactions per day within the next year or two.
To simply buy a "bigger box" each time we outgrow the one we're on is
neither effective nor efficient. I simply don't believe we're the only ones
pushing postgres this hard.

I understand there are many applications out there trying to achieve
replication. Some of them seem fairly promising. However, it seems to
me that if we want to see a true clustered database environment, there
would have to be actual native support in the postmaster
(inter-postmaster communication, if you will) for replication and
cross-instance locking.

This is obviously a complicated problem, and probably not very many of
us are doing anything near as large-scale as this. However, I am sure
most of us can see the benefit of being able to provide support for
these sorts of applications.

I've just submitted this RFC in the hopes that we can discuss both the
best way to support very large scale databases and how to
handle them presently.

Thanks again for your time.
alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Systems Masseur
"I ... remain against the death penalty because I feel that eternal boredom with no hope of parole is a much worse
punishment than just ending it all mercifully with that quiet needle." - Rachel Mills, NC Libertarian Gubernatorial
Candidate

