Re: RFC: Very large scale postgres support - Mailing list pgsql-hackers

From Keith Bottner
Subject Re: RFC: Very large scale postgres support
Date
Msg-id 007f01c3ef1c$6a230ab0$7d00a8c0@juxtapose
In response to RFC: Very large scale postgres support  ("Alex J. Avriette" <alex@posixnap.net>)
Responses Re: RFC: Very large scale postgres support  (Andreas Pflug <pgadmin@pse-consulting.de>)
List pgsql-hackers
Alex,

I agree that this is something that is worth spending time on. This
resembles Oracle RAC (Real Application Clusters). While other people may
feel that the amount of data is unreasonable, I have a similar problem that
will only be solved using such a solution.

As for how your database is designed, who cares? This is an RFC for a
general discussion on how to design this level of functionality into
Postgres. Ultimately, any solution would work regardless of the inserts,
updates, or deletes being executed. Alex, I think as a first step we should
start coming up with a feature list of what would be necessary to support
this level of functionality. From that point we could then identify efforts
currently ongoing in Postgres development that we could help out on, as
well as those items that would need to be handled directly.

I am very interested in going forward with this discussion and believe that
I would be able to have the company I work for put forward resources (i.e.
people or money) for developing the solution if we can come up with a
workable plan.

Josh, thanks for the heads up on Clusgres. I will take a look and see how
that fits.

Thanks,

Keith

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alex J. Avriette
Sent: Saturday, February 07, 2004 12:29 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] RFC: Very large scale postgres support


Recently I was tasked with creating a "distribution system" for postgres
nodes here at work. This would allow us to simply bring up a new box, push
postgres to it, and have a new database.

At the same time, we have started to approach the limits of what we can do
with postgres on one machine. Our platform presently is the HP DL380. It is
a reasonably fast machine, but in order to eke more performance out of
postgres, we are going to have to upgrade the hardware substantially.

So the subject came up: wouldn't it be nice if, with replication and
proxies, we could create postgres clusters? When we need more throughput, we
would just put a new box in the cluster, dist a postgres instance to it, and
tell the proxy about it. This is a very attractive idea for us from a
scalability standpoint. It means that we don't have to buy $300,000 servers
when we max out our 2- or 4-CPU machines (in the past, I would have
suggested a Sun V880 for this database, but we are using Linux on x86).
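
To make the "tell the proxy about it" step concrete, here is a minimal
sketch in Python of what the proxy's bookkeeping might look like. Everything
in it is an assumption for illustration: NodeRegistry, add_node, next_node
and the host names are hypothetical, not part of postgres or of any existing
proxy, and it deliberately ignores the hard parts (replication and
cross-instance locking) that a real cluster would need.

```python
import itertools
import threading


class NodeRegistry:
    """Hypothetical proxy-side list of postgres nodes (illustration only)."""

    def __init__(self):
        self._nodes = []                  # connection strings, in arrival order
        self._lock = threading.Lock()     # guards the node list and the counter
        self._counter = itertools.count()

    def add_node(self, dsn):
        """Register a newly provisioned box; duplicates are ignored."""
        with self._lock:
            if dsn not in self._nodes:
                self._nodes.append(dsn)

    def next_node(self):
        """Pick the node for the next connection, round-robin."""
        with self._lock:
            if not self._nodes:
                raise RuntimeError("no nodes registered")
            index = next(self._counter) % len(self._nodes)
            return self._nodes[index]


if __name__ == "__main__":
    registry = NodeRegistry()
    registry.add_node("host=db1 dbname=app")   # made-up host names
    registry.add_node("host=db2 dbname=app")
    for _ in range(4):
        print(registry.next_node())
```

Round-robin is only the simplest possible policy. It spreads connections
evenly but says nothing about keeping the nodes consistent with one another,
which is the part that would need support inside postgres itself.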

We are left with one last option, and that is re-engineering our application
to distribute load across several instances of postgres which are operating
without any real knowledge of each other. I worry, though, that as our needs
increase further, these application redesigns will become asymptotic.

I find myself wondering what other people are doing with postgres that this
doesn't seem to have come up. Searching for postgres clustering on Google
turns up lots of HA products. However, nobody seems to be attempting to
create very high throughput clusters.

I feel that it would be a very good thing if some thinking on this subject
were done. In the future, people will hopefully begin using postgres for more
intense applications. We are looking at perhaps many tens of billions of
transactions per day within the next year or two. Simply buying a "bigger
box" each time we outgrow the one we're on is neither effective nor
efficient. I simply don't believe we're the only ones pushing postgres this
hard.

I understand there are many applications out there trying to achieve
replication. Some of them seem fairly promising. However, it seems to me
that if we want to see a true clustered database environment, there would
have to be actual native support in the postmaster (inter postmaster
communication if you will) for replication and cross-instance locking.

This is obviously a complicated problem, and probably not very many of us
are doing anything near as large-scale as this. However, I am sure most of
us can see the benefit of being able to provide support for these sorts of
applications.

I've just submitted this RFC in the hopes that we can discuss both the best
way to support very large scale databases and how to handle them
presently.

Thanks again for your time.
alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Systems Masseur
"I ... remain against the death penalty because I feel that eternal boredom
with no hope of parole is a much worse punishment than just ending it all
mercifully with that quiet needle." - Rachel Mills, NC Libertarian
Gubernatorial Candidate
