Re: Database cluster? - Mailing list pgsql-general

From Gordan Bobic
Subject Re: Database cluster?
Msg-id 001401c05bbb$ca573a20$8000000a@localdomain
In response to Re: Database cluster?  ("Valter Mazzola" <txian@hotmail.com>)
List pgsql-general
> > Then you can connect to any of the postgres servers on your cluster,
> > for example by round robin.
> >
> >Hmm... But is this really what we want to do? This is less than ideal for
> >several reasons (if I understand what you're saying correctly). Replication
> >is off-line for a start, and it only works well for a system that has few
> >inserts and lots of selects, probably from a lot of different users.
> >Probably a good thing for applications like web search engines, but not
> >necessarily for much else.
>
> *** it isn't replication. It's that your cluster behaves like a
> single-computer. You modify the 'OS' (GFS + DIPC), not postgresql.

OK, that makes sense. Kind of like Mosix, then. But like Mosix, this would
require lots of network bandwidth - or not, depending on how good GFS is at
figuring out what goes where.
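
For reference, the round-robin scheme mentioned in the quoted text could be
sketched roughly as below. This is only an illustration under assumptions:
the host names are made up, and a real client would pass the chosen host to
its database driver rather than just returning it.

```python
import itertools

# Hypothetical node names; any list of cluster hosts would do.
NODES = ["pg-node1", "pg-node2", "pg-node3"]


class RoundRobin:
    """Hand out cluster nodes in strict rotation, one per new connection."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self._cycle = itertools.cycle(list(nodes))

    def next_node(self):
        # Each call returns the next host; after the last one, wrap around.
        return next(self._cycle)


picker = RoundRobin(NODES)
first_four = [picker.next_node() for _ in range(4)]
```

Rotation spreads connections evenly but knows nothing about where the data
lives, which is why the bandwidth question above still applies.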

> > > Another issue is the datafiles; GFS seems promising. But postgresql
> > > uses fcntl, and GFS (globalfilesystem.org) doesn't support it yet. A
> > > distributed filesystem with locking etc. is required. Ideas?
> >
> >Hmm... I am not sure that a distributed file system is what we want here. I
> >think it might be better to have separate postgres databases on separate
> >local file systems, and handle putting the data together on a higher level.
> >I think this would be better for both performance and scalability. Having
>
> ***yes... but WHEN can we have these features? No one has done it till
> now; I've asked and searched but got almost no reply.

Well, if you come up with a detailed design, I'm quite happy to help with
coding individual functions...

> >one big file system is likely to incur heavy network traffic penalties, and
> >that is not necessary, as it can be avoided by just having the distribution
> >done on a database level, rather than file system level.
> >
> >But then again, the distributed file system can be seen as a "neater"
> >solution, and it might work rather well, if they get the caching right with
> >the correct "near-line" distribution of data across the network file system
> >to make sure that the data is where it is most useful. In other words, make
> >sure that the files (or even better, inodes) that are frequently accessed
> >by a computer are on that computer.
> >
> >Still there is the issue of replication and redundancy.
>
> ***GFS does it transparently.

But wouldn't this all be incredibly network intensive? Could we implement
something that would make a process go to the data, instead of the other
way around? In a database, the data is typically bigger than the process
accessing it...
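
A minimal sketch of what "sending the process to the data" might look like
when the distribution is done at the database level: each key is mapped to
one node, and a request is split so that every node only receives the keys
it stores. The node names and both helper functions are hypothetical, and
the simple hash-modulo placement is an assumption made for illustration,
not something proposed in this thread.

```python
import hashlib

# Hypothetical node names, each assumed to run its own postgres
# instance on a local file system.
NODES = ["pg-node1", "pg-node2", "pg-node3"]


def node_for_key(key, nodes=NODES):
    """Map a record key to the node assumed to store it.

    Uses a stable hash (not Python's built-in hash(), which is
    randomized per process for strings) so every client agrees.
    """
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def group_by_node(keys, nodes=NODES):
    """Group keys by owning node, so each node gets one local query."""
    plan = {}
    for key in keys:
        plan.setdefault(node_for_key(key, nodes), []).append(key)
    return plan


plan = group_by_node(range(10))
```

Only the keys and the results cross the network here, not the data files.
The price is that plain modulo placement moves most keys when a node is
added, which this sketch ignores.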

> >Indeed. As such, it should probably be the first thing to do toward
> >"clustering" a database. Still, it would be good to have a clear
> >development path, even though on that path we kludge things slightly at
> >various steps in order to have a usable system now, as opposed to a
> >"perfect" system later.
> >
>
> *** yes, I want clustering now... and I'm alone.

No, you're not. I NEED clustering now. Eventually the number of records and
tables comes and bites you, no matter how much you optimize your
application. And for most of us mere mortals, buying a Cray for running a
database is just not a viable option...

> In my opinion, if GFS will do fcntl (and we can ask the GFS people, I think),
> the stuff in this email can be done rapidly.

Well, I think you've just volunteered to contact them. :-)
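
Before asking, it may help to have a quick probe for whether a given mount
accepts fcntl locks at all. The sketch below assumes a POSIX system with
Python available; the function name is made up. It only shows that the lock
call is accepted on one machine and says nothing about whether locks are
honoured across nodes.

```python
import fcntl
import os
import tempfile


def supports_fcntl_locks(directory):
    """Return True if an exclusive fcntl lock can be taken on a file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # Non-blocking exclusive lock on the whole file (POSIX fcntl lock).
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the scratch file, whatever the outcome.
        os.close(fd)
        os.unlink(path)


local_ok = supports_fcntl_locks(tempfile.gettempdir())
```

Pointing the same function at a directory on the distributed file system
would give a first answer; a real test would also need two hosts contending
for the same lock.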

> >A shared all approach is not necessarily that bad. It is (as far as I can
> >tell), not better or worse than a "share nothing" approach. They both have
> >pros and cons. Ideally, we should work toward coming up with an idea for a
> >hybrid system that would pick the best of both worlds.
> >
> > > This system can give a sort of single-system-image, useful to distribute
> > > other software beyond postgresql.
> >
> >Indeed. This is always a good thing for scalability for most applications,
> >but databases have their specific requirements which may not be best
> >catered for by standard means of distributed processing. Still, what you
> >are suggesting would be a major improvement, from where I'm looking at it,
> >but I am probably biased by looking at it from the point of view of my
> >particular application.
> >
> > > Also Mariposa (http://s2k-ftp.cs.berkeley.edu:8000/mariposa/) seems
> > > interesting, but it's not maintained and it's for an old postgresql
> > > version.
> >
> >Hmm... Most interesting. There could be something recyclable in there. Must
> >look at the specs and some source later...
> >
>
> *** I've compiled it, but with no results.
> An idea is to diff it against the corresponding pure postgresql version
> (6.4/5?), then study the patch and grab the secrets to fuse into the current
> version. The research papers seem very good. Perhaps someone who worked on
> Mariposa can help...

See above comment...

> My goal is to have a clustered open source database with the least effort
> possible, now.
>
> The project to do good stuff (ie code) in this field is very long...

Indeed. There has to be a feasible starting point that yields modest
improvements at modest cost (in time and effort in this case).

> I hope that someone will start a real thing ... one idea is to start a
> project on cosource or similar to receive funding $$.
> This project is very important for the OpenSource world.

I agree. Having a fully clustered database with very little network
overhead would be a major success, both for Postgres and OpenSource. Here's
an obvious question: does clustering support exist on Oracle, and if so,
how good is it?

Regards.

Gordan

