Re: Database cluster? - Mailing list pgsql-general

From	Valter Mazzola
Subject	Re: Database cluster?
Date	December 1, 2000 12:28:45
Msg-id	F215kkOO1Spm7jsGwPA00002dd4@hotmail.com
In response to	Database cluster? ("Gordan Bobic" <gordan@freeuk.com>)
List	pgsql-general


>From: "Gordan Bobic"
>Subject: Re: [GENERAL] Database cluster?
>Date: Fri, 1 Dec 2000 10:13:55 -0000
>
> > I've successfully patched Linux kernel 2.2.17 with DIPC and modified
> > postgresql's src (src/backend/storage/ipc/ipc.c) to create distributed
> > shm and sem.
>
>Please forgive my ignorance (I haven't used Postgres for that long), but
>what are shm and sem?
>

shared memory and semaphores
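
For readers with the same question, here is a minimal sketch of the two primitives working together. It is only an illustration: it uses Python's standard library (POSIX-style shared memory and an in-process semaphore), not the System V shm/sem calls that DIPC distributes, and the function name `demo_shared_counter` is made up for this example.

```python
from multiprocessing import shared_memory
import threading


def demo_shared_counter(workers=4, increments=1000):
    """Increment a counter stored in a shared memory block.

    The semaphore (initial value 1) makes each read-modify-write
    step exclusive, so no increment is lost.
    """
    shm = shared_memory.SharedMemory(create=True, size=8)
    sem = threading.Semaphore(1)
    try:
        # Initialise the 8-byte counter to zero.
        shm.buf[:8] = (0).to_bytes(8, "little")

        def work():
            for _ in range(increments):
                with sem:  # acquire, then release on exit
                    value = int.from_bytes(shm.buf[:8], "little")
                    shm.buf[:8] = (value + 1).to_bytes(8, "little")

        threads = [threading.Thread(target=work) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return int.from_bytes(shm.buf[:8], "little")
    finally:
        # Detach from the block and remove it from the system.
        shm.close()
        shm.unlink()
```

The shared block holds the data and the semaphore serialises access to it; in the setup discussed in this thread, DIPC would make both visible across machines.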

> > The strategy is then to start a postgresql that creates shm and sem on
> > ONE machine, then start other postgres on other machines on the cluster
> > that create NO shared structures (there is a command line flag to do
> > this).
>
>So, one "master" and lots of "slaves", right?
>

No, every machine is identical to the others; the only difference is
that only ONE machine creates the shared memory and semaphores
(distributed over the network by DIPC).


> > Then you can connect to any of the postgres on your cluster, for
> > example: round robin.
>
>Hmm... But is this really what we want to do? This is less than ideal for
>several reasons (if I understand what you're saying correctly). Replication
>is off-line for a start, and it only works well for a system that has few
>inserts and lots of selects, probably from a lot of different users.
>Probably a good thing for applications like web search engines, but not
>necessarily for much else.

*** It isn't replication. The cluster behaves like a single computer.
You modify the 'OS' (GFS + DIPC), not postgresql.
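
The round-robin connection idea quoted above could be sketched on the client side roughly as follows. The node names are hypothetical placeholders and no real connection is opened; the sketch only shows how successive clients would be spread over the machines.

```python
from itertools import cycle

# Hypothetical cluster members; every node runs an identical postgres.
CLUSTER_NODES = ["node1.example", "node2.example", "node3.example"]


def round_robin(nodes):
    """Return an endless iterator that yields the nodes in turn."""
    if not nodes:
        raise ValueError("at least one node is required")
    return cycle(nodes)


picker = round_robin(CLUSTER_NODES)
# Each new client connection takes the next node from the picker.
first_four = [next(picker) for _ in range(4)]
```

Because every node sees the same shared memory and the same files, it does not matter which one a client lands on.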


>
> > Another issue is the datafiles; GFS seems promising.
> > But postgresql uses fcntl, and GFS (globalfilesystem.org) doesn't
> > support it yet.
> > A distributed filesystem with locking etc. is required. Ideas?
>
>Hmm... I am not sure that a distributed file system is what we want here. I
>think it might be better to have separate postgres databases on separate
>local file systems, and handle putting the data together on a higher level.
>I think this would be better for both performance and scalability. Having

*** Yes... but WHEN can we have these features? No one has done it so
far; I've asked and searched, but got almost no reply.

>one big file system is likely to incur heavy network traffic penalties, and
>that is not necessary, as it can be avoided by just having the distribution
>done on a database level, rather than file system level.
>
>But then again, the distributed file system can be seen as a "neater"
>solution, and it might work rather well, if they get the caching right with
>the correct "near-line" distribution of data across the network file system
>to make sure that the data is where it is most useful. In other words, make
>sure that the files (or even better, inodes) that are frequently accessed
>by a computer are on that computer).
>
>Still there is the issue of replication and redundancy.

***GFS does it transparently.

>I just think that
>for a database application, this would be best done on the database level,
>rather than a file system level, unless the distributed file system in use
>was designed with all the database-useful features in mind.
>
> > Another issue is that DIPC doesn't have a failover mechanism.
>
>Again, for a database, it might be best to handle it at a higher level.
>
> > This is a shared-all approach; it's not the best, but it's probably the
> > fastest (though bad) solution to implement, with few modifications (4-5
> > lines) to the postgresql sources.
>
>Indeed. As such, it should probably be the first thing to do toward
>"clustering" a database. Still, it would be good to have a clear
>development path, even though on that path we kludge things slightly at
>various steps in order to have a useable system now, as opposed to a
>"perfect" system later.
>

*** Yes, I want clustering now... and I'm alone.
In my opinion, if GFS supports fcntl (and we can ask the GFS people, I think),
the things described in this email can be done quickly.
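
To make the fcntl requirement concrete, here is a minimal sketch of the kind of advisory file lock that the fcntl interface provides on a local Unix filesystem, and that a distributed filesystem would have to honour across machines. The helper name `write_with_lock` and the file name are invented for illustration; which files postgresql actually locks this way is not covered here.

```python
import fcntl
import os
import tempfile


def write_with_lock(path, payload):
    """Append payload to path while holding an exclusive advisory lock."""
    with open(path, "ab") as f:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(payload)
            f.flush()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as workdir:
    datafile = os.path.join(workdir, "datafile")
    size = write_with_lock(datafile, b"tuple-1\n")
```

On a single machine the kernel arbitrates these locks; on a cluster the filesystem itself has to do it, which is why missing fcntl support in GFS is a blocker.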


>A shared all approach is not necessarily that bad. It is (as far as I can
>tell), not better or worse than a "share nothing" approach. They both have
>pros and cons. Ideally, we should work toward coming up with an idea for a
>hybrid system that would pick the best of both worlds.
>
> > This system can give a sort of single-system-image, useful to distribute
> > other software beyond postgresql.
>
>Indeed. This is always a good thing for scalability for most applications,
>but databases have their specific requirements which may not be best
>catered for by standard means of distributed processing. Still, what you
>are suggesting would be a major improvement, from where I'm looking at it,
>but I am probably biased by looking at it from the point of view of my
>particular application.
>
> > Also Mariposa (http://s2k-ftp.cs.berkeley.edu:8000/mariposa/) seems
> > interesting, but it's not maintained and it's for an old postgresql
> > version.
>
>Hmm... Most interesting. There could be something recyclable in there. Must
>look at the specs and some source later...
>

*** I've compiled it, but with no results.
One idea is to diff it against the corresponding plain postgresql version
(6.4/5?), then study the patch and extract the key ideas to merge into the
current version. The research papers seem very good. Perhaps someone who
worked on Mariposa can help...

My goal is to have a clustered open source database with the least effort
possible, now.

A project to do this properly (i.e. write the code) would take a very long time...

I hope that someone will start a real project... One idea is to start a
project on cosource or similar to receive funding $$.
This project is very important for the open source world.


valter

>Regards.
>
>Gordan
>