Re: Database cluster? - Mailing list pgsql-general
From | Gordan Bobic |
---|---|
Subject | Re: Database cluster? |
Date | |
Msg-id | 001401c05bbb$ca573a20$8000000a@localdomain |
In response to | Re: Database cluster? ("Valter Mazzola" <txian@hotmail.com>) |
List | pgsql-general |
> > > Then you can connect to any of the postgres on your cluster, for
> > > example: round robin.
> >
> >Hmm... But is this really what we want to do? This is less than ideal for
> >several reasons (if I understand what you're saying correctly). Replication
> >is off-line for a start, and it only works well for a system that has few
> >inserts and lots of selects, probably from a lot of different users.
> >Probably a good thing for applications like web search engines, but not
> >necessarily for much else.
>
> *** it isn't replication. It's that your cluster behaves like a
> single-computer. You modify the 'OS' (GFS + DIPC), not postgresql.

OK, that makes sense. Kind of like Mosix, then. But like Mosix, this would
require lots of network bandwidth - or not, depending on how good GFS is at
figuring out what goes where.

> > > Another issue are datafiles, GFS seems promising. But postgresql uses
> > > fcntl, and GFS (globalfilesystem.org) doesn't support it yet. A
> > > distributed filesystem with locking etc. is required, Ideas ?
> >
> >Hmm... I am not sure that a distributed file system is what we want here. I
> >think it might be better to have separate postgres databases on separate
> >local file systems, and handle putting the data together on a higher level.
> >I think this would be better for both performance and scalability. Having
>
> ***yes... but WHEN we can have these features ? No one have done it till
> now, i've requested and searched but almost no reply.

Well, if you come up with a detailed design, I'm quite happy to help with
coding individual functions...

> >one big file system is likely to incur heavy network traffic penalties, and
> >that is not necessary, as it can be avoided by just having the distribution
> >done on a database level, rather than file system level.
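To make the "distribution done on a database level" idea a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: the node names are hypothetical, no real postgres connections are opened, and hashing the row key is just one possible way of deciding which node owns a row. Each node would be an ordinary postgres on its own local file system; a thin layer above them routes inserts and merges the results of selects.

```python
import hashlib

# Hypothetical node names (an assumption for illustration). Each one stands
# for an independent postgres instance with its own local file system.
NODES = ["pg-node-a", "pg-node-b", "pg-node-c"]


def node_for_key(key, nodes=NODES):
    """Pick the node that owns a row, using a stable hash of its key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the choice is identical on
    # every machine and on every run.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def plan_inserts(keys, nodes=NODES):
    """Group keys by owning node, so each insert touches exactly one node."""
    plan = {node: [] for node in nodes}
    for key in keys:
        plan[node_for_key(key, nodes)].append(key)
    return plan


def merge_results(per_node_rows):
    """Combine the rows each node returned for its local SELECT."""
    merged = []
    for rows in per_node_rows.values():
        merged.extend(rows)
    return sorted(merged)


if __name__ == "__main__":
    # Show how many of 10 sample keys land on each node.
    plan = plan_inserts(["user%d" % i for i in range(10)])
    for node, keys in plan.items():
        print(node, len(keys))
```

The point of the sketch is that only the rows a query actually needs cross the network, rather than file system blocks. What it does not address is joins across nodes, or what happens when a node is added or removed.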
> >
> >But then again, the distributed file system can be seen as a "neater"
> >solution, and it might work rather well, if they get the caching right with
> >the correct "near-line" distribution of data across the network file system
> >to make sure that the data is where it is most useful. In other words, make
> >sure that the files (or even better, inodes) that are frequently accessed
> >by a computer are on that computer.
> >
> >Still there is the issue of replication and redundancy.
>
> ***GFS does it transparently.

But wouldn't this all be incredibly network intensive? Could we implement
something that would make a process go to the data, instead of the other way
around? In a database, the data is typically bigger than the process
accessing it...

> >Indeed. As such, it should probably be the first thing to do toward
> >"clustering" a database. Still, it would be good to have a clear
> >development path, even though on that path we kludge things slightly at
> >various steps in order to have a useable system now, as opposed to a
> >"perfect" system later.
>
> *** yes, i want clustering now...and i'm alone.

No, you're not. I NEED clustering now. Eventually the number of records and
tables comes and bites you, no matter how much you optimize your application.
And for most of us mere mortals, buying a Cray for running a database is just
not a viable option...

> In my opinion if GFS will do fcntl (and we can ask to GFS people, i think),
> the stuff in this email can be done rapidly.

Well, I think you've just volunteered to contact them. :-)

> >A shared all approach is not necessarily that bad. It is (as far as I can
> >tell), not better or worse than a "share nothing" approach. They both have
> >pros and cons. Ideally, we should work toward coming up with an idea for a
> >hybrid system that would pick the best of both worlds.
> >
> > > This system can give a sort of single-system-image, useful to distribute
> > > other software beyond postgresql.
> >
> >Indeed.
> >This is always a good thing for scalability for most applications,
> >but databases have their specific requirements which may not be best
> >catered for by standard means of distributed processing. Still, what you
> >are suggesting would be a major improvement, from where I'm looking at it,
> >but I am probably biased by looking at it from the point of view of my
> >particular application.
> >
> > > Also Mariposa (http://s2k-ftp.cs.berkeley.edu:8000/mariposa/) seems
> > > interesting, but it's not maintained and it's for an old postgresql
> > > version.
> >
> >Hmm... Most interesting. There could be something recyclable in there. Must
> >look at the specs and some source later...
>
> *** i've compiled it , but with no results.
> An idea is to get diff to corresponding pure postgresql version (6.4/5?),
> then study the patch, and grab the secrets to fuse in current version. The
> research papers seems very good. Perhaps some guy that have done
> Mariposa can help...

See above comment...

> My goal is to have a clustered open source database with the less effort
> possible, now.
>
> The project to do good stuff (ie code) in this field is very long...

Indeed. There has to be a feasible starting point that yields modest
improvements at modest cost (in time and effort in this case).

> i hope that some guy will start a real thing ... one idea is to start a
> project on cosource or similar to receive funding $$.
> This project is very important for the OpenSource world.

I agree. Having a fully clustered database with very little network overhead
would be a major success, both for Postgres and OpenSource.

Here's an obvious question - how good is clustering support on Oracle (does
it exist)?

Regards.

Gordan