Re: Database cluster? - Mailing list pgsql-general
From | Gordan Bobic |
---|---|
Subject | Re: Database cluster? |
Date | |
Msg-id | 004301c05b7f$eff8ad40$8000000a@localdomain |
In response to | Re: Database cluster? ("Valter Mazzola" <txian@hotmail.com>) |
Responses | Re: Database cluster? |
List | pgsql-general |
> I've successfully patched Linux kernel 2.2.17 with DIPC and modified
> postgresql's src (src/backend/storage/ipc/ipc.c) to create distributed
> shm and sem.

Please forgive my ignorance (I haven't used Postgres for that long), but what are shm and sem?

> The strategy is then to start a postgresql that creates shm and sem on
> ONE machine, then start other postgres on other machines on the cluster
> that create NO shared structures (there is a command line flag to do this).

So, one "master" and lots of "slaves", right?

> Then you can connect to any of the postgres on your cluster, for example:
> round robin.

Hmm... But is this really what we want to do? If I understand what you're saying correctly, this is less than ideal for several reasons. For a start, replication is off-line, and it only works well for a system that has few inserts and lots of selects, probably from many different users. That is probably a good fit for applications like web search engines, but not necessarily for much else.

> Another issue is data files; GFS seems promising.
> But postgresql uses fcntl, and GFS (globalfilesystem.org) doesn't
> support it yet.
> A distributed filesystem with locking etc. is required. Ideas?

Hmm... I am not sure that a distributed file system is what we want here. I think it might be better to have separate postgres databases on separate local file systems, and handle putting the data together at a higher level. I think this would be better for both performance and scalability. Having one big file system is likely to incur heavy network traffic penalties, and that is not necessary, as it can be avoided by doing the distribution at the database level rather than the file system level.

But then again, the distributed file system can be seen as a "neater" solution, and it might work rather well if they get the caching right, with the correct "near-line" distribution of data across the network file system to make sure that the data is where it is most useful.
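To make the "near-line" idea a little more concrete, here is a minimal Python sketch of one possible placement policy: count accesses to each file or inode per node, and suggest moving each one to the node that uses it most. `PlacementTracker`, its methods, and the node names are hypothetical, made up for illustration; this is an assumption about how such a policy might look, not how GFS or any existing distributed file system actually places or caches data.

```python
from collections import Counter, defaultdict


class PlacementTracker:
    """Hypothetical "near-line" placement policy: each file or inode should
    live on the node that accesses it most often."""

    def __init__(self):
        # inode number -> Counter mapping node name -> access count
        self._accesses = defaultdict(Counter)

    def record_access(self, inode, node):
        """Note that `node` read or wrote `inode` once."""
        self._accesses[inode][node] += 1

    def preferred_node(self, inode):
        """Return the node with the most accesses to `inode`, or None if unseen."""
        counts = self._accesses.get(inode)  # .get() does not create an entry
        if not counts:
            return None
        # most_common(1) -> [(node, count)]; ties go to the node seen first
        return counts.most_common(1)[0][0]

    def migrations(self, current_location):
        """Given {inode: node it lives on now}, return {inode: (from, to)}
        for every inode that is not already on its preferred node."""
        moves = {}
        for inode, here in current_location.items():
            best = self.preferred_node(inode)
            if best is not None and best != here:
                moves[inode] = (here, best)
        return moves


if __name__ == "__main__":
    tracker = PlacementTracker()
    for node in ["db1", "db2", "db2"]:
        tracker.record_access(42, node)
    tracker.record_access(7, "db1")
    print(tracker.migrations({42: "db1", 7: "db1"}))  # prints {42: ('db1', 'db2')}
```

The sketch deliberately ignores migration cost, decay of old access counts, and replication; a real system would need to weigh all of these before moving data between machines.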
In other words, make sure that the files (or even better, inodes) that a computer accesses frequently are on that computer.

Still, there is the issue of replication and redundancy. I just think that for a database application, this would be best done at the database level rather than the file system level, unless the distributed file system in use was designed with all the database-useful features in mind.

> Another issue is that DIPC doesn't have a failover mechanism.

Again, for a database, it might be best to handle it at a higher level.

> This is a shared all approach, it's not the best, but probably it's the
> fastest solution (bad) to implement, with few modifications (4-5
> lines) to postgresql sources.

Indeed. As such, it should probably be the first step toward "clustering" a database. Still, it would be good to have a clear development path, even if along that path we kludge things slightly at various steps in order to have a usable system now, as opposed to a "perfect" system later.

A shared all approach is not necessarily that bad. As far as I can tell, it is neither better nor worse than a "shared nothing" approach; they both have pros and cons. Ideally, we should work toward a hybrid system that picks the best of both worlds.

> This system can give a sort of single-system-image, useful to distribute
> other software beyond postgresql.

Indeed. This is a good thing for scalability in most applications, but databases have specific requirements that may not be well served by standard means of distributed processing. Still, from where I'm looking, what you are suggesting would be a major improvement, though I am probably biased by the point of view of my particular application.

> Also Mariposa (http://s2k-ftp.cs.berkeley.edu:8000/mariposa/) seems
> interesting, but it's not maintained and it's for an old postgresql version.

Hmm... Most interesting.
There could be something recyclable in there. Must look at the specs and some source later...

Regards.

Gordan