postgresql cluster? - Mailing list pgsql-hackers
From | Bruno.White@ubsw.com |
---|---|
Subject | postgresql cluster? |
Date | |
Msg-id | H00003cc0c04f49b.0997292299.ch1p979.chi.swissbank.com@MHS |
List | pgsql-hackers |
Briefly: how plausible, and how difficult, would it be to use a distributed shared memory library (DSS) and a socket bridge implementation to recompile postgresql and deploy it on a cluster of Linux boxes using PVFS/GFS, creating a single-storage clustered database? Or better yet, has someone already done it?

Hello all,

I am interested in the possibility of constructing a postgresql cluster of machines. While I am not particularly well informed, I believe this is not possible using the MOSIX system due to the following implementation features of postgresql:

- shared memory to implement locking/synchronization between the multiple backend processes
- passing of the socket down to the forked backend process

While the ongoing projects of Network RAM and migratable sockets might perhaps help, I suspect they do not fully solve the problem.

My idea is to try to recompile/hack postgresql to use one of the DSS/PVM libraries, with a socket proxy implementation to allow backend processes to be distributed on a cluster. Additionally, I would use a parallel filesystem, likely PVFS since it doesn't require specialized storage and network hardware, to distribute the data across the storage of each node.

There are likely many other issues which I am not aware of, and I would like to be informed about them if anybody knows them. In particular, is there any reason that a clustered filesystem like PVFS would fail to provide a necessary filesystem feature?

I am interested in such a solution since I would like to create a rather large database (500GB+), on which I would like to use the ability to embed the R language in postgresql to perform statistical calculations on the data (REmbeddedPostgres). These calculations are performed on the backend and will likely be intensive. In my mind, such a solution would allow me to add new nodes to the system which provide both additional storage and CPU. Initially the backends could be crudely balanced by a round robin system, or perhaps by MOSIX.
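To make the "crude round robin" balancing concrete, here is a minimal sketch in Python. It only shows the assignment policy: each new connection's backend goes to the next node in a fixed rotation. The node names and the functions `make_round_robin` and `assign_backends` are hypothetical, invented for illustration. Nothing here is part of postgresql or MOSIX, and it says nothing about how a backend would actually be started on a remote node.

```python
from itertools import cycle

# Hypothetical node names; a real cluster would supply its own host list.
NODES = ["node1", "node2", "node3"]


def make_round_robin(nodes):
    """Return a function that hands out nodes in strict rotation."""
    rotation = cycle(nodes)

    def next_node():
        return next(rotation)

    return next_node


def assign_backends(connection_ids, nodes):
    """Map each incoming connection to the node that would host its backend."""
    pick = make_round_robin(nodes)
    return {conn: pick() for conn in connection_ids}


if __name__ == "__main__":
    for conn, node in assign_backends(range(5), NODES).items():
        print(f"connection {conn} -> {node}")
```

The rotation ignores how busy each node is, which is why such a scheme is only a crude starting point: one long-running R calculation would keep its node loaded while new backends continue to arrive there on schedule.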
I am not interested in using a replication solution since this would require multiple copies of the data. Also, I am not that worried about a decrease in IO performance, since the tradeoff is CPU and storage scalability at once. Due to my usage of R and philosophical beliefs, I am not interested in using any commercial solution.

My basic question is: is my idea possible, and if so, how difficult do people think it would be to implement? As I have mentioned, I am not that well informed on the details of the postgresql implementation, so I would greatly appreciate any advice or opinions on this idea.

Thank you in advance,
Bruno

PS: I apologize for the automatically included disclaimer notice.

Visit our website at http://www.ubswarburg.com

This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.

E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.