postgresql cluster? - Mailing list pgsql-hackers

From Bruno.White@ubsw.com
Subject postgresql cluster?
Date
Msg-id H00003cc0c04f49b.0997292299.ch1p979.chi.swissbank.com@MHS
List pgsql-hackers
Briefly: how plausible, and how difficult, would it be
to recompile postgresql against a distributed shared
memory (DSM) library and a socket bridge implementation,
and deploy it on a cluster of linux boxes using PVFS/GFS,
creating a single-storage clustered database? Or better
yet, has someone already done it?

Hello all,

I am interested in the possibility of constructing a 
postgresql cluster of machines. While I am not particularly 
well informed, I believe this is not possible using
the MOSIX system due to the following implementation features
of postgresql:

- shared memory to implement locking/synchronization between the multiple backend processes

- passing of the client socket down to the forked backend process
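
To make the two points above concrete, here is a minimal sketch of the
general pattern. It is not postgresql code, only an illustration under
assumptions: the function name is hypothetical, a socketpair stands in
for an accepted client connection, and an anonymous shared mapping
stands in for the shared memory segment. It is Unix-only. A forked
child inherits both the mapping and the socket from its parent, and
both are kernel objects local to one machine, which is presumably why
the processes cannot simply be spread over several hosts.

```python
import mmap
import os
import socket
import struct


def fork_with_shared_state():
    """Fork a child that inherits a shared mapping and an open socket."""
    # Anonymous mapping; on Unix this is MAP_SHARED by default, so the
    # parent and the forked child see the same bytes.
    shm = mmap.mmap(-1, 8)
    shm[:8] = struct.pack("q", 0)

    # Stand-in for a client connection accepted by the parent process.
    client_end, server_end = socket.socketpair()

    pid = os.fork()
    if pid == 0:
        # Child (the "backend"): uses the inherited socket and mapping.
        status = 1
        try:
            client_end.close()
            shm[:8] = struct.pack("q", 42)
            server_end.sendall(b"ready")
            status = 0
        finally:
            os._exit(status)

    # Parent: drop its copy of the backend's socket end, then read
    # until the child exits and the connection reaches end-of-file.
    server_end.close()
    chunks = []
    while True:
        data = client_end.recv(16)
        if not data:
            break
        chunks.append(data)
    os.waitpid(pid, 0)

    value = struct.unpack("q", shm[:8])[0]
    client_end.close()
    shm.close()
    return value, b"".join(chunks)
```

The parent observes the value written by the child only because both
processes map the same physical pages. Across machines, a DSM layer
and a socket proxy would have to emulate both behaviours.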

While the ongoing Network RAM and migratable sockets
projects might help somewhat, I suspect they do not
fully solve the problem.

My idea is to try to recompile/hack postgresql to use
one of the DSM/PVM libraries, with a socket proxy
implementation, to allow the backend processes to be
distributed across a cluster. Additionally, I would use
a parallel filesystem, likely PVFS since it doesn't
require specialized storage and network hardware,
to distribute the data over the storage of each node.

There are likely many other issues which I am not
aware of, and I would like to hear about them
if anybody knows of any. In particular, is there any
necessary filesystem feature that a clustered
filesystem like PVFS would not provide?

I am interested in such a solution since I would like
to create a rather large database (500GB+), on which
I would like to use the ability to embed the R language
in postgresql to perform statistical calculations on
the data (REmbeddedPostgres).

These calculations are performed on the backend, and will
likely be intensive. In my mind, such a solution would
allow me to add new nodes to the system which provide
both additional storage and CPU. Initially the backends
could be crudely balanced by a round robin system, or
perhaps by MOSIX.
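
As a rough illustration of the crude round robin balancing mentioned
above, here is a minimal sketch. The function name and the node names
are hypothetical, and the sketch ignores actual load on each node: it
just hands out nodes in a fixed rotation.

```python
from itertools import cycle


def make_round_robin(nodes):
    """Return a function that yields the next node in fixed rotation."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("at least one node is required")
    ring = cycle(nodes)

    def next_node():
        # Each call advances the rotation by one node.
        return next(ring)

    return next_node


# Hypothetical usage: choose a node for each new backend.
pick = make_round_robin(["node1", "node2", "node3"])
assignments = [pick() for _ in range(5)]
```

A scheme like this spreads new backends evenly by count only. A long
R calculation on one node would not be taken into account.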

I am not interested in using a replication solution since
this would require multiple copies of the data. Also, I
am not that worried about a decrease in IO performance, 
since the tradeoff is CPU and storage scalability at once.

Due to my usage of R and philosophical beliefs, I am not
interested in using any commercial solution.

My basic question is: is my idea possible,
and if so, how difficult do people think it would be to
implement?

As I have mentioned I am not that well informed on the
details of the postgresql implementation, therefore
I would greatly appreciate any advice or opinions on this
idea. 

Thank you in advance,

Bruno

PS: I apologize for the automatically included disclaimer notice.



Visit our website at http://www.ubswarburg.com

This message contains confidential information and is intended only 
for the individual named.  If you are not the named addressee you 
should not disseminate, distribute or copy this e-mail.  Please 
notify the sender immediately by e-mail if you have received this 
e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or error-free 
as information could be intercepted, corrupted, lost, destroyed, 
arrive late or incomplete, or contain viruses.  The sender therefore 
does not accept liability for any errors or omissions in the contents 
of this message which arise as a result of e-mail transmission.  If 
verification is required please request a hard-copy version.  This 
message is provided for informational purposes and should not be 
construed as a solicitation or offer to buy or sell any securities or 
related financial instruments.


