briefly: what would be the plausibility/difficulty of using
a distributed shared memory (DSM) library and a socket bridge
implementation to recompile PostgreSQL and deploy it on a cluster
of Linux boxes using PVFS/GFS, creating a single-storage clustered
database? Or better yet, has someone already done it?
Hello all,
I am interested in the possibility of constructing a cluster of
machines running PostgreSQL. While I am not particularly well
informed, I believe this is not possible using the MOSIX system,
due to the following implementation features of PostgreSQL:
- shared memory, used to implement locking/synchronization between the multiple backend processes
- passing of the connected client socket down to the forked backend process
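To make the two features above concrete, here is a rough sketch (my own illustration, not PostgreSQL source) of the same process-model pattern with POSIX primitives, assuming a Unix system with fork(). Both mechanisms assume parent and child stay on one machine, which is why transparent process migration is problematic:

```python
import mmap, os, socket, struct

# 1. Shared memory between the postmaster and forked backends:
#    an anonymous MAP_SHARED mapping is inherited across fork(),
#    so parent and child see the same bytes.
shm = mmap.mmap(-1, 8)              # anonymous shared mapping (Unix)
shm[:8] = struct.pack("q", 0)

pid = os.fork()
if pid == 0:                        # child: stands in for a backend
    shm[:8] = struct.pack("q", 42)  # write visible to the parent
    os._exit(0)
os.waitpid(pid, 0)
value, = struct.unpack("q", shm[:8])

# 2. The connected client socket is simply inherited by the forked
#    backend; no descriptor-passing protocol is involved.
lsock = socket.socket()
lsock.bind(("127.0.0.1", 0))
lsock.listen(1)
client = socket.create_connection(lsock.getsockname())
conn, _ = lsock.accept()
pid = os.fork()
if pid == 0:                        # child: answers on the inherited socket
    conn.sendall(b"ready")
    os._exit(0)
os.waitpid(pid, 0)
received = client.recv(5)
print(value, received)              # 42 b'ready'
```

Migrating the child to another node would break both assumptions: the mmap region is no longer the same physical memory, and the inherited file descriptor no longer refers to a live kernel socket on the new host.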
While the ongoing projects on Network RAM and migratable sockets
might help here, I suspect they do not fully solve the problem.
My idea is to try to recompile/hack PostgreSQL to use one of the
DSM/PVM libraries, together with a socket proxy implementation, to
allow the backend processes to be distributed across a cluster.
Additionally, I would use a parallel filesystem, likely PVFS since
it does not require specialized storage or network hardware, to
distribute the data across the storage of each node.
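The "socket proxy" piece might look something like the following hypothetical sketch: a relay on the entry node that accepts a client connection and shuttles bytes to a backend chosen from the cluster. The function names and the local echo server standing in for a remote backend are all made up for illustration:

```python
import socket, threading

def relay(src, dst):
    # Copy bytes one way until the source side closes.
    while (chunk := src.recv(4096)):
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)

def proxy_connection(client_sock, backend_addr):
    # Bridge one accepted client connection to a backend node,
    # relaying in both directions until both sides close.
    backend = socket.create_connection(backend_addr)
    t = threading.Thread(target=relay, args=(backend, client_sock))
    t.start()
    relay(client_sock, backend)
    t.join()

# Demo: a local echo server stands in for a backend on another node.
def echo_once(lsock):
    conn, _ = lsock.accept()
    while (data := conn.recv(4096)):
        conn.sendall(data)
    conn.close()

backend_l = socket.socket()
backend_l.bind(("127.0.0.1", 0))
backend_l.listen(1)
threading.Thread(target=echo_once, args=(backend_l,), daemon=True).start()

entry_l = socket.socket()
entry_l.bind(("127.0.0.1", 0))
entry_l.listen(1)
client = socket.create_connection(entry_l.getsockname())
server_side, _ = entry_l.accept()
threading.Thread(target=proxy_connection,
                 args=(server_side, backend_l.getsockname()),
                 daemon=True).start()

client.sendall(b"SELECT 1;")
client.shutdown(socket.SHUT_WR)
reply = b""
while (chunk := client.recv(4096)):
    reply += chunk
print(reply)
```

A real bridge would of course have to speak (or at least not corrupt) the PostgreSQL wire protocol and cope with many concurrent connections; this only shows the byte-shuttling idea.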
There are likely many other issues I am not aware of, and I would
like to hear about them if anybody knows of any. In particular, is
there any reason that a clustered filesystem like PVFS would not
provide a filesystem feature that PostgreSQL needs?
I am interested in such a solution because I would like to build a
rather large database (500GB+) and use the ability to embed the R
language in PostgreSQL (REmbeddedPostgres) to perform statistical
calculations on the data.
These calculations are performed in the backend and will likely be
intensive. In my mind, such a solution would allow me to add new
nodes to the system, each providing both additional storage and
CPU. Initially the backends could be crudely balanced by a
round-robin scheme, or perhaps by MOSIX.
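The crude round-robin balancing could be as simple as the entry point cycling through the cluster's node addresses when choosing where each new backend runs; the hostnames below are made up:

```python
from itertools import cycle

nodes = ["node1", "node2", "node3"]   # hypothetical cluster hostnames
next_node = cycle(nodes)

def pick_backend_node():
    # Return the node that should host the next backend process.
    return next(next_node)

assignments = [pick_backend_node() for _ in range(5)]
print(assignments)  # ['node1', 'node2', 'node3', 'node1', 'node2']
```

This ignores actual load, which is where something like MOSIX's dynamic migration would in principle do better.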
I am not interested in using a replication solution, since that
would require multiple copies of the data. I am also not that
worried about a decrease in I/O performance, since the tradeoff
buys CPU and storage scalability at once.
Due to my usage of R and philosophical beliefs, I am not
interested in using any commercial solution.
My question is basically: is my idea possible, and if so, how
difficult do people think it would be to implement?
As I have mentioned, I am not that well informed about the details
of the PostgreSQL implementation, so I would greatly appreciate any
advice or opinions on this idea.
Thank you in advance,
Bruno
PS: I apologize for the automatically included disclaimer notice.