Hi Everybody.
We've gotten a few requests for a distributed version of PostgreSQL.
There are two models that I think could work for this. I'm looking
for two things:
1. Any opinions on which option to take.
2. Anyone willing to work on the project.
The features people are looking for are fault tolerance and load distribution.
The two models:
1. A separate server process which manages the parallelism. The client connects to that server, which
handles the request using (nearly) unmodified backends on different machines. This would make use of the fact that the
vast majority of commands are read-only.
   Advantages:
     - Platform independent (e.g. SPARCs and x86 in the same cluster)
     - Less inherently complex (simpler, separate component)
     - Fewer changes to the current code
   Potential snag:
     - May need a different communication interface built (maybe not...)
2. Modifying the current server to handle the parallelism, i.e. start the server with another option and config
file (or something equivalent). All modifications would be built into the current server.
   Advantages:
     - Could more easily be made to run single queries in parallel
     - Could be made more efficient by making storage calls directly, instead of going through an SQL interface
   Potential snag:
     - Adds a lot of complexity to the current backend
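To make option 1 a bit more concrete, here is a rough sketch of what the separate server process might do. Everything in it is an assumption made up for illustration: the names (Dispatcher, FakeBackend, is_read_only), the crude "starts with SELECT" test for read-only commands, and the round-robin choice of backend. Read-only commands go to a single backend and fall through to the next one if it is down; anything else is sent to every backend so the copies stay in sync.

```python
from itertools import cycle


def is_read_only(sql):
    """Crude assumption: a statement is read-only if it starts with SELECT.

    A real check would have to be smarter (e.g. SELECT ... FOR UPDATE).
    """
    return sql.lstrip().lower().startswith("select")


class FakeBackend:
    """Hypothetical stand-in for a connection to one unmodified backend."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []  # statements this backend has been asked to run

    def execute(self, sql):
        if not self.alive:
            raise ConnectionError(self.name + " is down")
        self.log.append(sql)
        return self.name


class Dispatcher:
    """Hypothetical front-end process that clients would connect to."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = cycle(range(len(self.backends)))

    def execute(self, sql):
        if is_read_only(sql):
            # Load distribution: pick backends round-robin.
            # Fault tolerance: skip a backend that is down, trying each once.
            for _ in range(len(self.backends)):
                backend = self.backends[next(self._next)]
                try:
                    return [backend.execute(sql)]
                except ConnectionError:
                    continue
            raise ConnectionError("no backend available")
        # Anything that may modify data goes to every backend.
        # (Handling a backend that fails mid-write is left out of this sketch.)
        return [backend.execute(sql) for backend in self.backends]
```

With three backends, two SELECTs in a row would land on two different machines, while an INSERT would be run on all three. The hard parts (keeping writes consistent when a backend fails, transactions spanning several commands) are not addressed here.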
My personal preference is toward option 1. It sounds a lot easier to implement, and it works through a well-defined
interface (SQL). It is platform independent, just in case someone decides to throw a SPARC box into their room of
Alphas. It needs minimal changes to the current backend, which keeps the backend simpler for people to work on. (My next
email's gonna go down that line somewhere.)
Any preferences, or options I haven't thought of? Or any details which would
complicate the project? As well, would anybody be interested in working on
this?
Duane