Thread: || PostgreSQL
Hi Everybody,

We've gotten a few requests for a distributed version of PostgreSQL. There are two models that I think could work for this. What I'm looking for is two things:

1. Any opinions on which option to take.
2. Anyone willing to work on the project.

The features people are looking for are fault tolerance and load distribution.

The two models:

1. A separate server process which manages the parallelism. The client connects to that server, which handles the request using (nearly) unmodified backends on different machines. Would make use of the fact that the vast majority of commands are read-only.

   Advantages:
   - Platform independent (e.g. SPARCs and x86 in the same cluster)
   - Less inherently complex (a simpler, separate component)
   - Fewer changes to the current code

   Potential snag: may need a different communication interface built (maybe not...)

2. Modifying the current server to handle the parallelism, i.e. start the server with another option and config file (or something equivalent). All modifications built into the current server.

   Advantages:
   - Could more easily be made to run single queries in parallel
   - Could be made more efficient by making storage calls directly, instead of using an SQL interface

   Potential snag: adds a lot of complexity to the current backend.

My personal preference is toward option 1. It sounds a lot easier to implement, and it works through a well-defined interface (SQL). It's platform independent, just in case someone decides to throw a SPARC box into their room of Alphas. And it means minimal changes to the current backend, which keeps the backend simpler for people to work on. (My next email's gonna go down that line somewhere.)

Any preferences or options I haven't thought of? (Or any details which would complicate the project?) As well, would anybody be interested in working on this?

Duane
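To make option 1 concrete, here is a minimal, purely illustrative sketch (not existing PostgreSQL code; the `Coordinator` class, the stub backends, and the crude read-only check are all hypothetical) of a separate coordinator process that sends read-only queries to a single backend for load distribution and broadcasts writes to every backend:

```python
import itertools

class Coordinator:
    """Hypothetical sketch of model 1: a separate process routing
    (nearly) unmodified SQL to a cluster of ordinary backends."""

    # Crude classifier: a real coordinator would parse the statement.
    READ_PREFIXES = ("select",)

    def __init__(self, backends):
        # Each backend is any callable taking an SQL string; in practice
        # these would be ordinary client connections to real servers.
        self.backends = backends
        self._rr = itertools.cycle(range(len(backends)))

    def is_read_only(self, sql):
        return sql.lstrip().lower().startswith(self.READ_PREFIXES)

    def execute(self, sql):
        if self.is_read_only(sql):
            # Load distribution: any single replica can answer a read.
            return self.backends[next(self._rr)](sql)
        # Writes must reach every replica to keep them consistent.
        return [backend(sql) for backend in self.backends]
```

Because the coordinator speaks plain SQL to each backend, the backends stay unmodified and can run on different platforms, which is exactly the appeal of this model; the hard parts it glosses over are distributed writes (two-phase commit) and failover.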
Ingres happened to implement their distributed system using option (1): a distributed front-end which knew about the remote servers and could parse and optimize queries, then send the individual queries to the actual servers. It seemed to work pretty well, and you could reuse a large amount of code.

OTOH, if you implemented option (2) (local and remote tables), then you could choose to construct your database on the option (1) model without penalty.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Hi Duane,

I swear, the synchronicity is starting to get _real_ thick around here. Take a look at the discussion between myself, Tom Lane, and Bruce Momjian yesterday, under the helpful subject line of Mariposa. We discuss implementing model 2, which, as Thomas points out in his reply to you, can be a superset of model 1.

I need to be able to set up a distributed, heterogeneous database system. We've priced commercial offerings in this field, and the ones with sufficient flexibility to do what we need start at $40000 for a _two_ backend license.

Anyway, let's combine forces! I'm going to take a stab at this anyway, so there's no sense in duplicating effort. I could set up a local CVS tree to keep us synced.

Ross

P.S. Mariposa was a project out of Stonebraker's lab at Berkeley to build a distributed db. Check out http://mariposa.cs.berkeley.edu

On Tue, Aug 03, 1999 at 10:33:05AM +0000, Duane Currie wrote:
<SNIP>
> Any preferences or options I haven't thought of? (or any details which would
> complicate the project?) As well, would anybody be interested in working on
> this?

--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005