Thread: || PostgreSQL

|| PostgreSQL

From
Duane Currie
Date:
Hi Everybody.

We've gotten a few requests for a distributed version of PostgreSQL.
There are two models that I think could work for this.  What I'm looking
for is two things:
1.  Any opinions on which option to take.
2.  Anyone willing to work on the project.

The features people are looking for are fault tolerance and load distribution.

The two models:

1.  A separate server process which manages the parallelism.
    Client connects to that server, which handles the request
    using (nearly) unmodified backends on different machines.
    Would make use of the fact that the vast majority of
    commands are read-only.

    Advantages:
        Platform independent.  (e.g. SPARCs and x86 in
            the same cluster)
        Less inherently complex (simpler, separate component)
        Fewer changes to current code

    Potential Snag:
        May need a different communication interface built...
        (maybe not...)

2.  Modifying the current server to handle the parallelism.
    i.e. Start the server with another option and config file
    (or something equivalent).  All modifications built into
    the current server.

    Advantages:
        Could more easily be made to run single queries in
            parallel
        Could be made more efficient by directly making
            storage calls, instead of using an SQL
            interface.

    Potential Snag:
        Adds a lot of complexity to the current backend.
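
To make option 1 a bit more concrete, here is a rough sketch in Python of what the separate server process might do. Everything in it is an assumption made for illustration: the names (Coordinator, FakeBackend, is_read_only) are hypothetical, the backends are in-memory stand-ins rather than real PostgreSQL connections, and the read-only test is a crude first-keyword check. The idea is only that read-only commands go to one backend in rotation (load distribution), skipping any that are down (fault tolerance), while everything else is sent to every backend.

```python
import itertools

# Assumption: a command is read-only if it starts with SELECT.
# A real classifier would need to be far more careful than this.
READ_ONLY_VERBS = {"SELECT"}


class BackendDown(Exception):
    """Raised when a backend cannot be reached."""


class FakeBackend:
    """In-memory stand-in for a (nearly) unmodified backend on another machine."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []  # every statement this backend has executed

    def execute(self, sql):
        if not self.alive:
            raise BackendDown(self.name)
        self.log.append(sql)
        return f"{self.name}: ok"


def is_read_only(sql):
    """Crude check on the first keyword of the statement."""
    words = sql.lstrip().split(None, 1)
    return bool(words) and words[0].upper() in READ_ONLY_VERBS


class Coordinator:
    """The separate server process: clients talk to it, it talks to backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = itertools.cycle(range(len(self.backends)))

    def execute(self, sql):
        if is_read_only(sql):
            # Load distribution: rotate through the backends.
            # Fault tolerance: try each backend at most once, skipping dead ones.
            for _ in range(len(self.backends)):
                backend = self.backends[next(self._next)]
                try:
                    return [backend.execute(sql)]
                except BackendDown:
                    continue
            raise BackendDown("no backend available")
        # Anything that may modify data goes to every backend so they stay
        # in step. A dead backend makes this raise; what to do then
        # (queue, resync, refuse) is left open in this sketch.
        return [backend.execute(sql) for backend in self.backends]
```

With two backends a and b, an INSERT sent through Coordinator([a, b]) lands in both logs, while consecutive SELECTs alternate between them. Keeping the coordinator talking plain SQL to the backends is what would let different platforms sit in the same cluster.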

My personal preference is toward option 1.  It sounds a lot easier to
implement, and works through a well-defined interface (SQL).  It is
platform independent, just in case someone decides to throw a SPARC box
into their room of Alphas.  It needs minimal changes to the current
backend, which keeps the backend simpler for people to work on.  (My next
email's gonna go down that line somewhere)
 
Any preferences or options I haven't thought of?  (or any details which would
complicate the project?)  As well, would anybody be interested in working on
this?

Duane


Re: [HACKERS] || PostgreSQL

From
Thomas Lockhart
Date:
Ingres happened to implement their distributed system using option
(1), having a distributed front-end which knew about remote servers
and could parse and optimize queries then send individual queries to
the actual servers.

Seemed to work pretty well, and you could reuse a large amount of
code. otoh, if you implemented option (2) (local and remote tables),
then you could choose to construct your database on the option (1)
model without penalty.
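
One way to picture the "local and remote tables" point is a catalog that records where each table lives. The Python sketch below is only an illustration under assumed names (Catalog, register, plan, and is_pure_frontend are all hypothetical) and is not how Ingres or PostgreSQL does it. It shows that when every table is registered as remote, the server holding the catalog stores no data itself and is acting as an option (1) style front-end.

```python
LOCAL = "local"  # marker for tables stored on this server


class Catalog:
    """Maps each table name to the server that holds it."""

    def __init__(self):
        self._location = {}

    def register(self, table, location=LOCAL):
        """Record that `table` lives at `location` (a server name or LOCAL)."""
        self._location[table] = location

    def plan(self, tables):
        """Group the tables a query touches by the server that holds them."""
        plan = {}
        for table in tables:
            plan.setdefault(self._location[table], []).append(table)
        return plan

    def is_pure_frontend(self):
        """True when no table is local, i.e. the option (1) arrangement."""
        return bool(self._location) and all(
            location != LOCAL for location in self._location.values()
        )
```

A catalog with orders registered locally and customers on a server named "hostB" would plan a join across both as {"local": ["orders"], "hostB": ["customers"]}; register both tables remotely and is_pure_frontend() becomes true.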
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] || PostgreSQL

From
"Ross J. Reedstrom"
Date:
Hi Duane - 
I swear, the synchronicity is starting to get _real_ thick around here.
Take a look at the discussion between myself, Tom Lane, and Bruce
Momjian yesterday, under the helpful subject line of Mariposa. We discussed
implementing model 2, which, as Thomas points out in his reply to you,
can be a superset of model 1. I need to be able to set up a distributed,
heterogeneous database system. We've priced commercial offerings in this
field, and ones with sufficient flexibility to do what we need start at
$40000 for a _two_ backend license.

Anyway, let's combine forces! I'm going to take a stab at this anyway, so
there's no sense in duplicating effort. I could do a local CVS tree, to keep
us synced.

Ross

P.S. Mariposa was a project out of Stonebraker's lab at Berkeley, to build a
distributed db. Check out http://mariposa.cs.berkeley.edu

On Tue, Aug 03, 1999 at 10:33:05AM +0000, Duane Currie wrote:

<SNIP>

>     
> Any preferences or options I haven't thought of?  (or any details which would
> complicate the project?)  As well, would anybody be interested in working on
> this?
> 

-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005