Re: Clustering, parallelised operating system, super-computing - Mailing list pgsql-general

From Brian Modra
Subject Re: Clustering, parallelised operating system, super-computing
Msg-id AANLkTimV_Wh1DKofePDbUZErIb4li1qY7F9Qdi3LgxA_@mail.gmail.com
In response to Re: Clustering, parallelised operating system, super-computing  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Clustering, parallelised operating system, super-computing  (Benjamin Smith <lists@benjamindsmith.com>)
List pgsql-general
On 14/05/2010, Bruce Momjian <bruce@momjian.us> wrote:
> Brian Modra wrote:
>> Hi,
>> I've been told that PostgreSQL and other similar databases don't work
>> well on a parallelised operating system because they make good use of
>> shared memory which does not cross the boundary between nodes in a
>> cluster.
>>
>> So I am wondering if any work is being done to make it possible to
>> have a single database schema that spans a number of hosts?
>>
>> For example, a table on one host/node that has a reference to a table
>> on another host/node with deletes cascading back.
>> e.g.
>
> Not currently.  There are some prototypes in development, but those
> usually have the same database on all the machines and they share the
> load.

I'm trying to solve two problems: first, distributing the volume of
data, and second, distributing the load.

So far, I'm putting some bulky data onto separate hosts, where it never
needs to take part in a join. On the host that holds the data to be
joined, I keep a "reference" table. Once the join has been performed
locally, I select the actual data from the other host by unique ID.
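
A minimal sketch of this layout, for illustration only: two in-memory
SQLite databases stand in for the two PostgreSQL hosts, and the table
names (customer, document_ref, document) and the documents_for helper
are hypothetical. The join runs entirely on the local host and yields
only IDs. The bulky rows are then fetched from the other host by
primary key.

```python
import sqlite3

# Assumption: two separate in-memory databases stand in for two hosts.
local = sqlite3.connect(":memory:")   # host holding the joinable "reference" rows
remote = sqlite3.connect(":memory:")  # host holding the bulky data

local.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    -- Reference table: only the key of each bulky row lives on this host.
    CREATE TABLE document_ref (doc_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO document_ref VALUES (10, 1), (11, 2), (12, 1);
""")
remote.executescript("""
    CREATE TABLE document (doc_id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO document VALUES (10, 'scan-10'), (11, 'scan-11'), (12, 'scan-12');
""")

def documents_for(customer_name):
    """Hypothetical helper: join locally, then fetch bulky rows remotely by ID."""
    # Step 1: the join happens entirely on the local host and yields only IDs.
    ids = [row[0] for row in local.execute(
        "SELECT r.doc_id FROM document_ref r "
        "JOIN customer c ON c.id = r.customer_id "
        "WHERE c.name = ? ORDER BY r.doc_id", (customer_name,))]
    if not ids:
        return []
    # Step 2: the other host only ever sees lookups by primary key.
    placeholders = ",".join("?" * len(ids))
    return remote.execute(
        f"SELECT doc_id, body FROM document WHERE doc_id IN ({placeholders}) "
        "ORDER BY doc_id", ids).fetchall()

print(documents_for("alice"))  # [(10, 'scan-10'), (12, 'scan-12')]
```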

To emulate a reference with "ON DELETE CASCADE" across hosts, I create
an AFTER DELETE trigger whose PL/pgSQL function calls dblink to perform
the remote delete.
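
A minimal sketch of the cascading trigger, again assuming two SQLite
databases in place of two PostgreSQL hosts. A Python function registered
with sqlite3's Connection.create_function stands in for the dblink call
(in PostgreSQL that would be something like dblink_exec inside a
PL/pgSQL trigger function). The remote_delete function and the table
names are hypothetical.

```python
import sqlite3

# Assumption: two separate in-memory databases stand in for two hosts.
local = sqlite3.connect(":memory:")
remote = sqlite3.connect(":memory:")

local.executescript("""
    CREATE TABLE document_ref (doc_id INTEGER PRIMARY KEY, owner TEXT);
    INSERT INTO document_ref VALUES (10, 'alice'), (11, 'bob');
""")
remote.executescript("""
    CREATE TABLE document (doc_id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO document VALUES (10, 'scan-10'), (11, 'scan-11');
""")

def remote_delete(doc_id):
    """Hypothetical stand-in for a dblink call: run the DELETE on the other host."""
    with remote:  # commits on the remote side, independently of the local transaction
        remote.execute("DELETE FROM document WHERE doc_id = ?", (doc_id,))
    return 1

# Expose the Python function to SQL so the trigger body can call it.
local.create_function("remote_delete", 1, remote_delete)

# AFTER DELETE, FOR EACH ROW: cascade every local delete to the other host.
local.execute("""
    CREATE TRIGGER document_ref_cascade
    AFTER DELETE ON document_ref
    FOR EACH ROW
    BEGIN
        SELECT remote_delete(OLD.doc_id);
    END
""")

with local:
    local.execute("DELETE FROM document_ref WHERE doc_id = 10")

print(remote.execute("SELECT doc_id FROM document").fetchall())  # [(11,)]
```

In this sketch the remote delete commits on its own, so rolling back the
local transaction would not bring the remote row back. As far as I know,
a plain dblink call also runs in a separate session and has the same
weakness.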

Similarly, I can do joins across hosts in PL/pgSQL with the help of
dblink. But doing joins across hosts certainly defeats the purpose of
"distributing the load".
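
A rough sketch of why cross-host joins cost so much, assuming the same
two-SQLite-database stand-in and a hypothetical fetch_remote helper in
place of a dblink query. The naive join ships the whole remote table to
the local host. Pushing the filter to the remote side cuts that down,
but it still ships every matching row, and the local host still does
the join work.

```python
import sqlite3

# Assumption: two separate in-memory databases stand in for two hosts.
local = sqlite3.connect(":memory:")
remote = sqlite3.connect(":memory:")

local.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer VALUES (1, 'alice'), (2, 'bob');
""")
remote.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
remote.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                   [(1 + i % 2, float(i)) for i in range(1000)])
remote.commit()

def fetch_remote(sql, params=()):
    """Hypothetical stand-in for a dblink query: every returned row crosses the network."""
    return remote.execute(sql, params).fetchall()

def join_naive(name):
    # Pull the entire remote table, then filter and join in the local process.
    ids = {cid for (cid,) in local.execute(
        "SELECT id FROM customer WHERE name = ?", (name,))}
    shipped = fetch_remote("SELECT customer_id, total FROM orders ORDER BY id")
    return [r for r in shipped if r[0] in ids], len(shipped)

def join_pushdown(name):
    # Resolve the keys locally, then send them to the remote host as a filter.
    ids = [cid for (cid,) in local.execute(
        "SELECT id FROM customer WHERE name = ?", (name,))]
    if not ids:
        return [], 0
    marks = ",".join("?" * len(ids))
    shipped = fetch_remote(
        f"SELECT customer_id, total FROM orders WHERE customer_id IN ({marks}) "
        "ORDER BY id", ids)
    return shipped, len(shipped)

naive_rows, naive_cost = join_naive("alice")
push_rows, push_cost = join_pushdown("alice")
print(len(naive_rows), naive_cost)  # 500 1000
print(len(push_rows), push_cost)    # 500 500
```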

I think the schema must be designed carefully when data is distributed,
so getting this "supercomputer database" right really will be difficult.

Maybe the best way to solve this is not to do automatic distribution
of the data, but rather to provide tools for implementing distributed
references and joins.

I'm thinking of working on this as part of "The Karoo Project", an open
source project I'm working on, and would appreciate comments, support,
or criticism.
Thanks

--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/
