Re: Database cluster? - Mailing list pgsql-general

From Doug Semig
Subject Re: Database cluster?
Date 2000-11-30
Msg-id 3.0.6.32.20001130103737.007d8ca0@sloth.c3net.net
In response to Re: Database cluster?  ("Gordan Bobic" <gordan@freeuk.com>)
List pgsql-general
You're almost describing a Teradata DBM.

What an amazing machine!  Last I heard, about 6 years ago, AT&T was
rewriting it as an NT app instead of running it on proprietary hardware.  The
proprietary hardware was essentially a cluster of 80486 computers (at the
time).

What they had done was implement a pyramid structure of 80486 computers.
The lowest-level computers had the hard disks and stored the data.  Two of
the lowest-level computers would "report" to a single higher-up computer,
two of those higher-up computers would "report" to yet another single
higher-up computer, and so on until there was only one computer left at the top.

The thing that impressed me the most about this architecture was that
sorting was practically built in.  All an intermediary computer had to
do was merge the already-sorted result sets from its lower-level computers.  Blazing!
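
To make that merge step concrete, here's a minimal sketch in Python (purely
illustrative; it is not anything Teradata actually ran).  Each child node
hands back its rows already sorted, so a parent only interleaves sorted
streams and never has to re-sort:

# Sketch of the merge an intermediary node performs: every child returns
# its rows pre-sorted on the same key, so the parent just interleaves the
# sorted streams; no full re-sort happens above the leaf level.
import heapq

def merge_sorted_results(child_result_sets):
    """Merge already-sorted result sets from lower-level nodes."""
    yield from heapq.merge(*child_result_sets)

# Two leaf nodes, each holding (and pre-sorting) half of the data.
leaf_a = [(1, 'ant'), (4, 'dog'), (9, 'ibex')]
leaf_b = [(2, 'bee'), (5, 'emu'), (7, 'gnu')]

print(list(merge_sorted_results([leaf_a, leaf_b])))
# -> [(1, 'ant'), (2, 'bee'), (4, 'dog'), (5, 'emu'), (7, 'gnu'), (9, 'ibex')]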

And data was stored on a couple of leaf-level computers for redundancy.

I miss that old beast.  But I certainly cannot afford the millions of
dollars required to get one for myself.  We lovingly called the one we
worked with the "Pteradactyl," after the old name for that bird-like
dinosaur (evidently there's a new word for the bird-like dinosaur, the
pteronodon or something?).

Doug

At 02:44 PM 11/30/00 -0000, Gordan Bobic wrote:
>Thanks.
>
>I have just had another thought. If all the tables are split across several
>computers, this would help as well.
>
>For example, if we have 100 records and 2 database servers, each server
>could have 50 of those 100 records on it. When a selection is required,
>each server would look through its much smaller database, and report back
>the "hits". This would, effectively, provide a near-linear speedup in the
>query time, while introducing only a minor network overhead (or a major
>one, depending on how much data is transferred).
>
>Some extra logic could then be implemented for related tables, so that the
>most closely related records from the different tables could be "clustered"
>(remotely similar in spirit to the CLUSTER command) on the same server, for
>faster response times and minimal network usage. The "vacuum" or "cluster"
>features could be used overnight to re-optimize the distribution of records
>across the servers.
>
>In all this, a "master" node could be used to coordinate the whole
>operation. We could ask the master node to run a query, and it would
>automatically, knowing what slaves it has, fire off that query on them.
>Each slave would then execute the query in parallel and return a subset of
>the data we were looking for. This data would then be joined into one
>recordset before being returned to the client that requested it.
>
>As far as I can see, as long as the amounts of data shifted aren't large
>enough to cause problems with network congestion, and the query time
>dominates the data transfer time over the network, this should provide a
>rather scalable system. I understand that the form of database clustering I
>am mentioning here is fairly rudimentary and unsophisticated, but it would
>certainly be a very useful feature.
>
>Are there any plans to implement this sort of functionality in PostgreSQL?
>Or is this a lot more complicated than it seems...
>
>Regards.
>
>Gordan
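
For what it's worth, here is a rough sketch in Python of the scatter-gather
flow Gordan describes above.  It is purely hypothetical (nothing like this
exists in PostgreSQL today); the "shards" are just in-memory lists standing
in for the slave servers, and in a real system related rows could be
co-located by hashing on a key:

from concurrent.futures import ThreadPoolExecutor

# Stand-in "slaves": each one holds half of the 100 records.
shards = [
    [{"id": i, "value": i * 10} for i in range(0, 50)],
    [{"id": i, "value": i * 10} for i in range(50, 100)],
]

def query_shard(rows, predicate):
    """What one slave does: scan its much smaller slice and report the hits."""
    return [row for row in rows if predicate(row)]

def scatter_gather(predicate):
    """What the master does: fire the query at every slave in parallel,
    then join the partial result sets into one recordset for the client."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda rows: query_shard(rows, predicate), shards)
        return [row for partial in partials for row in partial]

# Roughly "SELECT * FROM t WHERE value > 900"
print(scatter_gather(lambda row: row["value"] > 900))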


