Re: Partitioning / Clustering - Mailing list pgsql-performance
From: John A Meinel
Subject: Re: Partitioning / Clustering
Msg-id: 4280D76E.3090209@arbash-meinel.com
In response to: Re: Partitioning / Clustering ("Adam Haberlach" <adam@mediariffic.com>)
Responses: Re: Partitioning / Clustering
List: pgsql-performance
Adam Haberlach wrote:
> I think that perhaps he was trying to avoid having to buy "Big Iron" at all.
>
> With all the Opteron v. Xeon around here, and talk of $30,000 machines,
> perhaps it would be worth exploring the option of buying 10 cheapass
> machines for $300 each. At the moment, that $300 buys you, from Dell, a
> 2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit
> ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as
> easy to harness as a single machine.
>
> For those of us looking at batch and data warehousing applications, it
> would be really handy to be able to partition databases, tables, and
> processing load across banks of cheap hardware.
>
> Yes, clustering solutions can distribute the data, and can even do it on a
> per-table basis in some cases. This still leaves it up to the application's
> logic to handle reunification of the data.

Sure. A lot of this is application dependent, though. Take foreign key
constraints: in a general cluster solution you would have to allow foreign
keys across partitions. I have a feeling this would be extra slow, and hard
to do correctly.

Multi-machine transactions are also a difficulty, since the WAL now has to
take all of the machines into account, and you have to wait for fsync on all
of them. I'm not sure how Oracle does it, but these things seem like they
would keep clustering from really scaling very well.

> Ideas:
> 1. Create a table/storage type that consists of a select statement
> on another machine. While I don't think the current executor is capable of
> working on multiple nodes of an execution tree at the same time, it would
> be great if it could offload a select of tuples from a remote table to an
> entirely different server and merge the resulting data into the current
> execution. I believe MySQL has this, and Oracle may implement it in another
> way.
>
> 2. There is no #2 at this time, but I'm sure one can be hypothesized.
>
> ...Google and other companies have definitely proved that one can harness
> huge clusters of cheap hardware. It can't be _that_ hard, can it? :)

Again, it depends on the application. A generic database with lots of
cross-reference integrity checking does not work well on a cluster. A very
distributed db where you don't worry about cross references does scale.
Google made a point of making their application work in a distributed manner.

In the other post he mentions that pg_pool could naturally split the rows out
onto different machines based on partitioning, etc. I would argue that it
would be more of a custom pool daemon tailored to the overall application,
because you have to start dealing with things like cross-machine joins. Who
handles that? The pool daemon has to, since it is the only thing that talks
to both tables. I think you could certainly write a reasonably simple,
application-specific daemon that all of the clients send their queries to,
and that figures out where they need to go and aggregates them as necessary.
But a fully generic one is *not* simple at all, and I think it is far outside
the scope of something like pg_pool. I'm guessing that PGCluster is looking
at working on that, and it might be true that pg_pool is thinking about it.
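(As an aside on idea #1: contrib/dblink can already fake something like
this today, by pulling tuples from a remote server into a local query. A
rough sketch, assuming dblink is installed locally, a second server
reachable as machine_b, and the same placeholder table/column names as the
example below; the column types are made up too:

  SELECT a.row1, b.row2
    FROM table1_on_machine_a AS a
    JOIN dblink('host=machine_b dbname=somedb',
                'SELECT join_column, row2 FROM table2_on_machine_b')
         AS b(join_column integer, row2 text)   -- types are guesses
      ON a.join_column = b.join_column
   WHERE restrict_table_1;                      -- placeholder predicate

The catch is that the whole remote result set gets shipped over before the
local join happens, so none of the interesting restrictions get pushed to
the other machine -- which is really the same planning problem I get into
below.)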
But just think about the very simple query:

  SELECT row1, row2
    FROM table1_on_machine_a NATURAL JOIN table2_on_machine_b
   WHERE restrict_table_1
     AND restrict_table_2
     AND restrict_1_based_on_2;

This needs to be broken into something like:

  SELECT row1 FROM table1_on_machine_a
   WHERE restrict_table_1
   ORDER BY join_column;

  SELECT row2 FROM table2_on_machine_b
   WHERE restrict_table_2
   ORDER BY join_column;

Then these row sets need to be merge-joined, and restrict_1_based_on_2 needs
to be applied. That is in no way trivial, and I think it is outside the scope
of pg_pool.

Now, maybe if you restrict yourself so that each query stays within one
machine you can make it work, or you can write your own app to handle some of
this transparently for the clients. But I would expect that, to make the
problem feasible, it would not be a generic solution. Maybe I'm off base; I
don't really keep track of pg_pool/PGCluster/etc. But I can see that the
problem is very difficult.

And note that this is still a very simple query, and it doesn't even get into
optimizations. You might actually prefer the above to be done nestloop-style,
where table_1 is selected and then, for each row, you do a single index
select on table_2. But how is your app going to know that? It has to have the
statistics from the backend databases, and if it has to issue an extra query
to get those statistics, you hurt your scalability even more. Whereas big
iron already has all the statistics and can optimize the query plan.

Perhaps PGCluster will handle this, by maintaining full statistics across the
cluster on each machine, so that more optimal queries can be performed. I
don't really know.

John
=:->
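P.S. To make the nestloop idea concrete, a hypothetical pool daemon would end
up issuing something roughly like the following (all names are the same
placeholders as above, and $1 stands for the join_column value plugged in
from the current outer row):

  SELECT row1, join_column
    FROM table1_on_machine_a
   WHERE restrict_table_1;              -- outer scan, issued once

  -- then, for each outer row, one indexed probe against machine B:
  SELECT row2
    FROM table2_on_machine_b
   WHERE join_column = $1
     AND restrict_table_2;

Whether that beats the merge-join plan depends entirely on how selective
restrict_table_1 is, which is exactly the statistics problem I mentioned.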