Re: Partitioning / Clustering - Mailing list pgsql-performance

From Richard_D_Levine@raytheon.com
Subject Re: Partitioning / Clustering
Date
Msg-id OFA2AB4309.BA9A6C7D-ON05256FFD.0055F9F0-05256FFD.00566ADB@ftw.us.ray.com
In response to Re: Partitioning / Clustering  ("Adam Haberlach" <adam@mediariffic.com>)
Responses Re: Partitioning / Clustering
List pgsql-performance
> exploring the option of buying 10 cheapass
> machines for $300 each.  At the moment, that $300 buys you, from Dell, a
> 2.5GHz Pentium 4

Buy cheaper ass Dells with an AMD 64 3000+.  Beats the crap out of the 2.5
GHz Pentium, especially for PostgreSQL.

See the thread "Whence the Opterons" for more....

Rick

pgsql-performance-owner@postgresql.org wrote on 05/10/2005 10:02:50 AM:

>
> I think that perhaps he was trying to avoid having to buy "Big Iron" at all.
>
> With all the Opteron v. Xeon around here, and talk of $30,000 machines,
> perhaps it would be worth exploring the option of buying 10 cheapass
> machines for $300 each.  At the moment, that $300 buys you, from Dell, a
> 2.5GHz Pentium 4 w/ 256MB of RAM, a 40GB hard drive, and gigabit ethernet.
> The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
> harness as a single machine.
>
> For those of us looking at batch and data warehousing applications, it would
> be really handy to be able to partition databases, tables, and processing
> load across banks of cheap hardware.
>
> Yes, clustering solutions can distribute the data, and can even do it on a
> per-table basis in some cases.  This still leaves it up to the application's
> logic to handle reunification of the data.
>
> Ideas:
>    1. Create a table/storage type that consists of a select statement
> on another machine.  While I don't think the current executor is capable of
> working on multiple nodes of an execution tree at the same time, it would be
> great if it could offload a select of tuples from a remote table to an
> entirely different server and merge the resulting data into the current
> execution.  I believe MySQL has this, and Oracle may implement it in another
> way.
>
>    2. There is no #2 at this time, but I'm sure one can be
> hypothesized.
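[The "merge the resulting data into the current execution" step of idea 1 can be sketched in miniature. This is a toy illustration, not anything the thread's authors implemented: assuming each remote node returns its tuples already sorted on the key, the coordinating side can reunify them with a streaming k-way merge. All node names and rows below are invented.]

```python
import heapq

def merge_remote_results(*node_streams):
    """K-way merge of per-node result streams, each sorted by key.

    Stands in for the executor step that would combine tuples
    fetched from remote servers into one ordered stream.
    """
    return list(heapq.merge(*node_streams, key=lambda row: row[0]))

# Hypothetical per-node results, each already sorted by id:
node_a = [(1, "alice"), (4, "dave")]
node_b = [(2, "bob"), (3, "carol")]

merged = merge_remote_results(node_a, node_b)
# merged is the union of both streams, still ordered by id
```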
>
> ...Google and other companies have definitely proved that one can harness
> huge clusters of cheap hardware.  It can't be _that_ hard, can it?  :)
>
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of John A Meinel
> Sent: Tuesday, May 10, 2005 7:41 AM
> To: Alex Stapleton
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Partitioning / Clustering
>
> Alex Stapleton wrote:
> > What is the status of Postgres support for any sort of multi-machine
> > scaling support? What are you meant to do once you've upgraded your
> > box and tuned the conf files as much as you can? But your query load
> > is just too high for a single machine?
> >
> > Upgrading stock Dell boxes (I know we could be using better machines,
> > but I am trying to tackle the real issue) is not a hugely price
> > efficient way of getting extra performance, nor particularly scalable
> > in the long term.
>
> Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far
> away from Big Iron. I don't know what performance you are looking for, but
> you can easily get into inserting 10M rows/day with quality hardware.
>
> But actually, is it your SELECT load that is too high, or your INSERT load,
> or something in between?
>
> Because Slony is around if it is a SELECT problem.
> http://gborg.postgresql.org/project/slony1/projdisplay.php
>
> Basically, Slony is a Master/Slave replication system. So if you have INSERTs
> going into the Master, you can have as many replicated slaves as you need,
> which can handle your SELECT load.
> Slony is an asynchronous replicator, so there is a time delay from the
> INSERT until it will show up on a slave, but that time could be pretty
> small.
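[Because the slave lags the master, a reader that needs fresh data has to account for that delay. A toy sketch of lag-aware read routing, assuming some lag metric is available; all names and the `lag`/`load` fields are invented for illustration:]

```python
def choose_read_host(master, slaves, max_lag_seconds=1.0):
    """Route a SELECT to a sufficiently fresh slave, else the master.

    Each host is a dict with a 'lag' field: seconds behind the master
    (0 for the master itself). Stands in for whatever lag metric a
    real replication setup exposes.
    """
    fresh = [s for s in slaves if s["lag"] <= max_lag_seconds]
    # Prefer the least-loaded fresh slave; fall back to the master.
    if fresh:
        return min(fresh, key=lambda s: s["load"])
    return master

master = {"name": "master", "lag": 0.0, "load": 0.9}
slaves = [{"name": "slave1", "lag": 0.4, "load": 0.2},
          {"name": "slave2", "lag": 5.0, "load": 0.1}]

host = choose_read_host(master, slaves)  # slave2 is too stale to use
```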
>
> This would require some application level support, since an INSERT goes to a
> different place than a SELECT. But there has been some discussion about
> pg_pool being able to spread the query load, and having it be aware of the
> difference between a SELECT and an INSERT and have it route the query to the
> correct host. The biggest problem being that functions could cause a SELECT
> func() to actually insert a row, which pg_pool wouldn't know about. There
> are 2 possible solutions: a) don't do that when you are using this system,
> or b) add some sort of comment hint so that pg_pool can understand that the
> select is actually an INSERT, and needs to be done on the master.
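[The routing rule described above — plain SELECTs to a slave, everything else plus hinted SELECTs to the master — is simple enough to sketch. The `/*master*/` hint comment syntax below is invented, since the thread only discusses the idea:]

```python
def route_query(sql):
    """Decide which host class a query should go to.

    Returns "master" for writes and for SELECTs carrying a
    hypothetical /*master*/ hint (e.g. a SELECT func() that
    actually inserts rows); returns "slave" for plain reads.
    """
    text = sql.strip()
    if "/*master*/" in text:
        return "master"   # hinted: may write via a function
    if text.upper().startswith("SELECT"):
        return "slave"    # plain read: any replica will do
    return "master"       # INSERT/UPDATE/DELETE/DDL

route_query("SELECT * FROM t")              # routed to a slave
route_query("INSERT INTO t VALUES (1)")     # routed to the master
route_query("/*master*/ SELECT log_hit()")  # hinted to the master
```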
>
> >
> > So, when/is PG meant to be getting a decent partitioning system?
> > MySQL is getting one (eventually) which is apparently meant to be
> > similar to Oracle's according to the docs. Clusgres does not appear
> > to be widely (or at all) used, and info on it seems pretty thin on the
> > ground, so I am not too keen on going with that. Is the real solution
> > to multi-machine partitioning (as in, not like MySQL's MERGE tables)
> > on PostgreSQL actually doing it in our application API? This seems
> > like a less than perfect solution once we want to add redundancy and
> > things into the mix.
>
> There is also PGCluster
> http://pgfoundry.org/projects/pgcluster/
>
> Which is trying to be more of a Synchronous multi-master system. I haven't
> heard of Clusgres, so I'm guessing it is an older attempt, which has been
> overtaken by PGCluster.
>
> Just realize that clusters don't necessarily scale like you would want them
> to. Because at some point you have to insert into the same table, which
> means you need to hold a lock which prevents the other machine from doing
> anything. And with synchronous replication, you have to wait for all of the
> machines to get a copy of the data before you can say it has been committed,
> which does *not* scale well with the number of machines.
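[The scaling argument here — a synchronous commit waits on the slowest replica, so adding machines can only make commits slower — can be made concrete with a toy model. The latency numbers are invented:]

```python
def sync_commit_time(replica_latencies):
    """A synchronous commit is acknowledged only after *every*
    replica has the data, so it pays for the slowest one."""
    return max(replica_latencies)

def async_commit_time(local_latency):
    """An asynchronous commit only waits for the local write;
    replicas catch up later (the Slony model)."""
    return local_latency

# Hypothetical per-replica network+write latencies in milliseconds:
lat3 = [5, 6, 30]           # three machines, one slow
lat5 = [5, 6, 30, 7, 80]    # five machines, one slower still

sync_commit_time(lat3)      # 30 ms: gated by the slow node
sync_commit_time(lat5)      # 80 ms: more machines, worse commit
async_commit_time(5)        # 5 ms regardless of cluster size
```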
>
> If you can make it work, I think having a powerful master server, which can
> finish an INSERT quickly, and then having a bunch of Slony slaves with a
> middleman (like pg_pool) to do load balancing among them, is the best way to
> scale up. There are still some requirements, like not having to see the
> results of an INSERT instantly (though if you are using hinting to pg_pool,
> you could hint that this query must be done on the master, realizing that
> the more you do it, the more you slow everything down).
>
> John
> =:->
>
> PS> I don't know what functionality has actually been implemented in
> pg_pool, just that it was discussed in the past. Slony-II is also in the
> works.
>
>

