Re: Partitioning / Clustering - Mailing list pgsql-performance
From | Richard_D_Levine@raytheon.com |
---|---|
Subject | Re: Partitioning / Clustering |
Date | |
Msg-id | OFA2AB4309.BA9A6C7D-ON05256FFD.0055F9F0-05256FFD.00566ADB@ftw.us.ray.com Whole thread Raw |
In response to | Re: Partitioning / Clustering ("Adam Haberlach" <adam@mediariffic.com>) |
Responses |
Re: Partitioning / Clustering
|
List | pgsql-performance |
> exploring the option of buying 10 cheapass > machines for $300 each. At the moment, that $300 buys you, from Dell, a > 2.5Ghz Pentium 4 Buy cheaper ass Dells with an AMD 64 3000+. Beats the crap out of the 2.5 GHz Pentium, especially for PostgreSQL. See the thread "Whence the Opterons" for more.... Rick pgsql-performance-owner@postgresql.org wrote on 05/10/2005 10:02:50 AM: > > I think that perhaps he was trying to avoid having to buy "Big Iron" at all. > > With all the Opteron v. Xeon around here, and talk of $30,000 machines, > perhaps it would be worth exploring the option of buying 10 cheapass > machines for $300 each. At the moment, that $300 buys you, from Dell, a > 2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit ethernet. > The aggregate CPU and bandwidth is pretty stupendous, but not as easy to > harness as a single machine. > > For those of us looking at batch and data warehousing applications, it would > be really handy to be able to partition databases, tables, and processing > load across banks of cheap hardware. > > Yes, clustering solutions can distribute the data, and can even do it on a > per-table basis in some cases. This still leaves it up to the application's > logic to handle reunification of the data. > > Ideas: > 1. Create a table/storage type that consists of a select statement > on another machine. While I don't think the current executor is capable of > working on multiple nodes of an execution tree at the same time, it would be > great if it could offload a select of tuples from a remote table to an > entirely different server and merge the resulting data into the current > execution. I believe MySQL has this, and Oracle may implement it in another > way. > > 2. There is no #2 at this time, but I'm sure one can be > hypothesized. > > ...Google and other companies have definitely proved that one can harness > huge clusters of cheap hardware. It can't be _that_ hard, can it. :) > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of John A Meinel > Sent: Tuesday, May 10, 2005 7:41 AM > To: Alex Stapleton > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Partitioning / Clustering > > Alex Stapleton wrote: > > What is the status of Postgres support for any sort of multi-machine > > scaling support? What are you meant to do once you've upgraded your > > box and tuned the conf files as much as you can? But your query load > > is just too high for a single machine? > > > > Upgrading stock Dell boxes (I know we could be using better machines, > > but I am trying to tackle the real issue) is not a hugely price > > efficient way of getting extra performance, nor particularly scalable > > in the long term. > > Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far > away from Big Iron. I don't know what performance you are looking for, but > you can easily get into inserting 10M rows/day with quality hardware. > > But actually is it your SELECT load that is too high, or your INSERT load, > or something inbetween. > > Because Slony is around if it is a SELECT problem. > http://gborg.postgresql.org/project/slony1/projdisplay.php > > Basically, Slony is a Master/Slave replication system. So if you have INSERT > going into the Master, you can have as many replicated slaves, which can > handle your SELECT load. > Slony is an asynchronous replicator, so there is a time delay from the > INSERT until it will show up on a slave, but that time could be pretty > small. > > This would require some application level support, since an INSERT goes to a > different place than a SELECT. But there has been some discussion about > pg_pool being able to spread the query load, and having it be aware of the > difference between a SELECT and an INSERT and have it route the query to the > correct host. The biggest problem being that functions could cause a SELECT > func() to actually insert a row, which pg_pool wouldn't know about. There > are 2 possible solutions, a) don't do that when you are using this system, > b) add some sort of comment hint so that pg_pool can understand that the > select is actually an INSERT, and needs to be done on the master. > > > > > So, when/is PG meant to be getting a decent partitioning system? > > MySQL is getting one (eventually) which is apparently meant to be > > similiar to Oracle's according to the docs. Clusgres does not appear > > to be widely/or at all used, and info on it seems pretty thin on the > > ground, so I am not too keen on going with that. Is the real solution > > to multi- machine partitioning (as in, not like MySQLs MERGE tables) > > on PostgreSQL actually doing it in our application API? This seems > > like a less than perfect solution once we want to add redundancy and > > things into the mix. > > There is also PGCluster > http://pgfoundry.org/projects/pgcluster/ > > Which is trying to be more of a Synchronous multi-master system. I haven't > heard of Clusgres, so I'm guessing it is an older attempt, which has been > overtaken by pgcluster. > > Just realize that clusters don't necessarily scale like you would want them > too. Because at some point you have to insert into the same table, which > means you need to hold a lock which prevents the other machine from doing > anything. And with synchronous replication, you have to wait for all of the > machines to get a copy of the data before you can say it has been committed, > which does *not* scale well with the number of machines. > > If you can make it work, I think having a powerful master server, who can > finish an INSERT quickly, and then having a bunch of Slony slaves with a > middleman (like pg_pool) to do load balancing among them, is the best way to > scale up. There are still some requirements, like not having to see the > results of an INSERT instantly (though if you are using hinting to pg_pool, > you could hint that this query must be done on the master, realizing that > the more you do it, the more you slow everything down). > > John > =:-> > > PS> I don't know what functionality has been actually implemented in > pg_pool, just that it was discussed in the past. Slony-II is also in the > works. > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
pgsql-performance by date: