Thread: Dream Server?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm currently running a 4 GB, 11-million-row database and am looking to
build a "dream server" for it. I am currently running a dual P3 1 GHz
system with 10k RPM SCSI drives and 4 GB of RAM (which I have
configured pgsql to use all of), and I'm concerned about performance
once the db doubles in size, which should happen in six months to a
year at the latest.

First off: if money were no concern, what would you buy as the ultimate
PostgreSQL server running Linux?

Second, on a more technical note, does pgsql take advantage of multiple
processors? If I had an 8-way 800 MHz Xeon, would the machine blow away
a 2 GHz P4? How much of a factor is CPU compared to memory? Disk speed?
I want to be able to do large-volume selects on tables with more than 5
million rows and not have the server blink at other requests put in at
the same time.

Any hints or suggestions/experience here would be appreciated.

Thanks,

Gavin

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBPGGlgn9xeBXAlKqsEQJPywCfenV2bHDKJ0czKLy2qbaoj+hiow8AoNaT
9gYOTvmzFR9+YIjA5MQwjSMN
=B6Ak
-----END PGP SIGNATURE-----
Here's a link to a SQL Server performance tuning guide:
http://www.sql-server-performance.com/hardware_tuning.asp

Some of the nuggets from it are:

"When selecting your CPU for your server, select one with a large L2
cache. This is especially important if you have multiple-processor
servers. Select at least a 1MB L2 cache if you have one or two CPUs. If
you have 4 or more CPUs, get at least a 2MB L2 cache in each CPU. The
greater the size of the L2 cache, the greater the server's CPU
performance, because it reduces the amount of wait time experienced by
the CPU when reading and writing data to main memory."

"From a performance perspective, it is better to have more, smaller
SCSI disk drives in an array than fewer, larger SCSI disk drives. Let's
say that you need about 100GB of hard disk space in a RAID 5 array.
There are several ways you can configure such an array, some of which
offer more performance than others. Some configurations include:

13 - 9GB drives
7 - 18GB drives
4 - 36GB drives

Each of the above configurations will provide about the same amount of
storage space, but the more drives there are in the array, the faster
the I/O will be (assuming that the controllers can handle all of the
I/O traffic). This is because more drives offer more read/write heads
that can all be working simultaneously, which speeds disk reads and
writes."

"Select the best I/O controller you can get. Top-notch controllers
offload much of the I/O work onto their own local CPU, freeing up CPU
time on the server to do other tasks. For the ultimate in I/O
controllers, consider a fiber channel connection instead of a SCSI
connection."

rjsjr

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Gavin M. Roy
> Sent: Wednesday, February 06, 2002 3:52 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Dream Server?
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
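The drive-count argument quoted above can be put into back-of-the-envelope numbers. This is only a sketch: the per-spindle figure is an assumed ballpark for a 10k RPM SCSI drive of that era, not a measurement, and real controllers complicate the picture. The point is just that read throughput grows with the number of spindles, not their capacity.

```python
# Rough model of aggregate I/O for the three RAID 5 layouts above.
IOPS_PER_DRIVE = 100  # assumed sustained random IOPS per spindle (ballpark)

configs = {"13 x 9GB": 13, "7 x 18GB": 7, "4 x 36GB": 4}

for name, drives in configs.items():
    # All spindles in a RAID 5 set can serve reads concurrently.
    read_iops = drives * IOPS_PER_DRIVE
    # A small RAID 5 write costs ~4 disk ops (read data + parity,
    # write data + parity), so write throughput is roughly a quarter.
    write_iops = drives * IOPS_PER_DRIVE // 4
    print(f"{name}: ~{read_iops} read IOPS, ~{write_iops} write IOPS")
```

Same usable capacity, but the 13-drive array models out to more than three times the random I/O of the 4-drive one.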
> I'm currently running a 4 Gig 11 million row database and am looking
> to build a "dream server" for it. I am currently running a dual p3
> 1GHz system with 10k rpm scsi drives and 4 gigs of ram (which I have
> configured pgsql to use all of) and I'm concerned about performance
> once the db doubles in size, which should be in 6 mos to a year at
> the latest. First off, If money was no concern, what would you buy
> as the ultimate postgresql server running linux?

I don't know if Linux runs on the new version of the Sun Starfires,
does it?

> Second off, on a more technical note, does pgsql take advantage of
> multiple processors.

If you are running multiple connections, then yes, it does. But having
an extra CPU around to handle kernel code, disk activity, network
activity, etc. does help even for a single connection.

> If I had a 8 way 800 MHz Xeon would the machine blow away a 2GHz P4?
> How much is CPU a factor compared to memory?

Both CPU and memory have to be taken into account. It does you no good
to have a 10 GHz chip if your memory subsystem can't provide the data
to feed it, and likewise, 10 GB/sec of throughput from your memory
system does no good if you are running a 386! If you're using extremely
large tables (which it sounds like you will be), then memory throughput
is a very critical factor.

As for the 8-way Xeon versus the 2 GHz P4: in total throughput, yes,
the Xeon should blow away the P4, assuming you're talking about overall
throughput with multiple connections and the 8-way machine has a memory
subsystem that is up to the task. If you're talking about a single
query on a single connection, then the P4 may very well beat the Xeon.
Ideally, your SMP machine should require you to add DIMMs in groups
equal to the number of your processors - in this case, 8 DIMMs at a
time. Otherwise, you simply can't give each chip the full bandwidth.
One interesting solution would be to interleave 4 banks of DDR RAM to
feed 8 Xeons, but I doubt we'll ever see that in production.

> Disk speed? I want to be able to do large volume selects on tables
> with more than 5 million rows and not have the server blink at other
> requests put in at the same time.

If you have enough RAM to keep the entire database in disk cache, then
disk speed becomes much less of a factor, if you turn off fsync(). Like
I've said plenty of times before (and probably bored everyone to
tears), the lights on our DB machine only blink *occasionally*, even
when the machine is literally being slammed with database activity. A
hardware RAID card with cache on board and some moderately decent
drives can give you a VERY fast, responsive disk subsystem.

In a day or two, I'll be getting the parts to build a dual Athlon MP
1800+ machine, and I'm planning on putting PG on it and testing it
against our 4x700 MHz Xeon machine to see how it fares. Once I do, I'm
planning on making a full report to the list. If anyone has a certain
PG benchmark that they'd like me to run, let me know.

steve
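Steve's single-query-versus-many-connections point follows from PostgreSQL forking one backend process per connection: one query occupies at most one CPU, while concurrent connections can spread across all of them. A toy model of the 8x800 MHz Xeon vs. 2 GHz P4 question - treating clock speed as a stand-in for per-CPU work is a deliberate oversimplification that ignores cache and memory bandwidth:

```python
# Aggregate work rate under the "one backend per connection" model:
# each query binds to at most one CPU.
def throughput(clock_mhz: float, cpus: int, concurrent_queries: int) -> float:
    busy_cpus = min(cpus, concurrent_queries)
    return clock_mhz * busy_cpus

for q in (1, 4, 8):
    xeon = throughput(800, 8, q)   # 8-way 800 MHz Xeon
    p4 = throughput(2000, 1, q)    # single 2 GHz P4
    print(f"{q} concurrent queries: Xeon ~{xeon:.0f}, P4 ~{p4:.0f}")
```

Even in this crude model, the P4 wins at one query (2000 vs. 800) and the Xeon wins once four or more connections are busy - which is exactly the distinction made above.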
Perhaps we should think wide, not tall. As the Pontiac commercial says,
wider is better. Build a distributed database. Increasing the height of
a box does not scale; Amdahl proved it.

Perhaps someone can help with some links - I have seen references to
this on the ha-linux groups.

--
-------------------------------------------------------------------------
Medi Montaseri                               medi@CyberShell.com
Unix Distributed Systems Engineer            HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------
----- Original Message -----
> Perhaps we should think wide not tall. As the pontiac commercial
> says, wider is better. Build a distributed database. Increasing
> height of a box does not scale. Amdahl proved it.
>
> Perhaps someone can help with some links, I have seen references to
> it on ha-linux groups.

That's something that I've dreamed about for some time. My rack of
load-balanced web servers scales efficiently, easily, and cheaply.
Need double the capacity? I buy more machines and plug them in. If
there were a way of replicating PG data from one master to many slaves
in near-real-time, I could have a rack full of load-balanced database
servers right next to it - cheap, easy, and effective.

Even though I've kept my mouth shut, I've wondered why more effort
isn't devoted to that. There are a LOT of companies out there that fork
money over hand-over-fist trying to buy a single machine that can
handle all of their database usage, and as the size of the machine
increases, the cost per transaction seems to increase exponentially.
Sure, a million dollars would get you an entry-level Starfire with 16
processors, upgradeable to 64, with a couple of gigabytes per second of
throughput - or, for a lot less money, you could buy a number of
smaller systems that, through copious amounts of RAM, CPU cycles, and
combined bandwidth, could spin circles around the Starfire.

Don't get the impression that I'm bad-mouthing the developers - I'm
sure that they're taking care of priorities as best they can.

steve
On Wed, Feb 06, 2002 at 06:00:33PM -0700, Steve Wolfe wrote:
> need double the capacity? I buy more machines, and plug them in. If
> there were a way of replicating PG data from one master to many
> slaves in near-real-time, I could have a rack full of load-balanced
> database servers right next to it - cheap, easy, and effective.

If the idea here is to have only one master (where data goes in) but
many slaves (whence data comes when you read), then check out eRserver.
PostgreSQL Inc.'s version is better than the one included in contrib/.
We're using it to provide both redundancy and a secondary source for
reads, and it is performing without a problem.

A

--
----
Andrew Sullivan                              87 Mowat Avenue
Liberty RMS                                  Toronto, Ontario Canada
<andrew@libertyrms.info>                     M6K 3E3
                                             +1 416 646 3304 x110
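The one-master, many-slaves split described above can be sketched as a tiny routing layer in front of the pool: writes always go to the master, reads round-robin over the replicas. The hostnames and the round-robin policy here are invented for illustration - eRserver handles the replication itself, not this routing.

```python
import itertools

# Illustrative hostnames, not real infrastructure.
MASTER = "db-master:5432"
REPLICAS = ["db-slave1:5432", "db-slave2:5432", "db-slave3:5432"]

_replica_cycle = itertools.cycle(REPLICAS)

def route(sql: str) -> str:
    """Return the host a statement should be sent to."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb == "SELECT":
        return next(_replica_cycle)  # reads load-balance over the slaves
    return MASTER  # INSERT/UPDATE/DELETE and DDL go to the single master

print(route("SELECT * FROM users"))            # one of the replicas
print(route("UPDATE users SET name = 'x'"))    # the master
```

The catch, of course, is replication lag: a read routed to a slave may not yet see a write that just went to the master.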
> That's something that I've dreamed about for some time. My rack of
> load-balanced web servers scales efficiently, easily, and cheaply. I
> need double the capacity? I buy more machines, and plug them in. If
> there were a way of replicating PG data from one master to many
> slaves in near-real-time, I could have a rack full of load-balanced
> database servers right next to it - cheap, easy, and effective.
>
> Even though I've kept my mouth shut, I've wondered why more effort
> isn't devoted to that.

I'll second this. Replication is my/our number one needed new feature
for PostgreSQL (actually, aside from schemas, the only one). If we
could do updates to a master server and push all those changes to a
bunch of slaves, my job would be incredibly easier. As it stands, we've
had to restructure our data model so that we can spread information
across a number of different machines.

> Don't get the impression that I'm bad-mouthing the developers - I'm
> sure that they're taking care of priorities as best they can.

Hear, hear. PostgreSQL seems to have come a long way, even in the
relatively short time I've been using it. I can't heap enough praise on
those who give their time to improving it. But as a user, it's
important to let the developers know what we want and/or need.

Greg
The two-tier architecture proposed here, consisting of a master and
slaves, seems to be the quickest way to get there. However, writes
handled by a slave need to be reported to the master for rebroadcast to
all slaves. This would create a wave in the system, or poor integrity
in a rapid-writes environment. Of course, an "application aware"
scenario, where the application connects to the master for all writes,
could ease the situation. It effectively removes one hop from the data
path: instead of client-slave-master-slaves, it does
client-master-slaves. And for a do-it-yourselfer, one can have a
trigger on a system table for auto-updates.

What I was thinking is perhaps to modify the "Planner/Optimizer" so
that this layer would, in addition to its current functionality, assume
the role of a distributor, dispatcher, scheduler (or whatever other
name), setting up a "Modified Plan Tree" for the Executor. This "Plan
Tree" would include Node information as well (a Remote Executor - think
RPC). There would also need to be an intra-node advisory communication
channel (perhaps more than one) for load balancing and lock management.
I call this an Asymmetric architecture.

Another approach would be to have the "Planner" only do the lock
management (in addition to its normal things) - I call this "Almost
Shared Nothing" - and let DNS determine who is next.

Another approach would be to have a RAID (Redundant Array of
Inexpensive Databases - I know I'm overloading RAID) Virtual Database
(think Virtual File System), where an abstraction layer would determine
where to put the tables: on node-1, on node-2, or on both. Then, either
through a discovery phase (the client asks, "where do I connect for
this query?"), or in a transparent mode, the server would pipe up with
the owner of the data for the given query and simply facilitate the
data movement.

The more I write, the more I see how complicated this is....
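A toy version of that "RAID of databases" abstraction layer and its discovery phase might look like the following. The node names and the placement rule (a deterministic hash of the table name) are made up purely for illustration; nothing like this exists in PostgreSQL.

```python
# Hypothetical node names for a two-node "Virtual Database".
NODES = ["node-1", "node-2"]

def owner(table: str, replicated: bool = False) -> list:
    """Map a table to the node(s) that store it."""
    if replicated:
        return list(NODES)  # the "or both" case: a copy on every node
    # Deterministic placement: hash the table name onto one node.
    # (sum of character codes, so the mapping is stable across runs.)
    return [NODES[sum(map(ord, table)) % len(NODES)]]

# The "discovery phase": a client asks where to connect for a table.
for t in ("accounts", "orders"):
    print(t, "->", owner(t))
```

Every client (or a transparent proxy in front of the nodes) would have to agree on this mapping, which is exactly where the lock-management and advisory-channel problems described above come in.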
Sorry about my random thoughts... it reminds me of "Some write to say
what they think, and some write to think what they want to say...."

Aaaahhh.... I better get back to work.... daydreaming again...

--
-------------------------------------------------------------------------
Medi Montaseri                               medi@CyberShell.com
Unix Distributed Systems Engineer            HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------