Thread: [PERFORMANCE] Buying hardware
So, the eternal problem of what hardware to buy. I really miss a hardware buying guide for database servers now that I'm about to buy one. Some general guidelines mixed with ranked lists of which hardware is best, shouldn't that be on the wiki?

This is of course very difficult to give advice about, but shouldn't the general advice be something like:

1) A fast CPU, but not on the bleeding edge.
2) As much RAM as you can fit into the machine without paying way too much for it. Use the fastest RAM you can find (what is it called today? PC 1333 MHz or something like that?)
3) Fast hard disks. Best is RAID X (which RAID level should one use?)
4) Use software RAID unless you have the money to buy a RAID controller, in which case here is the ranking of them <list of brands/models>, ordered by quality, with a general comment on exactly how much better each one is than the one below it on the list ;-)
On Sun, Jan 25, 2009 at 2:21 PM, A B <gentosaker@gmail.com> wrote:
> So, the eternal problem of what hardware to buy. I really miss a
> hardware buying guide for database servers now that I'm about to buy
> one. Some general guidelines mixed with ranked lists of which hardware
> is best, shouldn't that be on the wiki?
>
> This is of course very difficult to give advice about, but shouldn't the
> general advice be something like:
> 1) A fast CPU, but not on the bleeding edge.

More cores are more important than fewer, faster ones.

> 2) As much RAM as you can fit into the machine without paying way too
> much for it. Use the fastest RAM you can find (what is it called
> today? PC 1333 MHz or something like that?)

The speed of the RAM isn't as important as the amount of it and the speed of the chipset on the motherboard.

> 3) Fast hard disks. Best is RAID X (which RAID level should one use?)

Again, more slower disks > fewer faster ones. RAID-10 is almost always the right choice.

> 4) Use software RAID unless you have the money to buy a RAID
> controller, in which case here is the ranking of them
> <list of brands/models>

Areca and 3ware/Escalade are the two best controllers for the money out right now. They tend to take turns being the absolute best as they release new cards. Newer Arecas (the 1680 series) use an ethernet port for traps and such, so there is no need for special software that might be kernel version dependent.

Both cost about the same for their top of the line cards.

Make sure you have battery backed cache.
On Sun, 25 Jan 2009, Scott Marlowe wrote:
> More cores are more important than fewer, faster ones.
>
> Again, more slower disks > fewer faster ones.

Not necessarily. It depends on what you are doing. If you're going to be running only one database connection at a time, doing really big complex queries, then having really fast CPUs and discs is better than having lots. However, that situation is rare.

> RAID-10 is almost always the right choice.

Agreed. Unless you don't care about the data and need the space, where RAID 0 might be useful, or if you really don't need the space, where RAID 1 might be okay. If your controller supports it.

Matthew

--
The third years are wandering about all worried at the moment because they have to hand in their final projects. Please be sympathetic to them, say things like "ha-ha-ha", but in a sympathetic tone of voice
                                        -- Computer Science Lecturer
On Mon, Jan 26, 2009 at 4:09 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> On Sun, 25 Jan 2009, Scott Marlowe wrote:
>>
>> More cores are more important than fewer, faster ones.
>>
>> Again, more slower disks > fewer faster ones.
>
> Not necessarily. It depends on what you are doing. If you're going to be
> running only one database connection at a time, doing really big complex
> queries, then having really fast CPUs and discs is better than having lots.
> However, that situation is rare.

If backup/restore times are important, having a fast CPU is important, because backup/restore is single threaded and unable to use more than one CPU. OK, two CPUs, one for the pg_dump process and one for the postgres daemon, but who buys anything with less than two cores these days?

We do daily backups of our databases, and although our biggest isn't very large at approximately 15GB, backups take a bit more than an hour with one CPU maxed out. This system has two Xeon 5130s @ 2GHz, so even with the fastest processors we can only reduce backup times by at most 50%.

During normal workloads, processing hundreds of queries a second, system utilization stays below 10% on average, so for us fewer, faster cores would be a better purchase than more, slower cores.

Lots of people have databases much, much bigger. I'd hate to imagine having to restore from backup on one of those monsters.

-Dave
On Jan 26, 2009, at 2:42 PM, David Rees wrote:
>
> Lots of people have databases much, much bigger. I'd hate to imagine
> having to restore from backup on one of those monsters.
>

If you use PITR + rsync you can create a binary snapshot of the db, so restore time is simply how long it takes to untar / whatever it into place. Our backup script basically does:

    archive backup directory
    pg_start_backup
    rsync
    pg_stop_backup

voila. I have 2 full copies of the db. You could even expand it a bit and after the rsync & friends have it fire up the instance and run pg_dump against it for a pg_restore compatible dump "just in case".

It takes a long time to restore a 300GB db, even if you cheat and parallelize some of it. 8.4 may get a pg_restore that can load in parallel, which will help somewhat.

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
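For concreteness, a minimal sketch of that kind of script might look like the following; the paths and the backup label are invented for illustration, and pg_start_backup/pg_stop_backup are the standard SQL functions used for online base backups:

    #!/bin/sh
    # archive the previous base backup before overwriting it
    mv /backup/base /backup/base.old && mkdir /backup/base

    # mark the start of the base backup (forces a checkpoint)
    psql -U postgres -c "SELECT pg_start_backup('nightly');"

    # copy the live data directory; the copy need not be consistent by itself,
    # replaying the archived WAL at restore time makes it consistent
    rsync -a --delete /var/lib/pgsql/data/ /backup/base/

    # mark the end of the base backup
    psql -U postgres -c "SELECT pg_stop_backup();"

Restore then amounts to copying the base directory back into place and letting recovery replay the archived WAL up to the point you want.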
On Mon, 2009-01-26 at 14:58 -0500, Jeff wrote:
> voila. I have 2 full copies of the db. You could even expand it a bit
> and after the rsync & friends have it fire up the instance and run
> pg_dump against it for a pg_restore compatible dump "just in case".
>
> It takes a long time to restore a 300GB db, even if you cheat and
> parallelize some of it. 8.4 may get a pg_restore that can load in
> parallel, which will help somewhat.

Somewhat? Just to be clear, if you have the hardware for it, parallel restore can take a 500GB restore in 2.5 hours (versus 15). IMO, that is a *little* more than somewhat. Maybe, a bit? ;)

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997
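For reference, the parallel restore that eventually shipped in 8.4 is driven by a jobs flag on pg_restore and needs a custom-format archive; a hypothetical run against a database named mydb would look like:

    # dump in the custom format so pg_restore can reorder and parallelize the load
    pg_dump -Fc mydb > mydb.dump

    # restore with 8 parallel jobs; pick a number roughly matching cores/spindles
    pg_restore -j 8 -d mydb mydb.dump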
The technique Jeff is speaking of below is exactly how we do it, except we use file-system snapshots vs rsync. The problem is how slow log application is when recovering, since it's a single process, and very slow at that.

-kg

On Jan 26, 2009, at 11:58 AM, Jeff wrote:
>
> On Jan 26, 2009, at 2:42 PM, David Rees wrote:
>>
>> Lots of people have databases much, much bigger. I'd hate to imagine
>> having to restore from backup on one of those monsters.
>>
>
> If you use PITR + rsync you can create a binary snapshot of the db,
> so restore time is simply how long it takes to untar / whatever it
> into place. Our backup script basically does:
>
>     archive backup directory
>     pg_start_backup
>     rsync
>     pg_stop_backup
>
> voila. I have 2 full copies of the db. You could even expand it a
> bit and after the rsync & friends have it fire up the instance and
> run pg_dump against it for a pg_restore compatible dump "just in
> case".
>
> It takes a long time to restore a 300GB db, even if you cheat and
> parallelize some of it. 8.4 may get a pg_restore that can load in
> parallel, which will help somewhat.
>
> --
> Jeff Trout <jeff@jefftrout.com>
> http://www.stuarthamm.net/
> http://www.dellsmartexitin.com/
On Mon, Jan 26, 2009 at 11:58 AM, Jeff <threshar@torgo.978.org> wrote:
> On Jan 26, 2009, at 2:42 PM, David Rees wrote:
>> Lots of people have databases much, much bigger. I'd hate to imagine
>> having to restore from backup on one of those monsters.
>
> If you use PITR + rsync you can create a binary snapshot of the db, so
> restore time is simply how long it takes to untar / whatever it into place.

Good point - we do that as well, and it helps with backup times (though we still grab daily backups in addition to log-shipping), but it still doesn't do much to avoid long restore times (at least until pg-8.4, which, as Joshua mentions, can do parallel restores).

Going back to the original question - trying to come up with a general guide to buying db hardware isn't easy because of the number of variables, like: what's your budget, what type of workload do you need to support, how big is your db, how large will your db get, etc...

As the others mentioned, having a good RAID controller with a BBU cache is essential for good performance. RAID10 is generally the best performing RAID configuration, though for data warehousing, where you need maximum storage, you might consider a RAID6 for the data with a RAID1 for the WAL.

The workload will determine whether it is more beneficial to go with quantity rather than speed of processors. As a rough calculation you can simply look at the raw GHz you're getting (multiply core speed by number of cores) - more aggregate GHz should be faster as long as your workload is parallelizable.

And yes, the more memory you can squeeze into the machine, the better, though you'll find that after a certain point the price starts going up steeply. Of course, if you only have a 15GB database, once you reach 16GB of memory you've pretty much hit the point of diminishing returns.

-Dave
On Jan 26, 2009, at 3:00 PM, Joshua D. Drake wrote:
> On Mon, 2009-01-26 at 14:58 -0500, Jeff wrote:
>
>> voila. I have 2 full copies of the db. You could even expand it a bit
>> and after the rsync & friends have it fire up the instance and run
>> pg_dump against it for a pg_restore compatible dump "just in case".
>>
>> It takes a long time to restore a 300GB db, even if you cheat and
>> parallelize some of it. 8.4 may get a pg_restore that can load in
>> parallel, which will help somewhat.
>
> Somewhat? Just to be clear, if you have the hardware for it, parallel
> restore can take a 500GB restore in 2.5 hours (versus 15). IMO, that is
> a *little* more than somewhat. Maybe, a bit? ;)

I'd say that qualifies more towards just a "smidge" faster ;)

I'm quite excited about the feature. I'm still on 8.2, mostly because of the downtime of the dump & restore. I wrote up some plans a while back on doing the poor man's parallel restore, but I haven't had the time to actually do it.

Theoretically, wouldn't the parallel pg_restore be able to run against an 8.3 instance with a dump from 8.2? I don't see why it wouldn't be able to (unless it uses some handy dandy new 8.4-only catalog). Maybe if I get time (HAHAHA) I'll test that out..

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
On Mon, Jan 26, 2009 at 12:27 PM, Jeff <threshar@torgo.978.org> wrote:
> I'm quite excited about the feature. I'm still on 8.2, mostly because of the
> downtime of the dump & restore. I wrote up some plans a while back on doing
> the poor man's parallel restore, but I haven't had the time to actually do it.

We use slony to provide read-only copies of the database for reporting purposes and for any other queries where it's OK to serve data that may be slightly stale.

We also have used it to upgrade our 15GB database with only a few minutes of downtime, instead of the minimum 2 hours it takes to do a dump/restore. Slony works with any version of Postgres going back to versions before 8.1 (anyone still use Postgres that old?). If you get all your ducks in a row, you can get upgraded with only seconds of downtime, though realistically it takes a bit longer.

http://slony.info/documentation/versionupgrade.html

-Dave
Jeff wrote:
> If you use PITR + rsync you can create a binary snapshot of the db, so
> restore time is simply how long it takes to untar / whatever it into
> place. Our backup script basically does:
>
>     archive backup directory
>     pg_start_backup
>     rsync
>     pg_stop_backup
>
> voila. I have 2 full copies of the db.

Note that this does NOT protect you against a Pg bug, a filesystem issue, or a RAID / disk media issue that results in corruption or damage to the database that isn't immediately detected.

I personally wouldn't want to rely on PITR alone. I take periodic SQL dumps as well, using PITR more for disaster recovery in case of software/user error, so I can roll back to when the error occurred rather than to the potentially much earlier time of the last backup.

A dump is much more likely to trip over a problem, so it's good to run one periodically just to make sure the entire DB is readable. It's also more likely to be restorable if something goes horribly wrong.

Personally, I might be OK with WAL-archiving based backups if I was using a warm-standby server in continuous recovery mode that'd notify me if anything went wrong with the restore. I'm much more comfortable having an SQL dump around, though.

> You could even expand it a bit
> and after the rsync & friends have it fire up the instance and run
> pg_dump against it for a pg_restore compatible dump "just in case".

Yep, that's what I do.

> It takes a long time to restore a 300GB db, even if you cheat and
> parallelize some of it. 8.4 may get a pg_restore that can load in
> parallel, which will help somewhat.

It's a great pity that pg_dump can't dump in parallel, though. It makes sense given that Pg has no mechanism for one backend to "join" another's existing transaction, and without that you'd risk inconsistent snapshots, but it's still a bit of a pity.

Is it actually that hard to let one backend join another's (strictly read-only) transaction? The first backend will be ensuring that tuples in the xmin/xmax range required by the transaction aren't VACUUMed away, etc.

--
Craig Ringer
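A periodic SQL dump alongside the WAL archiving can be as simple as a cron entry along these lines; the schedule, user and paths here are only an illustration:

    # crontab entry: nightly custom-format dump kept as an independent safety net
    # (% must be escaped in crontab, hence the backslashes in the date format)
    0 2 * * *  pg_dump -Fc -U postgres mydb > /backup/sql/mydb-$(date +\%Y\%m\%d).dump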
On Mon, 26 Jan 2009, Matthew Wakeling wrote:
> On Sun, 25 Jan 2009, Scott Marlowe wrote:
>> RAID-10 is almost always the right choice.
>
> Agreed. Unless you don't care about the data and need the space, where RAID 0
> might be useful, or if you really don't need the space, where RAID 1 might be
> okay. If your controller supports it.

If you can reproduce your data at will (would not mind losing it), you can use a single disk or RAID 0 (and you can disable fsync for even more performance). For this you need N disks, where N is the number needed to hold your data.

If you cannot reproduce your data at will (or it takes too long):

if you need the capacity of a single disk, do RAID 1

if you need the capacity of a small number of disks, do RAID 10 (RAID 1 combined with RAID 0 to stripe redundant copies of the data across multiple disks)

if you need the capacity of a large number of disks, you need to think seriously about your performance needs. The fact that RAID 10 can lose data if the wrong 2 disks fail, and that it requires buying 2 disks for every disk's worth of capacity you need, are both factors. (Note that some RAID 10 implementations let you keep more than 2 copies of your data, but your disk requirements go up as # copies * capacity.) For these you need N*M disks, where N is the number needed to hold your data and M-1 is the number of disks you can lose without losing any data.

At some point it may make sense to use RAID 6 for some data. It is definitely slower, but you can lose two drives and not lose any data while only needing N+2 drives.

To further complicate matters, some parts of your database are more sensitive to performance than others. The fsync throughput of the device you have the WAL on will determine the max transactions/sec of your system (and any seeking that this disk needs to do for other purposes will hurt this). Two other performance-sensitive areas are temporary table space and indexes; question: are these more sensitive to random or to sequential performance?

See the recent discussions and links to performance ratings of different drive types in the thread 'SSD performance'. Unfortunately the SSD drive types so overwhelm the normal drives that it's hard to see the differences in the graphs between the 15K rpm SCSI/SAS drives and the 7200 rpm SATA drives, but they are there.

David Lang
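To make the software-RAID option from the original question concrete, a hypothetical mdadm layout along the lines discussed above (device names invented) could be:

    # four data disks in RAID 10 for the main data directory
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # a separate RAID 1 pair for the WAL, so fsync traffic isn't competing
    # with seeks on the data array; mount it and put pg_xlog there
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdf /dev/sdg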
On Mon, 26 Jan 2009, David Rees wrote:
> And yes, the more memory you can squeeze into the machine, the better,
> though you'll find that after a certain point the price starts going up
> steeply. Of course, if you only have a 15GB database, once you reach
> 16GB of memory you've pretty much hit the point of diminishing
> returns.

Actually, you need more memory than that. Besides the data itself you would want memory for several other things, among them:

1. your OS
2. your indexes
3. your per-request memory allocations (for sorting, etc) - this is highly dependent on your workload (type and number of parallel requests)
4. 'dead' tuples in your table (that will be cleared by a vacuum, but haven't been yet)

and probably other things as well.

I don't know how large a database will fit in 16G of RAM, but I suspect it's closer to 8G than 15G.

Any experts want to throw out a rule of thumb here?

David Lang
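One way to get a feel for the gap is to compare the on-disk size of the database (which already includes indexes, free space and dead tuples) with the size of a plain dump; for a hypothetical database named mydb:

    # on-disk size of the database, indexes and dead space included
    psql -d mydb -c "SELECT pg_size_pretty(pg_database_size('mydb'));"

    # rough size of the same data as an uncompressed SQL dump, in bytes
    pg_dump mydb | wc -c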
>> 4) Use software RAID unless you have the money to buy a RAID
>> controller, in which case here is the ranking of them
>> <list of brands/models>
>
> Areca and 3ware/Escalade are the two best controllers for the money
> out right now. They tend to take turns being the absolute best as
> they release new cards. Newer Arecas (the 1680 series) use an
> ethernet port for traps and such, so there is no need for special software
> that might be kernel version dependent.
>
> Both cost about the same for their top of the line cards.
>
> Make sure you have battery backed cache.

While browsing the net I found a server with a raid controller: an HP Smart Array P400/512MB BBWC Controller. How does one know what this is, and whether it is any good? I guess they just stuck their "HP" label onto some other raid controller?

I could write to HP, but I guess that wouldn't help much. And I could also look through the archives for the mailing list. When I find the time, I'll do so and try to create a wiki page.

The problem with this kind of built-in hardware is that it might suck, and then you can't plug in any other hardware in the box.
david@lang.hm wrote:
> On Mon, 26 Jan 2009, David Rees wrote:
>
>> And yes, the more memory you can squeeze into the machine, the better,
>> though you'll find that after a certain point the price starts going up
>> steeply. Of course, if you only have a 15GB database, once you reach
>> 16GB of memory you've pretty much hit the point of diminishing
>> returns.
>
> Actually, you need more memory than that. Besides the data itself you
> would want memory for several other things, among them:
>
> 1. your OS
> 2. your indexes
> 3. your per-request memory allocations (for sorting, etc) - this is
> highly dependent on your workload (type and number of parallel requests)
> 4. 'dead' tuples in your table (that will be cleared by a vacuum, but
> haven't been yet)
>
> and probably other things as well.
>
> I don't know how large a database will fit in 16G of RAM, but I suspect
> it's closer to 8G than 15G.
>
> Any experts want to throw out a rule of thumb here?
>
> David Lang

It depends on what else the server is doing. If you're running the whole LAPP stack on a single box, for example, the PHP interpreter will need space for intermediate data; Apache and the Linux kernel will use less.

If PostgreSQL is the only thing running on the server, though, assuming 64-bit Linux, most of the RAM in a box that large should be in either memory you've deliberately set aside for internal PostgreSQL data structures or the Linux page cache. There are starting to be some tools that will show you how RAM is allocated, now that recent kernels (2.6.25+) do a better job of accounting for RAM pages. So I would expect the total memory dedicated to the database functionality to be much closer to 15 GB than 8 GB.

Given large amounts of RAM and only PostgreSQL running on the server, the interesting trade-offs become:

a. How little memory can you buy without putting your service level agreements at risk?

b. How do you allocate the PostgreSQL-specific memory buffers at the expense of the Linux page cache for optimum performance?

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
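Even without the newer accounting tools, a couple of standard commands give a first-order picture of where the RAM is going (page cache vs. resident processes):

    # overall split between free memory, buffers and page cache
    free -m

    # finer-grained kernel accounting of cached, dirty and mapped pages
    grep -E 'MemTotal|Cached|Dirty|Mapped' /proc/meminfo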
M. Edward (Ed) Borasky wrote:
> Given large amounts of RAM and only PostgreSQL running on the server,
> the interesting trade-offs become:
>
> a. How little memory can you buy without putting your service level
> agreements at risk?
>
> b. How do you allocate the PostgreSQL-specific memory buffers at the
> expense of the Linux page cache for optimum performance?

c. What do I do with the idle cores? :)

(or, how can I exploit them by changing my database design or the PostgreSQL architecture?)

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
On Tue, 27 Jan 2009, M. Edward (Ed) Borasky wrote:
> david@lang.hm wrote:
>> On Mon, 26 Jan 2009, David Rees wrote:
>>
>>> And yes, the more memory you can squeeze into the machine, the better,
>>> though you'll find that after a certain point the price starts going up
>>> steeply. Of course, if you only have a 15GB database, once you reach
>>> 16GB of memory you've pretty much hit the point of diminishing
>>> returns.
>>
>> Actually, you need more memory than that. Besides the data itself you
>> would want memory for several other things, among them:
>>
>> 1. your OS
>> 2. your indexes
>> 3. your per-request memory allocations (for sorting, etc) - this is
>> highly dependent on your workload (type and number of parallel requests)
>> 4. 'dead' tuples in your table (that will be cleared by a vacuum, but
>> haven't been yet)
>>
>> and probably other things as well.
>>
>> I don't know how large a database will fit in 16G of RAM, but I suspect
>> it's closer to 8G than 15G.
>>
>> Any experts want to throw out a rule of thumb here?
>
> There are starting to be some tools that will show you how RAM is
> allocated, now that recent kernels (2.6.25+) do a better job of
> accounting for RAM pages. So I would expect the total memory dedicated
> to the database functionality to be much closer to 15 GB than 8 GB.

That's not quite the opposite of the statement that I was trying to make.

Assuming that you are not running anything else on the system, how much data can you put on the system and run entirely out of RAM?

The database has its overhead (sort buffers, indexes, per-request buffers, 'dead' tuples, etc), which means that if you have a database whose uncompressed dump takes 8G, you need substantially more than 8G of RAM to avoid using the disks (other than to store changes).

How much more is the question. I know it is going to vary from installation to installation, but are there any guidelines that people can start with?

David Lang
david@lang.hm wrote:
> That's not quite the opposite of the statement that I was trying to make.
>
> Assuming that you are not running anything else on the system, how much
> data can you put on the system and run entirely out of RAM?
>
> The database has its overhead (sort buffers, indexes, per-request
> buffers, 'dead' tuples, etc), which means that if you have a database whose
> uncompressed dump takes 8G, you need substantially more than 8G of RAM
> to avoid using the disks (other than to store changes).
>
> How much more is the question. I know it is going to vary from
> installation to installation, but are there any guidelines that people
> can start with?

I'm not sure there are any rules of thumb / guidelines for that. My experience has been that doing no disk I/O beyond writing logs and creating and updating rows is an unrealistic expectation, even for "small" databases. The cost is prohibitive, for one thing. And for capacity planning, what's probably more important is whether the service level agreements are being met, not whether you're meeting them purely in RAM or by re-reading data from disk sometimes.

I think it's "easy", however, to solve the inverse problem. Borrow a huge-memory server from your vendor, put your small database up on it, run benchmarks and gradually reduce the amount of memory available until the performance becomes unacceptable. The tools exist to measure memory allocations while the benchmarks are running. If you get enough data points (about five for the simplest models) you can build a model that you could then "invert" to go the other way -- take a database size and figure out how much more RAM is needed to meet the SLAs.

You don't necessarily have to reboot to reduce available memory -- there are ways you can tie up memory without consuming processor or disk time to do so. But you would need to "poison" the caches between runs, and restart PostgreSQL if you're modifying its memory allocations.

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
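As one sketch of both tricks on Linux (sizes and mount point made up): the kernel's drop_caches knob empties the page cache between runs, and filling a tmpfs ties up RAM without using CPU or disk, though tmpfs pages can still be pushed to swap under memory pressure:

    # "poison" the caches between benchmark runs
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # tie up roughly 8GB of RAM to simulate a smaller-memory machine
    mount -t tmpfs -o size=8G tmpfs /mnt/ramhog
    dd if=/dev/zero of=/mnt/ramhog/fill bs=1M count=8192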
On Tue, Jan 27, 2009 at 1:20 AM, A B <gentosaker@gmail.com> wrote:
> While browsing the net I found a server with a raid controller: an
> HP Smart Array P400/512MB BBWC Controller. How does one know what this
> is, and whether it is any good? I guess they just stuck their "HP" label
> onto some other raid controller?
> I could write to HP, but I guess that wouldn't help much. And I could also
> look through the archives for the mailing list. When I find the time,
> I'll do so and try to create a wiki page.

The only HP controller I've seen get decent reviews here has been the P800. The P400 has gotten pretty poor reviews. I imagine it's a fine controller for a workgroup-level file server.

> The problem with this kind of built-in hardware is that it might suck,
> and then you can't plug in any other hardware in the box.

Exactly. Which is why I prefer high quality white box servers from smaller shops.
On Tue, Jan 27, 2009 at 7:03 AM, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
> M. Edward (Ed) Borasky wrote:
>> Given large amounts of RAM and only PostgreSQL running on the server,
>> the interesting trade-offs become:
>>
>> a. How little memory can you buy without putting your service level
>> agreements at risk?
>>
>> b. How do you allocate the PostgreSQL-specific memory buffers at the
>> expense of the Linux page cache for optimum performance?
>
> c. What do I do with the idle cores? :)
>
> (or, how can I exploit them by changing my database design or the
> PostgreSQL architecture?)

You run as many queries as you have cores at the same time? If you've only ever got 1 or 2 queries running at the same time, don't buy so many extra cores.