Thread: Arguments Pro/Contra Software Raid
Hi,

I've just had some discussion with colleagues regarding the usage of hardware or software RAID 1/10 for our Linux-based database servers.

I myself can't see much reason to spend $500 on high-end controller cards for a simple RAID 1.

Any arguments pro or contra would be desirable. From my experience and what I've read here:

+ Hardware RAIDs might be a bit easier to manage, if you never spend a few hours learning the software RAID tools.

+ There are situations in which software RAIDs are faster, as CPU power has advanced dramatically in the last years and even high-end controller cards cannot keep up with that.

+ Using SATA drives is always a bit of a risk, as some drives lie about whether they are caching or not.

+ Using hardware controllers, the array becomes locked to a particular vendor. You can't switch controller vendors, as the array metadata is stored in a proprietary format. If the RAID is broken to a degree the controller can't recover from automatically, this might complicate manual recovery by specialists.

+ Even battery-backed controllers can't guarantee that data written to the drives is consistent after a power outage, nor that the drive won't corrupt something during the involuntary shutdown / power irregularities. (This is theoretical, as any server will be UPS-backed.)

--
Regards,
Hannes Dorbath
Hannes Dorbath wrote:
> Hi,
>
> I've just had some discussion with colleagues regarding the usage of
> hardware or software raid 1/10 for our linux based database servers.
>
> I myself can't see much reason to spend $500 on high end controller
> cards for a simple Raid 1.
>
> Any arguments pro or contra would be desirable.
>

One pro and one con off the top of my head.

Hotplug. Depending on your platform, SATA may or may not be hotpluggable (I know AHCI mode is the only one promising some kind of hotplug, which means ICH6+ and Silicon Image controllers last I heard). SCSI isn't hotpluggable without the use of special hotplug backplanes and disks. You lose that in software RAID, which effectively means you need to shut the box down and do maintenance. Hassle.

CPU. It's cheap. Much cheaper than your average hardware RAID card. For the 5-10% overhead usually imposed by software RAID, you can throw in a faster CPU and never even notice it. Most cases aren't CPU-bound anyway, or at least, most cases are I/O-bound for the better part. This does raise the question of the I/O bandwidth your standard SATA or SCSI controller comes with, though. If you're careful about that and handle hotplug sufficiently, you're probably never going to notice you're not running on metal.

Kind regards,
--
Grega Bremec
gregab at p0f dot net
On May 9, 2006, at 2:16 AM, Hannes Dorbath wrote:

> Hi,
>
> I've just had some discussion with colleagues regarding the usage
> of hardware or software raid 1/10 for our linux based database
> servers.
>
> I myself can't see much reason to spend $500 on high end controller
> cards for a simple Raid 1.
>
> Any arguments pro or contra would be desirable.
>
> From my experience and what I've read here:
>
> + Hardware Raids might be a bit easier to manage, if you never
> spend a few hours to learn Software Raid Tools.
>
> + There are situations in which Software Raids are faster, as CPU
> power has advanced dramatically in the last years and even high end
> controller cards cannot keep up with that.
>
> + Using SATA drives is always a bit of risk, as some drives are
> lying about whether they are caching or not.

Don't buy those drives. That's unrelated to whether you use hardware or software RAID.

>
> + Using hardware controllers, the array becomes locked to a
> particular vendor. You can't switch controller vendors as the array
> meta information is stored proprietary. In case the Raid is broken
> to a level the controller can't recover automatically this might
> complicate manual recovery by specialists.

Yes. Fortunately we're using the RAID for database work, rather than file storage, so we can use all the nice PostgreSQL features for backing up and replicating the data elsewhere, which avoids most of this issue.

>
> + Even battery backed controllers can't guarantee that data written
> to the drives is consistent after a power outage, neither that the
> drive does not corrupt something during the involuntary shutdown /
> power irregularities. (This is theoretical as any server will be
> UPS backed)

fsync of the WAL log. If you have a battery-backed write-back cache, then you can get the reliability of fsyncing the WAL for every transaction, and the performance of not needing to hit the disk for every transaction.
Also, if you're not doing that, you'll need to dedicate a pair of spindles to the WAL log if you want to get good performance, so that there'll be no seeking on the WAL. With a write-back cache you can put the WAL on the same spindles as the database and not lose much, if anything, in the way of performance. If that saves you the cost of two additional spindles, and the space on your drive shelf for them, you've just paid for a reasonably priced RAID controller.

Given those advantages... I can't imagine speccing a large system that didn't have a battery-backed write-back cache in it.

My dev systems mostly use software RAID, if they use RAID at all. But my production boxes all use SATA RAID (and I tell my customers to use controllers with BB cache, whether it be SCSI or SATA).

My usual workloads are write-heavy. If yours are read-heavy, that will move the sweet spot around significantly, and I can easily imagine that for a read-heavy load software RAID might be a much better match.

Cheers,
  Steve
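To put rough numbers on Steve's WAL point: without a write cache, each fsync'd commit on a dedicated WAL spindle has to wait for the data to reach the platter. A common rule of thumb (an assumption here, not something stated in the thread) is roughly one synced commit per platter rotation, so the drive's RPM caps single-stream commit throughput. The helper name below is hypothetical.

```python
def max_sync_commits_per_second(rpm: float) -> float:
    """Rough ceiling on fsync'd commits per second for one WAL spindle
    with no write cache.

    Assumption (simplification): each commit waits about one full platter
    rotation for its sector to come back under the head.
    """
    return rpm / 60.0

for rpm in (7200, 10000, 15000):
    print(f"{rpm:>6} RPM -> ~{max_sync_commits_per_second(rpm):.0f} commits/s")
```

A battery-backed write-back cache lets fsync return once the data is in the controller's protected RAM, which is why it removes this per-rotation ceiling. Concurrent sessions can also share a single WAL flush, so treat these figures only as a rough single-stream bound.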
On Tue, May 09, 2006 at 12:10:32 +0200,
  "Jean-Yves F. Barbier" <7ukwn@free.fr> wrote:
> Naa, you can find ATA &| SATA ctrlrs for about EUR30 !

But those are the ones that you would generally be better off not using.

> Definitely NOT, however if your server doen't have a heavy load, the
> software overload can't be noticed (essentially cache managing and
> syncing)

It is fairly common for database machines to be IO-bound rather than CPU-bound, and so the CPU impact of software RAID is low.

> Some hardware ctrlrs are able to avoid the loss of a disk if you turn
> to have some faulty sectors (by relocating internally them); software
> RAID doesn't as sectors *must* be @ the same (linear) addresses.

That is not true. Software RAID works just fine on drives that have internally remapped sectors.
>
> Don't buy those drives. That's unrelated to whether you use hardware
> or software RAID.

Sorry, that is an extremely misleading statement. SATA RAID is perfectly acceptable if you have a hardware RAID controller with a battery backup controller.

And dollar for dollar, SCSI will NOT be faster nor have the hard drive capacity that you will get with SATA.

Sincerely,

Joshua D. Drake

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
     http://www.commandprompt.com/
On May 9, 2006, at 8:51 AM, Joshua D. Drake wrote:

("Using SATA drives is always a bit of risk, as some drives are lying about whether they are caching or not.")

>> Don't buy those drives. That's unrelated to whether you use hardware
>> or software RAID.
>
> Sorry that is an extremely misleading statement. SATA RAID is
> perfectly acceptable if you have a hardware raid controller with a
> battery backup controller.

If the drive says it's hit the disk and it hasn't, then the RAID controller will have flushed the data from its cache (or flagged it as correctly written). At that point the only place the data is stored is in the non-battery-backed cache on the drive itself. If something fails then you'll have lost data.

You're not suggesting that a hardware RAID controller will protect you against drives that lie about sync, are you?

>
> And dollar for dollar, SCSI will NOT be faster nor have the hard
> drive capacity that you will get with SATA.

Yup. That's why I use SATA RAID for all my databases.

Cheers,
  Steve
On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:

> Sorry that is an extremely misleading statement. SATA RAID is
> perfectly acceptable if you have a hardware raid controller with a
> battery backup controller.
>
> And dollar for dollar, SCSI will NOT be faster nor have the hard
> drive capacity that you will get with SATA.

Does this still hold true under heavy concurrent-write loads? I'm preparing yet another big DB server and if SATA is a better option, I'm all (elephant) ears.
Vivek Khera <vivek@khera.org> writes:

> On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:
>
>> And dollar for dollar, SCSI will NOT be faster nor have the hard
>> drive capacity that you will get with SATA.
>
> Does this hold true still under heavy concurrent-write loads? I'm
> preparing yet another big DB server and if SATA is a better option,
> I'm all (elephant) ears.

Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.

-Doug
Vivek Khera wrote:
>
> On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:
>
>> Sorry that is an extremely misleading statement. SATA RAID is
>> perfectly acceptable if you have a hardware raid controller with a
>> battery backup controller.
>>
>> And dollar for dollar, SCSI will NOT be faster nor have the hard drive
>> capacity that you will get with SATA.
>
> Does this hold true still under heavy concurrent-write loads? I'm
> preparing yet another big DB server and if SATA is a better option, I'm
> all (elephant) ears.

I didn't say better :). If you can afford it, SCSI is the way to go. However, SATA with a good controller (I am fond of the LSI 150 series) can provide some great performance.

I have not used Areca, but have heard good things about it as well.

Oh, and make sure they are SATA-II drives.

Sincerely,

Joshua D. Drake
> You're not suggesting that a hardware RAID controller will protect
> you against drives that lie about sync, are you?

Of course not, but which SATA drives lie about sync? Or, more specifically, which SATA-II drives?

Sincerely,

Joshua D. Drake
On May 9, 2006, at 11:26 AM, Joshua D. Drake wrote:

>
>> You're not suggesting that a hardware RAID controller will protect
>> you against drives that lie about sync, are you?
>
> Of course not, but which drives lie about sync that are SATA? Or
> more specifically SATA-II?

SATA-II, none that I'm aware of, but there's a long history of dodgy behaviour designed to pump up benchmark results down in the consumer drive space, and the low-end consumer space is where a lot of SATA drives are. I wouldn't be surprised to see that behaviour there still.

I was responding to the original poster's assertion that drives lying about sync were a reason not to buy SATA drives, by telling him not to buy drives that lie about sync. You seem to have read this as "don't buy SATA drives", which is not what I said and not what I meant.

Cheers,
  Steve
Douglas McNaught wrote:
> Vivek Khera <vivek@khera.org> writes:
>
>> On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:
>>
>>> And dollar for dollar, SCSI will NOT be faster nor have the hard
>>> drive capacity that you will get with SATA.
>> Does this hold true still under heavy concurrent-write loads? I'm
>> preparing yet another big DB server and if SATA is a better option,
>> I'm all (elephant) ears.
>
> Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.

The best I have seen is 10k, but if I can put 4x the number of drives in the array at the same cost... I don't need 15k.

Joshua D. Drake
On Tue, 2006-05-09 at 12:52, Steve Atkins wrote:
> On May 9, 2006, at 8:51 AM, Joshua D. Drake wrote:
>
> ("Using SATA drives is always a bit of risk, as some drives are lying
> about whether they are caching or not.")
>
> >> Don't buy those drives. That's unrelated to whether you use hardware
> >> or software RAID.
> >
> > Sorry that is an extremely misleading statement. SATA RAID is
> > perfectly acceptable if you have a hardware raid controller with a
> > battery backup controller.
>
> If the drive says it's hit the disk and it hasn't then the RAID
> controller will have flushed the data from its cache (or flagged it as
> correctly written). At that point the only place the data is stored is
> in the non battery backed cache on the drive itself. If something fails
> then you'll have lost data.
>
> You're not suggesting that a hardware RAID controller will protect
> you against drives that lie about sync, are you?

Actually, in the case of the Escalades at least, the answer is yes. Last year (maybe a bit more) someone was testing an IDE Escalade controller with drives that were known to lie, and it passed the power plug pull test repeatedly. Apparently, the Escalades tell the drives to turn off their cache. While almost all IDE drives and a fair number of SATA drives lie about cache fsyncing, they all seem to turn off the cache when you ask.

And, since a hardware RAID controller with a BBU cache has its own cache, it's not like it really needs the one on the drives anyway.
Joshua D. Drake wrote:
> Vivek Khera wrote:
> >
> > On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:
> >
> >> Sorry that is an extremely misleading statement. SATA RAID is
> >> perfectly acceptable if you have a hardware raid controller with a
> >> battery backup controller.
> >>
> >> And dollar for dollar, SCSI will NOT be faster nor have the hard drive
> >> capacity that you will get with SATA.
> >
> > Does this hold true still under heavy concurrent-write loads? I'm
> > preparing yet another big DB server and if SATA is a better option, I'm
> > all (elephant) ears.
>
> I didn't say better :). If you can afford, SCSI is the way to go.
> However SATA with a good controller (I am fond of the LSI 150 series)
> can provide some great performance.

Basically, you can get away with cheaper hardware, but it usually doesn't have the reliability/performance of more expensive options.

If you want an in-depth comparison of how a server disk drive is internally better than a desktop drive, see:

  http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Scott Marlowe wrote:
> Actually, in the case of the Escalades at least, the answer is yes.
> Last year (maybe a bit more) someone was testing an IDE escalade
> controller with drives that were known to lie, and it passed the power
> plug pull test repeatedly. Apparently, the escalades tell the drives to
> turn off their cache. While most all IDEs and a fair number of SATA
> drives lie about cache fsyncing, they all seem to turn off the cache
> when you ask.
>
> And, since a hardware RAID controller with bbu cache has its own cache,
> it's not like it really needs the one on the drives anyway.

You do if the controller thinks the data is already on the drives and removes it from its cache.
William Yu wrote:
> We upgraded our disk system for our main data processing server earlier
> this year. After pricing out all the components, basically we had the
> choice of:
>
> LSI MegaRaid 320-2 w/ 1GB RAM+BBU + 8 15K 150GB SCSI
>
> or
>
> Areca 1124 w/ 1GB RAM+BBU + 24 7200RPM 250GB SATA

My mistake -- I keep doing calculations and they don't add up. So I looked again on pricewatch and it turns out the actual comparison was for 4 SCSI drives, not 8! ($600 for a 15K 145GB versus $90 for a 7200 250GB.) No wonder our decision seemed so much more decisive back then.
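Working through William's corrected numbers makes the "more spindles for the money" argument concrete. This sketch uses only the per-drive prices and sizes quoted above; it deliberately ignores the controllers, enclosures, power, and RAID overhead, and counts raw capacity only. The helper name is hypothetical.

```python
# Per-drive figures quoted in the thread (2006 prices).
SCSI_PRICE, SCSI_GB = 600, 145   # 15K RPM SCSI
SATA_PRICE, SATA_GB = 90, 250    # 7200 RPM SATA

def summarize(count: int, price: int, gb: int) -> tuple[int, int]:
    """Return (total drive cost in dollars, raw capacity in GB) for `count` drives."""
    return count * price, count * gb

scsi_cost, scsi_gb = summarize(4, SCSI_PRICE, SCSI_GB)
sata_cost, sata_gb = summarize(24, SATA_PRICE, SATA_GB)
print(f"4 x SCSI:  ${scsi_cost}, {scsi_gb} GB raw")
print(f"24 x SATA: ${sata_cost}, {sata_gb} GB raw")
print(f"SATA drives per SCSI drive's price: {SCSI_PRICE / SATA_PRICE:.1f}")
```

With these numbers the 24 SATA drives cost $2160 against $2400 for the four SCSI drives, while providing six times the spindles and roughly ten times the raw capacity. Whether that beats fewer 15K spindles still depends on the workload and on the rack-space and power budget raised elsewhere in the thread.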
Hi Hannes,

Hannes Dorbath wrote:
> Hi,
>
> I've just had some discussion with colleagues regarding the usage of
> hardware or software raid 1/10 for our linux based database servers.
>
> I myself can't see much reason to spend $500 on high end controller
> cards for a simple Raid 1.

Naa, you can find ATA &| SATA controllers for about EUR30!

> Any arguments pro or contra would be desirable.
>
> From my experience and what I've read here:
>
> + Hardware Raids might be a bit easier to manage, if you never spend a
> few hours to learn Software Raid Tools.

I'd say the same (mostly, as you still have to type commands at a command line for most of the controllers).

> + There are situations in which Software Raids are faster, as CPU power
> has advanced dramatically in the last years and even high end controller
> cards cannot keep up with that.

Definitely NOT; however, if your server doesn't have a heavy load, the software overhead can't be noticed (essentially cache management and syncing).

For dual-core CPUs, it might be true.

> + Using SATA drives is always a bit of risk, as some drives are lying
> about whether they are caching or not.

?? Do you intend to use your server without a UPS ??

> + Using hardware controllers, the array becomes locked to a particular
> vendor. You can't switch controller vendors as the array meta
> information is stored proprietary. In case the Raid is broken to a level
> the controller can't recover automatically this might complicate manual
> recovery by specialists.

?? Do you intend not to make backups ??

> + Even battery backed controllers can't guarantee that data written to
> the drives is consistent after a power outage, neither that the drive
> does not corrupt something during the involuntary shutdown / power
> irregularities. (This is theoretical as any server will be UPS backed)

RAID's "laws":

1- RAID protects you against losing data on healthy disks, not against faulty disks.

1b- So format and reformat your RAID disks (whether SCSI, ATA or SATA) several times, with destructive tests (see the "-c -c" option in the mke2fs man page). This will ensure that the disks are safe, and also serve as a kind of burn-in test (it might take... days of formatting!).

2- RAID doesn't protect you against power supply failure or an electricity breakdown, so use a (LARGE) UPS.

2b- A LARGE UPS, because HDs are the components with the highest power consumption (a 700VA UPS gives me about 10-12 minutes on a machine with an XP2200+, 1GB RAM and a 40GB HD; however, this falls to...... less than 25 seconds with seven HDs! all ATA).

2c- Use a server box with redundant power supplies.

3- As for any sensitive data, make regular backups or you'll be a sitting duck.

Some hardware controllers are able to avoid the loss of a disk if it turns out to have some faulty sectors (by relocating them internally); software RAID can't, as sectors *must* be at the same (linear) addresses.

BUT a hardware controller is about EUR2000 and an (ATA/SATA) 500GB HD is ~EUR350.

That means you have to consider:
* The server availability (time to change a power supply if there is no redundancy, time to exchange a non-hot-swap HD... In fact, how much downtime you can "afford"),
* The volume of the data (which determines the size of the backup device),
* The backup device you'll use (tape or other HDs),
* The load of the server (and the number of simultaneous users => Soft|Hard, ATA/SATA|SCSI...),
* The money you can spend on such a server,
* And most important, the color of your boss's tie on the day you make the decision.

Hope it will help you

Jean-Yves
On May 9, 2006, at 11:26 AM, Joshua D. Drake wrote:
> Of course not, but which drives lie about sync that are SATA? Or
> more specifically SATA-II?

I don't know the answer to this question, but have you seen this tool?

  http://brad.livejournal.com/2116715.html

It attempts to experimentally determine if, with your operating system version, controller, and hard disk, fsync() does as claimed. Of course, experimentation can't prove the system is correct, but it can sometimes prove the system is broken.

I say it's worth running on any new model of disk, any new controller, or after the Linux kernel people rewrite everything (i.e. on every point release).

I have to admit to hypocrisy, though... I'm running with systems that other people ordered and installed, I doubt they were this thorough, and I don't have identical hardware to run tests on. So no real way to do this.

Regards,
Scott

--
Scott Lamb <http://www.slamb.org/>
Douglas McNaught <doug@mcnaught.org> writes:

> Vivek Khera <vivek@khera.org> writes:
>
> > On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:
> >
> >> And dollar for dollar, SCSI will NOT be faster nor have the hard
> >> drive capacity that you will get with SATA.
> >
> > Does this hold true still under heavy concurrent-write loads? I'm
> > preparing yet another big DB server and if SATA is a better option,
> > I'm all (elephant) ears.
>
> Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.

Well, dollar for dollar you would get the best performance from slower drives anyway, since it would give you more spindles. 15kRPM drives are *expensive*.

--
greg
Steve Atkins <steve@blighty.com> writes:

> On May 9, 2006, at 2:16 AM, Hannes Dorbath wrote:
>
> > Hi,
> >
> > I've just had some discussion with colleagues regarding the usage of
> > hardware or software raid 1/10 for our linux based database servers.
> >
> > I myself can't see much reason to spend $500 on high end controller cards
> > for a simple Raid 1.
> >
> > Any arguments pro or contra would be desirable.

Really, most of what's said online about software RAID vs. hardware RAID is just FUD. Unless you're running BIG servers with so many drives that RAID controllers are the only feasible way to connect them up anyway, the actual performance difference will likely be negligible.

The only two things that actually make me pause about software RAID in heavy production use are:

1) Battery-backed cache. That's a huge win for the WAL drives on Postgres. 'nuff said.

2) Not all commodity controllers or IDE drivers can handle failing drives gracefully. While the software RAID might guarantee that you don't actually lose data, you still might have the machine wedge because of IDE errors on the bad drive. So as far as uptime goes, instead of added reliability all you've really added is another point of failure. On the data integrity front you'll still be better off.

--
Greg
> 2b- LARGE UPS because HDs are the components that have the higher power
> consomption (a 700VA UPS gives me about 10-12 minutes on a machine
> with a XP2200+, 1GB RAM and a 40GB HD, however this fall to......
> less than 25 secondes with seven HDs ! all ATA),

I got my hands on a (free) 1400 VA APC rackmount UPS; the batteries were dead, so I stuck two car batteries in. It can power my computer (Athlon 64, 7 drives) for more than 2 hours... It looks ugly, though. I wouldn't put this in a server rack, but for my home PC it's perfect. It has saved my work many times...

Hard disks draw about 15 watts each, but they pull large current spikes when seeking, so the VA rating of the UPS is important. I guess in your case the batteries have enough charge left, but the current capability of the UPS is exceeded.

> Some hardware ctrlrs are able to avoid the loss of a disk if you turn
> to have some faulty sectors (by relocating internally them); software
> RAID doesn't as sectors *must* be @ the same (linear) addresses.

Hard disks do transparent remapping now... Linux software RAID can rewrite bad sectors with good data, and the disk will remap the faulty sector to a good one.
Greg Stark <gsstark@mit.edu> writes:

> Douglas McNaught <doug@mcnaught.org> writes:

>> Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.
>
> Well, dollar for dollar you would get the best performance from slower drives
> anyways since it would give you more spindles. 15kRPM drives are *expensive*.

Depends on your power, heat and rack space budget too... If you need max performance out of a given rack space (rather than max density), SCSI is still the way to go. I'll definitely agree that SATA is becoming much more of a player in the server storage market, though.

-Doug
* Hannes Dorbath:

> + Hardware Raids might be a bit easier to manage, if you never spend a
> few hours to learn Software Raid Tools.

I disagree. RAID management is complicated, and once there is a disk failure, all kinds of oddities can occur which can make it quite a challenge to get back a non-degraded array.

With some RAID controllers, monitoring is difficult because they do not use the system's logging mechanism for reporting. In some cases, it is not possible to monitor the health status of individual disks.

> + Using SATA drives is always a bit of risk, as some drives are lying
> about whether they are caching or not.

You can usually switch off caching.

> + Using hardware controllers, the array becomes locked to a particular
> vendor. You can't switch controller vendors as the array meta
> information is stored proprietary. In case the Raid is broken to a
> level the controller can't recover automatically this might complicate
> manual recovery by specialists.

It's even more difficult these days. 3ware controllers enable drive passwords, so you can't access the drive from other controllers at all (even if you could interpret the on-disk data).

> + Even battery backed controllers can't guarantee that data written to
> the drives is consistent after a power outage, neither that the drive
> does not corrupt something during the involuntary shutdown / power
> irregularities. (This is theoretical as any server will be UPS backed)

UPS failures are not unheard of. 8-/ Apart from that, you can address a large class of shutdown failures if you replay a log stored in the BBU on the next reboot (partial sector writes come to mind). It is very difficult to check whether the controller does this correctly, though.

A few other things to note: You can't achieve significant port density with non-RAID controllers, at least with SATA. You need to buy a RAID controller anyway.
You can't quite achieve what a BBU does (even if you've got a small, fast persistent storage device) because there's no host software support for such a configuration.
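On the monitoring point: Linux software RAID (md) reports array state in `/proc/mdstat`, where each array's status line ends with counts like `[2/1]` (members expected/active) and a member map such as `[UU]` (all up) or `[U_]` (one missing). A minimal sketch of a degraded-array check, assuming that usual layout (an `mdN :` header line followed by a status line). In practice `mdadm --monitor` already covers this; the function and sample text below are hypothetical illustrations.

```python
import re

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Counts and member map on the status line, e.g. "[2/1] [U_]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, active, member_map = status.groups()
            if int(active) < int(expected) or "_" in member_map:
                degraded.append(current)
            current = None  # only the first status line belongs to the array
    return degraded

# Hypothetical sample; on a real box, read open("/proc/mdstat").read() instead.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks [2/2] [UU]

md1 : active raid1 sdc1[0]
      976630336 blocks [2/1] [U_]

unused devices: <none>
"""
print(degraded_arrays(SAMPLE))
```

Because this is plain text from the kernel, it can feed whatever logging or alerting the host already uses, which is the integration Florian says some hardware controllers lack.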
Hi, Scott & all,

Scott Lamb wrote:
> I don't know the answer to this question, but have you seen this tool?
>
> http://brad.livejournal.com/2116715.html

We had a simpler tool in-house, which wrote a file byte-for-byte and called fsync() after every byte.

If the number of fsyncs/min is higher than the rotations per minute of your disks, they must be lying.

It does not find as many liars as the script above, but it is less intrusive (it can be run on every low-IO machine without crashing it), and it found some liars in-house (some notebook disks, one external USB/FireWire-to-IDE case, and an older Linux cryptoloop implementation, IIRC).

If you're interested, I can dig up the C source...

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
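Markus's C source was never posted in this thread, so the following is only a Python sketch of the technique he describes: append one byte at a time, `os.fsync()` after each, and compare the observed rate with the disk's RPM. The function names and the default of 200 writes are assumptions. The probe file must live on the disk under test, and the rule only makes sense for plain disks: a RAM-backed filesystem, or a controller with a battery-backed write-back cache, will legitimately exceed the RPM.

```python
import os
import tempfile
import time

def fsyncs_per_minute(path: str, n_writes: int = 200) -> float:
    """Append `n_writes` single bytes to `path`, calling os.fsync() after
    each one, and return the observed number of fsyncs per minute."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.monotonic()
        for _ in range(n_writes):
            os.write(fd, b"x")
            os.fsync(fd)  # should not return before the byte is on stable storage
        elapsed = max(time.monotonic() - start, 1e-9)  # guard against a zero timer delta
    finally:
        os.close(fd)
    return n_writes / elapsed * 60.0

def looks_like_a_liar(fsyncs_per_min: float, disk_rpm: int) -> bool:
    """Rule of thumb from the thread: a plain disk cannot complete more
    synced writes per minute than its platters turn."""
    return fsyncs_per_min > disk_rpm

# Pass dir=... to TemporaryDirectory to put the probe on the disk you want to test.
with tempfile.TemporaryDirectory() as tmp:
    rate = fsyncs_per_minute(os.path.join(tmp, "fsync-probe"))
print(f"{rate:.0f} fsyncs/min; suspicious for a 7200 RPM disk: "
      f"{looks_like_a_liar(rate, 7200)}")
```

As the follow-up messages explain, a rate at or below the RPM does not prove the drive honest; it only fails to catch some kinds of lying.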
Markus Schaber wrote:
> Hi, Scott & all,
>
> Scott Lamb wrote:
> > I don't know the answer to this question, but have you seen this tool?
> >
> > http://brad.livejournal.com/2116715.html
>
> We had a simpler tool inhouse, which wrote a file byte-for-byte, and
> called fsync() after every byte.
>
> If the number of fsyncs/min is higher than your rotations per minute
> value of your disks, they must be lying.
>
> It does not find as much liers as the script above, but it is less

Why does it find fewer liars?

---------------------------------------------------------------------------

> intrusive (can be ran on every low-io machine without crashing it), and
> it found some liers in-house (some notebook disks, one external
> USB/FireWire to IDE case, and an older linux cryptoloop implementations,
> IIRC).
>
> If you're interested, I can dig for the C source...
>
> HTH,
> Markus
On May 10, 2006, at 12:41 AM, Greg Stark wrote:

> Well, dollar for dollar you would get the best performance from
> slower drives anyways since it would give you more spindles. 15kRPM
> drives are *expensive*.

Personally, I don't care that much for "dollar for dollar"; I just need performance. If it is within a factor of 2 or 3 in price, then I'll go for absolute performance over "bang for the buck".
Vivek Khera wrote:
>
> On May 10, 2006, at 12:41 AM, Greg Stark wrote:
>
> > Well, dollar for dollar you would get the best performance from
> > slower drives anyways since it would give you more spindles. 15kRPM
> > drives are *expensive*.
>
> Personally, I don't care that much for "dollar for dollar" I just
> need performance. If it is within a factor of 2 or 3 in price then
> I'll go for absolute performance over "bang for the buck".

That is really the issue. You can buy lots of consumer-grade stuff and it will work just fine if your performance/reliability tolerance is high enough.

However, don't fool yourself that consumer and server-grade hardware is internally the same, or has the same testing. I just had a Toshiba laptop drive replaced last week (new, not refurbished), only to have it fail this week. Obviously there isn't sufficient burn-in done by Toshiba, and I don't fault them because it is a consumer laptop: it fails, they replace it. For servers, the downtime usually can't be tolerated, while consumers usually can tolerate significant downtime.

I have always purchased server-grade hardware for my home server, and I think I have had one day of hardware downtime in the past ten years. Consumer hardware just couldn't do that.

As one data point, most consumer-grade IDE drives are designed to be run only 8 hours a day. The engineering doesn't anticipate 24-hour operation, and that trade-off passes all the way through the selection of components for the drive, which generates significant cost savings.
Hi, Bruce,

Bruce Momjian wrote:
>>It does not find as much liers as the script above, but it is less
>
> Why does it find fewer liers?

It won't find liars that have a short "lie queue": their internal buffers fill up, so they have to block. After a small burst at the start, which usually hides in other latencies, they don't get more throughput than spindle turns.

It won't find liars that first acknowledge to the host, and then immediately write the block before accepting other commands. This improves latency (which is measured in some benchmarks), but not the sync/write rate.

Both of them can be caught by the other script, but not by my tool.

HTH,
Markus
On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote: > Scott Marlowe wrote: > > Actually, in the case of the Escalades at least, the answer is yes. > > Last year (maybe a bit more) someone was testing an IDE escalade > > controller with drives that were known to lie, and it passed the power > > plug pull test repeatedly. Apparently, the escalades tell the drives to > > turn off their cache. While most all IDEs and a fair number of SATA > > drives lie about cache fsyncing, they all seem to turn off the cache > > when you ask. > > > > And, since a hardware RAID controller with bbu cache has its own cache, > > it's not like it really needs the one on the drives anyway. > > You do if the controller thinks the data is already on the drives and > removes it from its cache. Bruce, re-read what I wrote. The escalades tell the drives to TURN OFF THEIR OWN CACHE.
Scott Marlowe <smarlowe@g2switchworks.com> writes: > On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote: >> You do if the controller thinks the data is already on the drives and >> removes it from its cache. > > Bruce, re-read what I wrote. The escalades tell the drives to TURN OFF > THEIR OWN CACHE. Some ATA drives would lie about that too IIRC. Hopefully they've stopped doing it in the SATA era. -Doug
Hi, Bruce, Markus Schaber wrote: >>>It does not find as much liers as the script above, but it is less >>Why does it find fewer liers? > > It won't find liers that have a small "lie-queue-length" so their > internal buffers get full so they have to block. After a small burst at > start which usually hides in other latencies, they don't get more > throughput than spindle turns. I just reread my mail and must admit that I would not understand what I wrote above, so I'll explain a little more: My test program writes byte-for-byte. Let's say our FS/OS has a 4k page and block size; that means 4096 writes that all hit the same disk block. Intelligent liars will see that the 2nd and all further writes obsolete the earlier writes that still reside in the internal cache, and drop those earlier writes from the cache, effectively going up to 4k writes per spindle turn. Dumb liars will keep the obsolete writes in the write cache / queue, and so won't be caught by my program. (Note that I have no proof that such disks actually exist, but I have enough experience with hardware that I wouldn't be surprised.) HTH, Markus -- Markus Schaber
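The test Markus describes can be approximated as follows: write one byte at a time into the same 4 KiB block, fsync after every write, and report the achieved rate. This is a minimal Python sketch, not his actual tool; the function name, the 4096-byte block size and the write count are assumptions. As he explains, it only catches "intelligent" liars that coalesce obsolete writes. Run it on a file that lives on the drive under test (not a RAM-backed /tmp) and compare the result against the drive's rotation rate, RPM/60.

```python
import os
import tempfile
import time

BLOCK_SIZE = 4096  # assumed FS/OS page and block size, as in the example above


def fsync_rate(path: str, n_writes: int = 200) -> float:
    """Return fsyncs per second when rewriting a single block byte by byte."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Pre-allocate one block so later writes never change the file size
        # (otherwise each fsync would also have to commit size metadata).
        os.write(fd, b"\0" * BLOCK_SIZE)
        os.fsync(fd)
        start = time.perf_counter()
        for i in range(n_writes):
            # Each write touches a different byte of the same block,
            # so every fsync targets the same disk block.
            os.lseek(fd, i % BLOCK_SIZE, os.SEEK_SET)
            os.write(fd, b"x")
            os.fsync(fd)  # an honest drive must reach the platter here
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # dir="." (an assumption) keeps the probe on the current directory's
    # filesystem; point it at a path on the disk you actually care about.
    with tempfile.TemporaryDirectory(dir=".") as d:
        rate = fsync_rate(os.path.join(d, "probe.bin"))
        print(f"{rate:.0f} fsyncs/s")
```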
On Wed, 2006-05-10 at 09:51, Douglas McNaught wrote: > Scott Marlowe <smarlowe@g2switchworks.com> writes: > > > On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote: > > >> You do if the controller thinks the data is already on the drives and > >> removes it from its cache. > > > > Bruce, re-read what I wrote. The escalades tell the drives to TURN OFF > > THEIR OWN CACHE. > > Some ATA drives would lie about that too IIRC. Hopefully they've > stopped doing it in the SATA era. Ugh. Now that would make for a particularly awful bit of firmware implementation. I'd think that if I found a SATA drive doing that I'd be likely to strike the manufacturer off of the list for possible future purchases...
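On present-day Linux (kernel 4.7 and later, so well after this thread), the kernel exposes whether it believes a disk has a volatile write cache enabled in /sys/block/<dev>/queue/write_cache, which reads "write back" or "write through". The sketch below reads that file; the function names are hypothetical. It only reports what the drive claims about its cache, so it cannot expose a drive that lies about turning its cache off, which is exactly Doug's concern; a timing test like Markus's is still needed for that.

```python
from pathlib import Path


def parse_write_cache(text: str) -> bool:
    """Map the sysfs write_cache string to True if a volatile cache is on."""
    mode = text.strip()
    if mode == "write back":
        return True   # drive reports a volatile write cache
    if mode == "write through":
        return False  # drive reports no volatile write cache
    raise ValueError(f"unexpected write_cache value: {mode!r}")


def volatile_cache_enabled(device: str) -> bool:
    """Read /sys/block/<device>/queue/write_cache, e.g. device='sda'."""
    path = Path("/sys/block") / device / "queue" / "write_cache"
    return parse_write_cache(path.read_text())


if __name__ == "__main__":
    sys_block = Path("/sys/block")
    if sys_block.is_dir():
        for dev in sorted(p.name for p in sys_block.iterdir()):
            try:
                on = volatile_cache_enabled(dev)
            except (OSError, ValueError):
                continue  # older kernel or device without this attribute
            print(f"{dev}: {'write back' if on else 'write through'}")
```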
On Tue, May 09, 2006 at 08:59:55PM -0400, Bruce Momjian wrote: > Joshua D. Drake wrote: > > Vivek Khera wrote: > > > > > > On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote: > > > > > >> Sorry that is an extremely misleading statement. SATA RAID is > > >> perfectly acceptable if you have a hardware raid controller with a > > >> battery backup controller. > > >> > > >> And dollar for dollar, SCSI will NOT be faster nor have the hard drive > > >> capacity that you will get with SATA. > > > > > > Does this hold true still under heavy concurrent-write loads? I'm > > > preparing yet another big DB server and if SATA is a better option, I'm > > > all (elephant) ears. > > > > I didn't say better :). If you can afford, SCSI is the way to go. > > However SATA with a good controller (I am fond of the LSI 150 series) > > can provide some great performance. > > Basically, you can get away with cheaper hardware, but it usually > doesn't have the reliability/performance of more expensive options. > > You want an in-depth comparison of how a server disk drive is internally > better than a desktop drive: > > http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf BTW, someone (Western Digital?) is now offering SATA drives that carry the same MTBF/warranty/what-not as their SCSI drives. I can't remember if they actually claim that it's the same mechanisms just with a different controller on the drive... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Tue, May 09, 2006 at 12:10:32PM +0200, Jean-Yves F. Barbier wrote: > > I myself can't see much reason to spend $500 on high end controller > > cards for a simple Raid 1. > > Naa, you can find ATA &| SATA ctrlrs for about EUR30 ! And you're likely getting what you paid for: crap. Such a controller is less likely to do things like turn off write caching so that fsync works properly. > > + Hardware Raids might be a bit easier to manage, if you never spend a > > few hours to learn Software Raid Tools. > > I'd the same (mostly as you still have to punch a command line for > most of the controlers) Controllers I've seen have some kind of easy-to-understand GUI, at least during bootup. When it comes to OS-level tools, that's going to vary widely. > > + There are situations in which Software Raids are faster, as CPU power > > has advanced dramatically in the last years and even high end controller > > cards cannot keep up with that. > > Definitely NOT, however if your server doen't have a heavy load, the > software overload can't be noticed (essentially cache managing and > syncing) > > For bi-core CPUs, it might be true Depends. RAID performance depends on a heck of a lot more than just CPU. Software RAID allows you to do things like spread load across multiple controllers, so you can scale a lot higher for less money. In this case I doubt that's a consideration, though, so what's more important is making sure the controller bus isn't in the way. One thing that means is ensuring that every SATA drive has its own dedicated controller, since a lot of SATA hardware can't handle multiple commands on the bus at once. > > + Using SATA drives is always a bit of risk, as some drives are lying > > about whether they are caching or not. > > ?? Do you intend to use your server without a UPS ?? Have you never heard of someone tripping over a plug? Or a power supply failing? Or the OS crashing?
If fsync is properly obeyed, PostgreSQL will gracefully recover from all of those situations. If it's not, you're at risk of losing the whole database. > > + Using hardware controllers, the array becomes locked to a particular > > vendor. You can't switch controller vendors as the array meta > > information is stored proprietary. In case the Raid is broken to a level > > the controller can't recover automatically this might complicate manual > > recovery by specialists. > > ?? Do you intend not to make backups ?? Even with backups this is still a valid concern, since the backup will be nowhere near as up-to-date as the database was unless you have a pretty low DML rate. > BUT a hardware controler is about EUR2000 and a (ATA/SATA) 500GB HD > is ~ EUR350. Huh? You can get 3ware controllers for about $500, and they're pretty decent. While I'm sure there are controllers for $2k, that doesn't mean there's nothing in between. -- Jim C. Nasby
>> You want an in-depth comparison of how a server disk drive is internally >> better than a desktop drive: >> >> http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > BTW, someone (Western Digital?) is now offering SATA drives that carry > the same MTBF/warranty/what-not as their SCSI drives. I can't remember > if they actually claim that it's the same mechanisms just with a > different controller on the drive... Well, Western Digital and Seagate both carry 5-year warranties. Seagate, I believe, does on almost all of their products; with WD you have to pick the right drive. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
On Thu, May 11, 2006 at 03:38:31PM -0700, Joshua D. Drake wrote: > > >>You want an in-depth comparison of how a server disk drive is internally > >>better than a desktop drive: > >> > >> http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > > >BTW, someone (Western Digital?) is now offering SATA drives that carry > >the same MTBF/warranty/what-not as their SCSI drives. I can't remember > >if they actually claim that it's the same mechanisms just with a > >different controller on the drive... > > Well western digital and Seagate both carry 5 year warranties. Seagate I > believe does on almost all of there products. WD you have to pick the > right drive. I know that someone recently made a big PR push about how you could get 'server reliability' in some of their SATA drives, but maybe now everyone's starting to do it. I suspect the premium you can charge for it offsets the costs, provided that you switch all your production over rather than trying to segregate production lines. -- Jim C. Nasby
Joshua D. Drake wrote: > > >> You want an in-depth comparison of how a server disk drive is internally > >> better than a desktop drive: > >> > >> http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > > > BTW, someone (Western Digital?) is now offering SATA drives that carry > > the same MTBF/warranty/what-not as their SCSI drives. I can't remember > > if they actually claim that it's the same mechanisms just with a > > different controller on the drive... > > Well western digital and Seagate both carry 5 year warranties. Seagate I > believe does on almost all of there products. WD you have to pick the > right drive. That's nice, but it seems similar to my Toshiba laptop drive experience --- it breaks, we replace it. I would rather not have to replace it. :-) Let me mention that the only drive that has ever failed without warning was a SCSI Deskstar ("Deathstar") drive, which was a hybrid: a SCSI drive, but made for consumer use. -- Bruce Momjian
>> Well western digital and Seagate both carry 5 year warranties. Seagate I >> believe does on almost all of there products. WD you have to pick the >> right drive. > > That's nice, but it seems similar to my Toshiba laptop drive experience > --- it breaks, we replace it. I would rather not have to replace it. :-) Laptop drives are known to have short lifespans due to heat. I have IDE drives that have been running for four years without any issues, but I have good fans blowing over them. Frankly, I think if you are running drives (in a production environment) for more than 3 years you're crazy anyway :) > > Let me mention the only drive that has ever failed without warning was a > SCSI Deskstar (deathstar) drive, which was a hybrid because it was a > SCSI drive, but made for consumer use. > -- Joshua D. Drake
Joshua D. Drake wrote: > > >> Well western digital and Seagate both carry 5 year warranties. Seagate I > >> believe does on almost all of there products. WD you have to pick the > >> right drive. > > > > That's nice, but it seems similar to my Toshiba laptop drive experience > > --- it breaks, we replace it. I would rather not have to replace it. :-) > > Laptop drives are known to have short lifespans do to heat. I have IDE > drives that have been running for four years without any issues but I > have good fans blowing over them. > > Frankly I think if you are running drivess (in a production environment) > for more then 3 years your crazy anyway :) Agreed --- the cost/benefit of keeping a drive >3 years just doesn't make sense. -- Bruce Momjian
On Thu, May 11, 2006 at 07:20:27PM -0400, Bruce Momjian wrote: > Joshua D. Drake wrote: > > > > >> You want an in-depth comparison of how a server disk drive is internally > > >> better than a desktop drive: > > >> > > >> http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > > > > > BTW, someone (Western Digital?) is now offering SATA drives that carry > > > the same MTBF/warranty/what-not as their SCSI drives. I can't remember > > > if they actually claim that it's the same mechanisms just with a > > > different controller on the drive... > > > > Well western digital and Seagate both carry 5 year warranties. Seagate I > > believe does on almost all of there products. WD you have to pick the > > right drive. > > That's nice, but it seems similar to my Toshiba laptop drive experience > --- it breaks, we replace it. I would rather not have to replace it. :-) > > Let me mention the only drive that has ever failed without warning was a > SCSI Deskstar (deathstar) drive, which was a hybrid because it was a > SCSI drive, but made for consumer use. My damn PowerBook drive recently failed with very little warning, other than that I did notice disk activity seemed to be getting a bit slower. IIRC it didn't log any errors or anything. Even if it did, if the OS was catching them I'd hope it would pop up a warning or something. But from what I've heard, some drives nowadays will silently remap dead sectors without telling the OS anything, which is great until you've used up all of the spare sectors and there's nowhere to remap to. :( Hmm... I should figure out how to have OS X email me daily log updates like FreeBSD does... -- Jim C. Nasby
Jim C. Nasby wrote: > On Thu, May 11, 2006 at 07:20:27PM -0400, Bruce Momjian wrote: > > Joshua D. Drake wrote: > > > > > > >> You want an in-depth comparison of how a server disk drive is internally > > > >> better than a desktop drive: > > > >> > > > >> http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > > > > > > > BTW, someone (Western Digital?) is now offering SATA drives that carry > > > > the same MTBF/warranty/what-not as their SCSI drives. I can't remember > > > > if they actually claim that it's the same mechanisms just with a > > > > different controller on the drive... > > > > > > Well western digital and Seagate both carry 5 year warranties. Seagate I > > > believe does on almost all of there products. WD you have to pick the > > > right drive. > > > > That's nice, but it seems similar to my Toshiba laptop drive experience > > --- it breaks, we replace it. I would rather not have to replace it. :-) > > > > Let me mention the only drive that has ever failed without warning was a > > SCSI Deskstar (deathstar) drive, which was a hybrid because it was a > > SCSI drive, but made for consumer use. > > My damn powerbook drive recently failed with very little warning, other > than I did notice that disk activity seemed to be getting a bit slower. > IIRC it didn't log any errors or anything. Even if it did, if the OS was > catching them I'd hope it would pop up a warning or something. But from > what I've heard, some drives now-a-days will silently remap dead sectors > without telling the OS anything, which is great until you've used up all > of the spare sectors and there's nowhere to remap to. :( Yes, I think most IDE drives do silently remap, and most SCSI drives don't. Not sure how much _most_ is. I know my SCSI controller beeps at me when I try to access a bad block. Now, that gets my attention. 
-- Bruce Momjian
> Hmm... I should figure out how to have OS X email me daily log updates > like FreeBSD does... Logwatch. -- Joshua D. Drake
On Thu, May 11, 2006 at 18:41:25 -0500, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > On Thu, May 11, 2006 at 07:20:27PM -0400, Bruce Momjian wrote: > > My damn powerbook drive recently failed with very little warning, other > than I did notice that disk activity seemed to be getting a bit slower. > IIRC it didn't log any errors or anything. Even if it did, if the OS was > catching them I'd hope it would pop up a warning or something. But from > what I've heard, some drives now-a-days will silently remap dead sectors > without telling the OS anything, which is great until you've used up all > of the spare sectors and there's nowhere to remap to. :( You might look into smartmontools. One part of this is a daemon that runs self-tests on the disks on a regular basis. You can have warnings mailed to you on various conditions. Drives will fail the self-test before they run out of spare sectors. There are other drive characteristics that can be used to tell if drive failure is imminent and give you a chance to replace a drive before it fails.
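As a rough illustration of the kind of check smartd automates, the sketch below parses the attribute table printed by "smartctl -A <device>" and flags reallocated or pending sectors, the counters that reveal the silent remapping Jim describes. It assumes the classic ATA table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); SCSI and NVMe devices print different output, and all function names here are hypothetical. For scheduled self-tests and mailed warnings, smartd itself remains the right tool.

```python
import subprocess


def _to_int(field: str):
    """Parse a numeric column, returning None for placeholders like '---'."""
    return int(field) if field.isdigit() else None


def parse_smart_attributes(text: str) -> dict:
    """Return {name: (value, thresh, raw)} from 'smartctl -A' output."""
    attrs = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True
            continue
        if not in_table:
            continue
        if len(fields) < 10 or not fields[0].isdigit():
            break  # a blank or non-table line ends the attribute table
        attrs[fields[1]] = (_to_int(fields[3]), _to_int(fields[5]), fields[9])
    return attrs


def remap_warnings(attrs: dict) -> list:
    """Flag sectors already remapped or waiting to be remapped."""
    warnings = []
    for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector"):
        if name not in attrs:
            continue
        value, thresh, raw = attrs[name]
        if raw.isdigit() and int(raw) > 0:
            warnings.append(f"{name}: raw count {raw}")
        # A zero threshold means the vendor set no failure threshold.
        if value is not None and thresh and value <= thresh:
            warnings.append(f"{name}: normalized value {value} <= threshold {thresh}")
    return warnings


def read_smart_table(device: str) -> str:
    """Run smartctl (usually needs root). Its exit status is a bit mask,
    so a non-zero status is not treated as an error here."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout
```

For example, remap_warnings(parse_smart_attributes(read_smart_table("/dev/sda"))) should return an empty list for a drive that reports no remapped or pending sectors; a growing raw count is the early warning Jim's PowerBook never gave him.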