Re: Postgresql and Software RAID/LVM - Mailing list pgsql-performance
From: John A Meinel
Subject: Re: Postgresql and Software RAID/LVM
Date:
Msg-id: 42A52465.2000302@arbash-meinel.com
In response to: Re: Postgresql and Software RAID/LVM (Marty Scholes <marty@outputservices.com>)
List: pgsql-performance
Marty Scholes wrote:
>> Has anyone run Postgres with software RAID or LVM on a production box?
>> What has been your experience?
>
> Yes, we have run Pg for a couple of years with software LVM (mirroring)
> against two hardware RAID5 arrays. We host a production Sun box that
> runs 24/7.
>
> My experience:
> * Software RAID (other than mirroring) is a disaster waiting to happen.
> If the metadata for the RAID set gives out for any reason (CMOS
> scrambles, card dies, power spike, etc.) then you are hosed beyond
> belief. In most cases it is almost impossible to recover. With
> mirroring, however, you can always boot and operate on a single mirror,
> pretending that no LVM/RAID is underway. In other words, each mirror is
> a fully functional copy of the data which will operate your server.

Isn't metadata giving out actually more of a problem in a hardware RAID
situation? If the card you are using dies, you can't just get another
one. With software RAID, because the metadata is on the drives
themselves, you can pull them out of that machine, put them into any
machine that has a controller which can read the drives and a similar
kernel, and you are back up and running. (See the md reassembly sketch
below.)

> * Hardware RAID5 is a terrific way to boost performance via write
> caching and spreading I/O across multiple spindles. Each of our
> external arrays operates 14 drives (12 data, 1 parity and 1 hot spare).
> While RAID5 protects against single spindle failure, it will not hedge
> against multiple failures in a short time period, SCSI controller
> failure, SCSI cable problems or even wholesale failure of the RAID
> controller. All of these things happen in a 24/7 operation. Using
> software RAID1 against the hardware RAID5 arrays hedges against any
> single failure.

No, it hedges against *more* than one failure. You can also do a RAID1
over a RAID5 in software. But if you are honestly willing to create a
full RAID1, just create a RAID1 over RAID0; the performance is much
better. And since you have a full RAID1, you can lose half of your
drives, as long as both drives of a pairing don't give out. If you want
the space, but you feel that RAID5 isn't redundant enough, go to RAID6,
which uses 2 parity locations, each with a different method of storing
parity, so not only is it more redundant, you have a better chance of
finding problems. (Both layouts are sketched below.)

> * Software mirroring gives you tremendous ability to change the system
> while it is running, by taking offline the mirror you wish to change
> and then synchronizing it after the change.

That certainly is a nice ability (sketched below as well). But remember
that LVM also has the idea of "snapshot"ing a running system. I don't
know the exact details, just that there is a way to have some processes
see the filesystem as it existed at an exact point in time, which is
also a great way to handle backups (again, sketch below).

> On a fully operational production server, we have:
> * restriped the RAID5 array
> * replaced all RAID5 media with higher capacity drives
> * upgraded the RAID5 controller
> * moved all data from an old RAID5 array to a newer one
> * replaced the host SCSI controller
> * uncabled and physically moved storage to a different part of the
>   data center
>
> Again, all of this has taken place (over the years) while our machine
> was fully operational.

So you are saying that you were able to replace the RAID controller
without turning off the machine? I realize hot-swappable PCI cards do
exist, but I think you are overstating what you mean by "fully
operational".
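To make the drive-portability point concrete, here is roughly what
reassembly looks like with Linux md/mdadm. This is a sketch from
memory, not a tested recipe, and the device names are invented:

    # The md superblock lives on the member disks themselves, so after
    # moving the drives to a new box you can reassemble straight from
    # that on-disk metadata.
    mdadm --examine /dev/sda1   # inspect the superblock on one member
    mdadm --assemble --scan     # scan the disks, assemble the arrays found
    mdadm --detail /dev/md0     # confirm the array came back healthy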
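For the RAID1-over-RAID0 and RAID6 layouts, the md equivalents would
look something like the following (again just a sketch; /dev/sd[b-g]1
are placeholder partitions):

    # Two stripes, mirrored against each other (RAID1 over RAID0):
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdd1 /dev/sde1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md0 /dev/md1

    # Or RAID6: two parity blocks per stripe, survives any two failures.
    mdadm --create /dev/md3 --level=6 --raid-devices=6 /dev/sd[b-g]1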
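Marty's box is a Sun, so his mirroring tools will differ, but the
take-one-mirror-offline-and-resync dance he describes looks roughly
like this with Linux md (hypothetical devices; assume /dev/md0 is a
simple two-disk mirror of sdb1 and sdc1):

    # Pull one half of the mirror out of the array...
    mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    # ...do the maintenance, then re-add it; md resynchronizes the
    # returned half against the live one in the background.
    mdadm /dev/md0 --add /dev/sdc1
    cat /proc/mdstat            # watch the resync progress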
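On snapshots, a minimal LVM2 sketch (volume and mount names invented;
the snapshot volume just needs enough space to hold whatever changes
while it exists):

    # Create a copy-on-write snapshot of the volume holding the data...
    lvcreate --snapshot --size 2G --name pgsnap /dev/vg0/pgdata
    # ...mount it read-only and back up that frozen point in time.
    mount -o ro /dev/vg0/pgsnap /mnt/snap
    tar czf /backup/pgdata.tar.gz -C /mnt/snap .
    umount /mnt/snap
    lvremove /dev/vg0/pgsnap

If the whole database cluster lives on that one volume, Postgres sees
the snapshot as a crash-consistent image, so restoring from it is like
recovering from a power failure.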
To come back to "fully operational": it's not like you can access your
data while it is being physically moved. I do think you had some nice
hardware. But I know you can do all of this in software as well; it is
usually a price/performance tradeoff, and you spend quite a bit to get
a hardware RAID card that can keep up with a modern CPU. I know we have
an FC RAID box at work which has a full 512MB of cache on it, but it
wasn't that much cheaper than buying a dedicated server.

John
=:->