Thread: Possible Redundancy/Performance Solution
Right now, we have a few servers that host our databases. None of them are redundant. Each hosts databases for one or more applications. Things work reasonably well but I'm worried about the availability of some of the sites. Our hardware is 3-4 years old at this point and I'm not naive to the possibility of drives, memory, motherboards or whatever failing.

I'm toying with the idea of adding a little redundancy and maybe some performance to our setup. First, I'd replace our sata hard drives with a scsi controller and two scsi hard drives that run raid 0 (probably running the OS and logs on the original sata drive). Then I'd run the previous two databases on one cluster of two servers with pgpool in front (using the redundancy feature of pgpool).

Our applications are mostly read intensive. I don't think that having two databases on one machine, where previously we had just one, would add too much of an impact, especially if we use the load balance feature of pgpool as well as the redundancy feature.

Can anyone comment on any gotchas or issues we might encounter? Do you think this strategy has a chance of accomplishing what I'm originally setting out to do?

TIA
-Dennis
On Tue, 6 May 2008, Dennis Muhlestein wrote:

> First, I'd replace our sata hard drives with a scsi controller and two
> scsi hard drives that run raid 0 (probably running the OS and logs on
> the original sata drive).

RAID0 on two disks makes a disk failure that will wipe out the database twice as likely. If your goal is better reliability, you want some sort of RAID1, which you can do with two disks. That should increase read throughput a bit (not quite double though) while keeping write throughput about the same.

If you added four disks, then you could do a RAID1+0 combination which should substantially outperform your existing setup in every respect while also being more resilient to drive failure.

> Our applications are mostly read intensive. I don't think that having two
> databases on one machine, where previously we had just one, would add too
> much of an impact, especially if we use the load balance feature of pgpool as
> well as the redundancy feature.

A lot depends on how much RAM you've got and whether it's enough to keep the cache hit rate fairly high here. A reasonable thing to consider here is doing a round of standard performance tuning on the servers to make sure they're operating efficiently before increasing their load.

> Can anyone comment on any gotchas or issues we might encounter?

Getting writes to replicate to multiple instances of the database usefully is where all the really nasty gotchas are in this area. Starting with that part and working your way back toward the front-end pooling from there should crash you into the hard parts early in the process.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
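The "twice as likely" figure for two-disk RAID0 can be checked with a little arithmetic. The sketch below is only an illustration: it assumes each disk fails independently with the same probability p over some period, and the value p = 0.05 is made up for the example, not a measured failure rate. The function names are hypothetical.

```python
def raid0_loss(p: float, disks: int = 2) -> float:
    """RAID0 is lost as soon as ANY one disk fails."""
    return 1 - (1 - p) ** disks


def raid1_loss(p: float, disks: int = 2) -> float:
    """A RAID1 mirror is lost only if EVERY disk in it fails."""
    return p ** disks


def raid10_loss(p: float, pairs: int = 2) -> float:
    """RAID1+0 is lost if at least one mirrored pair loses both disks."""
    return 1 - (1 - p ** 2) ** pairs


# Assumed per-disk failure probability; illustrative only.
p = 0.05

for name, value in [
    ("single disk", p),
    ("RAID0, 2 disks", raid0_loss(p)),
    ("RAID1, 2 disks", raid1_loss(p)),
    ("RAID1+0, 4 disks", raid10_loss(p)),
]:
    print(f"{name:18s} {value:.4f}")
```

Under these assumptions the RAID0 figure comes out at just under 2p, while the mirrored layouts are far below p. Real disks in the same chassis do not fail fully independently, so treat the mirrored numbers as optimistic.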
Greg Smith wrote:
> On Tue, 6 May 2008, Dennis Muhlestein wrote:
>
> RAID0 on two disks makes a disk failure that will wipe out the database
> twice as likely. If your goal is better reliability, you want some sort
> of RAID1, which you can do with two disks. That should increase read
> throughput a bit (not quite double though) while keeping write
> throughput about the same.

I was planning on pgpool being the cushion between the raid0 failure probability and my need for redundancy. This way, I get protection against not only disks, but cpu, memory, network cards, motherboards etc. Is this not a reasonable approach?

> If you added four disks, then you could do a RAID1+0 combination which
> should substantially outperform your existing setup in every respect
> while also being more resilient to drive failure.
>
>> Our applications are mostly read intensive. I don't think that having
>> two databases on one machine, where previously we had just one, would
>> add too much of an impact, especially if we use the load balance
>> feature of pgpool as well as the redundancy feature.
>
> A lot depends on how much RAM you've got and whether it's enough to keep
> the cache hit rate fairly high here. A reasonable thing to consider
> here is doing a round of standard performance tuning on the servers to
> make sure they're operating efficiently before increasing their load.
>
>> Can anyone comment on any gotchas or issues we might encounter?
>
> Getting writes to replicate to multiple instances of the database
> usefully is where all the really nasty gotchas are in this area.
> Starting with that part and working your way back toward the front-end
> pooling from there should crash you into the hard parts early in the
> process.

Thanks for the tips!

Dennis
On Tue, 6 May 2008, Dennis Muhlestein wrote:

> I was planning on pgpool being the cushion between the raid0 failure
> probability and my need for redundancy. This way, I get protection against
> not only disks, but cpu, memory, network cards, motherboards etc. Is this
> not a reasonable approach?

Since disks are by far the most likely thing to fail, I think it would be bad planning to switch to a design that doubles the chance of a disk failure taking out the server just because you're adding some server-level redundancy. Anybody who's been in this business for a while will tell you that seemingly improbable double failures happen, and if I were you I'd want a plan that survived a) a single disk failure on the primary and b) a single disk failure on the secondary at the same time.

Let me strengthen that--I don't feel comfortable unless I'm able to survive a single disk failure on the primary and complete loss of the secondary (say by power supply failure), because a double failure that starts that way is a lot more likely than you might think. Especially with how awful hard drives are nowadays.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
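The two double-failure scenarios described above can be written down as a small survivability check. This is a toy model, not a description of how pgpool behaves: it assumes a two-server pair where each server has a two-disk array, and that the service stays up as long as at least one server still has a working array. The function names and parameters are hypothetical.

```python
def array_up(level: str, failed_disks: int) -> bool:
    """Two-disk array: RAID0 dies on any disk failure, RAID1 survives one."""
    if level == "raid0":
        return failed_disks == 0
    if level == "raid1":
        return failed_disks <= 1
    raise ValueError(f"unsupported RAID level: {level}")


def service_up(level: str, primary_failed: int, secondary_failed: int,
               secondary_lost: bool = False) -> bool:
    """The service survives if either server still has a usable array."""
    primary_ok = array_up(level, primary_failed)
    secondary_ok = (not secondary_lost) and array_up(level, secondary_failed)
    return primary_ok or secondary_ok


for level in ("raid0", "raid1"):
    # a) + b): one disk fails on each server at the same time
    both_disks = service_up(level, primary_failed=1, secondary_failed=1)
    # one disk fails on the primary and the secondary is lost entirely
    disk_plus_server = service_up(level, primary_failed=1,
                                  secondary_failed=0, secondary_lost=True)
    print(level, "disk on each server:", both_disks,
          "| disk + lost secondary:", disk_plus_server)
```

In this model the RAID0 pair goes down in both scenarios, while the RAID1 pair stays up in both, which is the distinction being drawn between server-level and disk-level redundancy.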
Greg Smith wrote:
> On Tue, 6 May 2008, Dennis Muhlestein wrote:
>
> Since disks are by far the most likely thing to fail, I think it would
> be bad planning to switch to a design that doubles the chance of a disk
> failure taking out the server just because you're adding some
> server-level redundancy. Anybody who's been in this business for a
> while will tell you that seemingly improbable double failures happen,
> and if I were you I'd want a plan that survived a) a single disk failure
> on the primary and b) a single disk failure on the secondary at the same
> time.
>
> Let me strengthen that--I don't feel comfortable unless I'm able to
> survive a single disk failure on the primary and complete loss of the
> secondary (say by power supply failure), because a double failure that
> starts that way is a lot more likely than you might think. Especially
> with how awful hard drives are nowadays.

Those are good points. So you'd go ahead and add the pgpool in front (or another redundancy approach), but then use raid 1, 5 or perhaps 10 on each server?

-Dennis
On Tue, May 6, 2008 at 3:39 PM, Dennis Muhlestein <djmuhlestein@gmail.com> wrote:

> Those are good points. So you'd go ahead and add the pgpool in front (or
> another redundancy approach), but then use raid 1, 5 or perhaps 10 on each
> server?

That's what I'd do. Specifically, RAID10 for small to medium drive sets used for transactional stuff, and RAID6 for very large reporting databases that are mostly read.
On Tue, 6 May 2008, Dennis Muhlestein wrote:

> Those are good points. So you'd go ahead and add the pgpool in front (or
> another redundancy approach), but then use raid 1, 5 or perhaps 10 on each
> server?

Right. I don't advise using the fact that you've got some sort of replication going as an excuse to reduce the reliability of individual systems, particularly in the area of disks (unless you're really creating a much larger number of replicas than 2).

RAID5 can be problematic compared to other RAID setups under write-heavy workloads of small blocks, and it should be avoided for database use. You can find stories on this subject in the archives here and some of the papers at http://www.baarf.com/ go over why; "Is RAID 5 Really a Bargain?" is the one I like best.

If you were thinking about 4 or more disks, there are a number of ways to distribute those:

1) RAID1+0 to make one big volume
2) RAID1 for OS/apps/etc, RAID1 for database
3) RAID1 for OS+xlog, RAID1 for database
4) RAID1 for OS+popular tables, RAID1 for rest of database

Exactly which of these splits is best depends on your application and the tradeoffs important to you, but any of these should improve performance and reliability over what you're doing now. I personally tend to create two separate distinct volumes rather than using any striping here, create a tablespace or three right from the start, and then manage the underlying mapping to disk with symbolic links so I can shift the allocation around. That does require you have a steady hand and good nerves for when you screw up, so I wouldn't recommend that to everyone.

As you get more disks it gets less practical to handle things this way, and it becomes increasingly sensible to just make one big array out of them and stop worrying about it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
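One common way to see why RAID5 struggles with small writes is the rule-of-thumb "write penalty": a small random write on RAID5 typically costs four disk I/Os (read old data, read old parity, write new data, write new parity), against two for a mirror. The sketch below applies that rule of thumb. It is a rough estimate only: it ignores controller caches and full-stripe writes, and the 150 IOPS per disk figure is an assumed number for illustration, not a measurement from this thread.

```python
# Rule-of-thumb disk I/Os needed per small random write.
WRITE_PENALTY = {
    "raid0": 1,   # one write to one disk
    "raid1": 2,   # write to both halves of the mirror
    "raid10": 2,  # write to both halves of one mirrored pair
    "raid5": 4,   # read data + read parity + write data + write parity
}


def small_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Estimated small random write IOPS for the whole array."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Assumed: 4 disks at 150 random IOPS each (illustrative only).
for level in ("raid10", "raid5"):
    print(level, small_write_iops(level, disks=4, iops_per_disk=150))
```

With the same four disks, the estimate for RAID5 is half that of RAID1+0 for this kind of workload, which is in line with the advice above to avoid it for write-heavy database use.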
>
> 1) RAID1+0 to make one big volume
> 2) RAID1 for OS/apps/etc, RAID1 for database
> 3) RAID1 for OS+xlog, RAID1 for database
> 4) RAID1 for OS+popular tables, RAID1 for rest of database

Lots of good info, thanks for all the replies. It seems to me then, that the speed increase you'd get from raid0 is not worth the downtime risk, even when you have multiple servers. I'll start pricing things out and see what options we have.

Thanks again,
Dennis