Re: RAID controllers for Postgresql on large setups - Mailing list pgsql-performance
From | Greg Smith |
---|---|
Subject | Re: RAID controllers for Postgresql on large setups |
Date | |
Msg-id | Pine.GSO.4.64.0805130107100.19059@westnet.com Whole thread Raw |
In response to | RAID controllers for Postgresql on large setups (Francisco Reyes <lists@stringsutils.com>) |
List | pgsql-performance |
On Mon, 12 May 2008, Francisco Reyes wrote: > We are going to redo one machine to compare RAID 10 vs RAID 50. Mostly to > see if the perfomance is close, the space gain may be usefull. Good luck with that, you'll need it. > Will it pay to go to a controller with higher memory for existing > machines? The one machine I am about to redo has PCI which seems to > somewhat limit our options. So far I have found another Adaptec > controller, 2130SLP, that has 128MB and is also just plain PCI. I need > to decide whether to buy the BBU for the 2120 or get a new controller > with more memory and a BBU. These options are both pretty miserable. I hear rumors that Adaptec makes controllers that work OK under Linux , I've never seen one. A quick search suggests both the 2120 and 2130SLP are pretty bad. The suggestions Joshua already gave look like much better ideas. Considering your goals here, I personally wouldn't put a penny into a system that wasn't pretty modern. I think you've got too aggressive a target for database size combined with commit rate to be playing with hardware unless it's new enough to support PCI-Express cards. > For DBs with bulk updates/inserts is 128MB write cache even enough to > achieve reasonable rates? (ie at least 5K inserts/sec) This really depends on how far the data is spread across disk. You'll probably be OK on inserts. Let's make a wild guess and say we fit 80 100-byte records in each 8K database block. If you have 5000/second, that's 63 8K blocks/second which works out to 0.5MB/s of writes. Pretty easy, unless there's a lot of indexes involved as well. But an update can require reading in a 8K block, modifying it, then writing another back out again. In the worst case, if your data was sparse enough (which is frighteningly possible when I hear you mention a billion records) that every update was hitting a unique block, 5K/sec * 8K = 39MB/second of reads *and* writes. That doesn't sound like horribly much, but that's pretty tough if there's a lot of seeking involved in there. Now, in reality, many of your small records will be clumped into each block on these updates and a lot of writes are deferred until checkpoint time which gives more time to aggregate across shared blocks. You'll actually be somewhere in the middle of 0.5 and 78MB/s, which is a pretty wide range. It's hard to estimate too closely here without a lot more information about the database, the application, what version of PostgreSQL you're using, all sorts of info. You really should be thinking in terms of benchmarking the current hardware first to try and draw some estimates you can extrapolate from. Theoretical comments are a very weak substitute for real-world benchmarking on the application itself, even if that benchmarking is done on less capable hardware. Run some tests, measure your update rate while also measuring real I/O rate with vmstat, compare that I/O rate to the disk's sequential/random performance as measured via bonnie++, and now there's a set of figures that mean something you can estimate based on. > For large setups (ie 500GB+ per server) does it make sense to try to get a > controller in a machine or do SANs have better throughput even if at a much > higher cost? That's not a large setup nowadays, certainly not large enough that a SAN would be required to get reasonable performance. You may need an array that's external to the server itself, but a SAN includes more than just that. There are a lot of arguments on both sides for using SANs; see http://wiki.postgresql.org/wiki/Direct_Storage_vs._SAN for a summary and link to recent discussion where this was thrashed about heavily. If you're still considering RAID5 and PCI controllers you're still a bit in denial about the needs of your situation here, but jumping right from there to assuming you need a SAN is likely overkill. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
pgsql-performance by date: