Re: RAID controllers for Postgresql on large setups - Mailing list pgsql-performance

From Greg Smith
Subject Re: RAID controllers for Postgresql on large setups
Date
Msg-id Pine.GSO.4.64.0805130107100.19059@westnet.com
Whole thread Raw
In response to RAID controllers for Postgresql on large setups  (Francisco Reyes <lists@stringsutils.com>)
List pgsql-performance
On Mon, 12 May 2008, Francisco Reyes wrote:

> We are going to redo one machine to compare  RAID 10 vs RAID 50. Mostly to
> see if the perfomance is close, the space gain may be usefull.

Good luck with that, you'll need it.

> Will it pay to go to a controller with higher memory for existing
> machines? The one machine I am about to redo has PCI which seems to
> somewhat limit our options. So far I have found another Adaptec
> controller, 2130SLP, that has 128MB and is also just plain PCI. I need
> to decide whether to buy the BBU for the 2120 or get a new controller
> with more memory and a BBU.

These options are both pretty miserable.  I hear rumors that Adaptec makes
controllers that work OK under Linux , I've never seen one.  A quick
search suggests both the 2120 and 2130SLP are pretty bad.  The suggestions
Joshua already gave look like much better ideas.

Considering your goals here, I personally wouldn't put a penny into a
system that wasn't pretty modern.  I think you've got too aggressive a
target for database size combined with commit rate to be playing with
hardware unless it's new enough to support PCI-Express cards.

> For DBs with bulk updates/inserts is 128MB write cache even enough to
> achieve reasonable rates? (ie at least 5K inserts/sec)

This really depends on how far the data is spread across disk.  You'll
probably be OK on inserts.  Let's make a wild guess and say we fit 80
100-byte records in each 8K database block.  If you have 5000/second,
that's 63 8K blocks/second which works out to 0.5MB/s of writes.  Pretty
easy, unless there's a lot of indexes involved as well.  But an update can
require reading in a 8K block, modifying it, then writing another back out
again.  In the worst case, if your data was sparse enough (which is
frighteningly possible when I hear you mention a billion records) that
every update was hitting a unique block, 5K/sec * 8K = 39MB/second of
reads *and* writes.  That doesn't sound like horribly much, but that's
pretty tough if there's a lot of seeking involved in there.

Now, in reality, many of your small records will be clumped into each
block on these updates and a lot of writes are deferred until checkpoint
time which gives more time to aggregate across shared blocks.  You'll
actually be somewhere in the middle of 0.5 and 78MB/s, which is a pretty
wide range.  It's hard to estimate too closely here without a lot more
information about the database, the application, what version of
PostgreSQL you're using, all sorts of info.

You really should be thinking in terms of benchmarking the current
hardware first to try and draw some estimates you can extrapolate from.
Theoretical comments are a very weak substitute for real-world
benchmarking on the application itself, even if that benchmarking is done
on less capable hardware.  Run some tests, measure your update rate while
also measuring real I/O rate with vmstat, compare that I/O rate to the
disk's sequential/random performance as measured via bonnie++, and now
there's a set of figures that mean something you can estimate based on.

> For large setups (ie 500GB+ per server) does it make sense to try to get a
> controller in a machine or do SANs have better throughput even if at a much
> higher cost?

That's not a large setup nowadays, certainly not large enough that a SAN
would be required to get reasonable performance.  You may need an array
that's external to the server itself, but a SAN includes more than just
that.

There are a lot of arguments on both sides for using SANs; see
http://wiki.postgresql.org/wiki/Direct_Storage_vs._SAN for a summary and
link to recent discussion where this was thrashed about heavily.  If
you're still considering RAID5 and PCI controllers you're still a bit in
denial about the needs of your situation here, but jumping right from
there to assuming you need a SAN is likely overkill.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

pgsql-performance by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: RAID controllers for Postgresql on large setups
Next
From: "Gauri Kanekar"
Date:
Subject: Installation Steps to migrate to Postgres 8.3.1