Thread: RAID controllers for PostgreSQL on large setups
Inherited a number of servers and I am starting to look into the hardware. So far what I know from a few of the servers: Red Hat servers, 15K RPM disks, 12GB to 32GB of RAM, Adaptec 2120 SCSI controller (64MB of cache). The servers mostly have 12 drives in RAID 10. We are going to redo one machine to compare RAID 10 vs RAID 50, mostly to see if the performance is close; the space gain may be useful.

The usage pattern is mostly large sets of transactions, i.e. bulk loads of millions of rows and queries involving tens of millions of rows. There are usually only a handful of connections at once, but I have seen it go up to 10 in the few weeks I have been at the new job. The rows are not very wide, mostly 30 to 90 bytes. The few that will be wider will be summary tables that will be read straight up without joins and indexed on the fields we will be querying. Most of the connections will be doing bulk reads/updates/writes. Some of the larger tables have nearly 1 billion rows and most have tens of millions. Most DBs are under 500GB, since the data was split so as to keep each machine somewhat evenly balanced compared to the others. I noticed the machine we are about to redo doesn't have a BBU.

A few questions. Will it pay to go to a controller with more memory for existing machines? The one machine I am about to redo has plain PCI, which seems to somewhat limit our options. So far I have found another Adaptec controller, the 2130SLP, that has 128MB and is also plain PCI. I need to decide whether to buy the BBU for the 2120 or get a new controller with more memory and a BBU. For DBs with bulk updates/inserts, is 128MB of write cache even enough to achieve reasonable rates (i.e. at least 5K inserts/sec)?

A broader question: for large setups (i.e. 500GB+ per server), does it make sense to try to get a controller in the machine, or do SANs have better throughput even if at a much higher cost?
For future machines I plan to look into controllers with at least 512MB, which will likely be PCI-X/PCI-e; I have not seen anything with large caches for plain PCI. Also, the machines in question have SCSI drives, not SAS. I believe the most recent machine has SAS, but the others may be 15K RPM SCSI.

Whether with a SAN or just an external enclosure, are 12 disks enough to sustain 5K inserts/updates per second on rows in the 30-to-90-byte territory? At 5K/second, inserting/updating 100 million records would take 5.5 hours. That is fairly reasonable if we can achieve it. Faster would be better, but it depends on what it would cost to achieve.
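As a quick check of that last estimate, the arithmetic can be sketched in a few lines (the numbers are just the figures from the message above):

```python
# Load-time estimate for a sustained insert/update rate.
rows = 100_000_000   # records to insert/update (from the message)
rate = 5_000         # target rows per second

seconds = rows / rate
print(f"{seconds / 3600:.1f} hours")  # 20,000 s, about 5.6 hours
```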
On Mon, 12 May 2008 22:04:03 -0400 Francisco Reyes <lists@stringsutils.com> wrote:

> Inherited a number of servers and I am starting to look into the hardware.
> So far what I know from a few of the servers: Red Hat servers, 15K RPM disks, 12GB to 32GB of RAM, Adaptec 2120 SCSI controller (64MB of cache).
> The servers mostly have 12 drives in RAID 10. We are going to redo one machine to compare RAID 10 vs RAID 50, mostly to see if the performance is close; the space gain may be useful.

Most likely you have an onboard SCSI controller as well, I am guessing. You shouldn't bother with the 2120. My tests show it is a horrible controller for random writes. Comparing software RAID on an LSI onboard for an IBM 345 versus a 2120S using hardware RAID 10, the software RAID completely blew the Adaptec away.

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake writes:

> Most likely you have an onboard SCSI controller as well, I am guessing.

Will check.

> You shouldn't bother with the 2120. My tests show it is a horrible controller for random writes.

Thanks for the feedback.

> Comparing software RAID on an LSI onboard for an IBM 345 versus a 2120S using hardware RAID 10, the software RAID completely blew the Adaptec away.

Any PCI controller you have had good experience with? How about any PCI-X/PCI-e controllers with which you have had good results?
Joshua,

did you try to run the 345 on an IBM ServeRAID 6i? I have one in mine, but I never actually ran any speed tests. Do you have any benchmarks that I could run and compare?

best regards,
chris
--
chris ruprecht
database grunt and bit pusher extraordinaire

On May 12, 2008, at 22:11, Joshua D. Drake wrote:

> On Mon, 12 May 2008 22:04:03 -0400 Francisco Reyes <lists@stringsutils.com> wrote:
>> Inherited a number of servers and I am starting to look into the [snip]
> Comparing software RAID on an LSI onboard for an IBM 345 versus a 2120S using hardware RAID 10, the software RAID completely blew the Adaptec away. [more snip]
Chris Ruprecht wrote:
> Joshua,
> did you try to run the 345 on an IBM ServeRAID 6i?

No, the only controllers I had at the time were the 2120 and the onboard LSI, which is limited to RAID 1. I put the drives on the LSI in JBOD and used Linux software RAID. The key identifier for me was that using a single writer over 6 (RAID 10) drives with the 2120 I could get ~16 megs a second. The moment I went to multiple writers it dropped off dramatically. However, with software RAID I was able to sustain ~16 megs a second over multiple threads. I stopped testing at 4 threads when I was getting 16 megs per thread :). I was happy at that point.

Joshua D. Drake
Francisco Reyes wrote:
> Joshua D. Drake writes:
>
> Any PCI controller you have had good experience with?

I don't have any PCI test data.

> How about any PCI-X/PCI-e controllers with which you have had good results?

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

If you are digging for used hardware, see if you can pick up a 64xx series from HP. A very nice card that can generally be had for reasonable dollars.

http://cgi.ebay.com/HP-Compaq-SMART-ARRAY-6402-CTRL-128MB-SCSI-273915-B21_W0QQitemZ120259020765QQihZ002QQcategoryZ11182QQssPageNameZWDVWQQrdZ1QQcmdZViewItem

If you want new, definitely go with the P800.

Sincerely,

Joshua D. Drake
On Mon, 12 May 2008, Francisco Reyes wrote:

> We are going to redo one machine to compare RAID 10 vs RAID 50. Mostly to see if the performance is close; the space gain may be useful.

Good luck with that, you'll need it.

> Will it pay to go to a controller with higher memory for existing machines? The one machine I am about to redo has PCI which seems to somewhat limit our options. So far I have found another Adaptec controller, 2130SLP, that has 128MB and is also just plain PCI. I need to decide whether to buy the BBU for the 2120 or get a new controller with more memory and a BBU.

These options are both pretty miserable. I hear rumors that Adaptec makes controllers that work OK under Linux, but I've never seen one. A quick search suggests both the 2120 and 2130SLP are pretty bad. The suggestions Joshua already gave look like much better ideas. Considering your goals here, I personally wouldn't put a penny into a system that wasn't pretty modern. I think you've got too aggressive a target for database size combined with commit rate to be playing with hardware unless it's new enough to support PCI-Express cards.

> For DBs with bulk updates/inserts is 128MB write cache even enough to achieve reasonable rates? (i.e. at least 5K inserts/sec)

This really depends on how far the data is spread across disk. You'll probably be OK on inserts. Let's make a wild guess and say we fit 80 100-byte records in each 8K database block. If you have 5000/second, that's 63 8K blocks/second, which works out to 0.5MB/s of writes. Pretty easy, unless there are a lot of indexes involved as well. But an update can require reading in an 8K block, modifying it, then writing another back out again. In the worst case, if your data was sparse enough (which is frighteningly possible when I hear you mention a billion records) that every update was hitting a unique block, 5K/sec * 8K = 39MB/second of reads *and* writes.
That doesn't sound like horribly much, but it's pretty tough if there's a lot of seeking involved in there. Now, in reality, many of your small records will be clumped into each block on these updates, and a lot of writes are deferred until checkpoint time, which gives more time to aggregate across shared blocks. You'll actually be somewhere in the middle of 0.5 and 78MB/s, which is a pretty wide range. It's hard to estimate too closely here without a lot more information about the database, the application, what version of PostgreSQL you're using, all sorts of info.

You really should be thinking in terms of benchmarking the current hardware first, to try to draw some estimates you can extrapolate from. Theoretical comments are a very weak substitute for real-world benchmarking on the application itself, even if that benchmarking is done on less capable hardware. Run some tests, measure your update rate while also measuring the real I/O rate with vmstat, compare that I/O rate to the disks' sequential/random performance as measured via bonnie++, and now there's a set of figures that mean something you can estimate based on.

> For large setups (i.e. 500GB+ per server) does it make sense to try to get a controller in a machine or do SANs have better throughput even if at a much higher cost?

That's not a large setup nowadays, certainly not large enough that a SAN would be required to get reasonable performance. You may need an array that's external to the server itself, but a SAN includes more than just that. There are a lot of arguments on both sides for using SANs; see http://wiki.postgresql.org/wiki/Direct_Storage_vs._SAN for a summary and a link to a recent discussion where this was thrashed about heavily. If you're still considering RAID5 and PCI controllers you're still a bit in denial about the needs of your situation here, but jumping right from there to assuming you need a SAN is likely overkill.
-- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
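Greg's best-case/worst-case envelope can be reproduced directly; a sketch using the same assumptions as his message (100-byte rows, 8K blocks, 5K operations/sec):

```python
BLOCK = 8192                 # PostgreSQL block size in bytes
ROWS_PER_BLOCK = 80          # ~80 100-byte records per 8K block
rate = 5000                  # inserts or updates per second

# Best case: rows pack densely, so only ~63 blocks/s are written.
best = rate / ROWS_PER_BLOCK * BLOCK / 2**20
# Worst case: every update hits a unique block, costing one 8K read
# plus one 8K write per row.
worst = rate * BLOCK / 2**20

print(f"best ~{best:.1f} MB/s, worst ~{worst:.0f} MB/s of reads and writes")
```

The real workload lands somewhere between those bounds, which is exactly why the benchmarking Greg recommends matters more than the envelope itself.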
On Mon, May 12, 2008 at 8:04 PM, Francisco Reyes <lists@stringsutils.com> wrote:

> Inherited a number of servers and I am starting to look into the hardware.
> So far what I know from a few of the servers: Red Hat servers, 15K RPM disks, 12GB to 32GB of RAM, Adaptec 2120 SCSI controller (64MB of cache).

Considering the generally poor performance of Adaptec RAID controllers, you'd probably be better off with 12 SATA drives hooked up to an Escalade or Areca card (or cards). Since you seem to want a lot of storage, a large array of SATA disks may be a better balance between performance and economy.

> The servers mostly have 12 drives in RAID 10. We are going to redo one machine to compare RAID 10 vs RAID 50. Mostly to see if the performance is close; the space gain may be useful.

See the remark about SATA drives above. With 12 750GB drives, you'd have 6*750GB of storage in RAID-10, or about 4.5 terabytes of redundant storage.

> The usage pattern is mostly large sets of transactions, i.e. bulk loads of millions of rows, queries involving tens of millions of rows.
> A few questions.
> Will it pay to go to a controller with higher memory for existing machines?

No matter how big the cache on your controller, it's likely NOT big enough to ever hope to just swallow the whole set at once. Bigger might be better for a lot of things, but for loading, a good controller is more important. An increase from 64MB to 256MB is not that big in comparison to how big your datasets are likely to be.

> The one machine I am about to redo has PCI which seems to somewhat limit our options.

You do know that you can plug a PCI-X card into a PCI slot, right? (See the second paragraph here: http://en.wikipedia.org/wiki/PCI-X#Technical_description) So you can get a nice card today and, if need be, a better server to toss it in tomorrow.
Where I work we have a nice big machine in production with a very nice PCI-X card (not sure which one, my cohort ordered it) and we wanted our in-house testing machine to have the same card, but that machine is much less powerful. The same card fit, so we can get some idea about I/O patterns on the test box before beating on production.

> A few questions.
> Will it pay to go to a controller with higher memory for existing machines?

Pay more attention to the performance metrics the card gets from people testing it here. Areca, Escalade / 3Ware, and LSI get good reviews, with LSI being solid but a little slower than the other two for most stuff.

> For large setups (i.e. 500GB+ per server) does it make sense to try to get a controller in a machine or do SANs have better throughput even if at a much higher cost?

SANs generally don't have much better performance, and cost MUCH more per meg stored. They do, however, have some nice management options. If a large number of disks in discrete machines presents a maintenance problem, the SAN might help, but given the higher cost it's often just cheaper to keep a box of disks handy and have a hardware person replace them.

> For future machines I plan to look into controllers with at least 512MB, which likely will be PCI-X/PCI-e; not seen anything with large caches for PCI.

See the remark about PCI-X / PCI above.

> Also the machines in question have SCSI drives, not SAS. I believe the most recent machine has SAS, but the others may be 15K RPM SCSI.
> Whether a SAN or just an external enclosure, are 12 disks enough to sustain 5K inserts/updates per second on rows in the 30 to 90 byte territory?

You'll only know by testing, and a better RAID controller can make a WORLD of difference here. Just make sure whatever controller you get has battery-backed cache, and preferably a fair bit of it. Some controllers can handle 1GB+ of memory.
> Will it pay to go to a controller with higher memory for existing machines? The one machine I am about to redo has PCI which seems to somewhat limit our options.

Urgh. You say that like you don't mind having PCI in a server whose job is to perform massive queries over large data sets.

Your 12 high-end, expensive SCSI drives will have a total bandwidth of... say 800 MB/s (on reads), perhaps more. PCI limits you to 133 MB/s (theoretical), actual speed being around 100-110 MB/s. Conclusion: 85% of the power of your expensive drives is wasted by hooking them up to the slow PCI bus! (And hence your money is wasted too.)

For instance, here I have a box with PCI, gigabit ethernet and a software RAID5; reading from the RAID5 goes at about 110 MB/s (actual disk bandwidth is closer to 250, but it's wasted); however, when using the gigabit ethernet to copy a large file over a LAN, disk and ethernet have to share the PCI bus, so throughput falls to 50 MB/s. Crummy, eh?

=> If you do big data imports over the network, you lose 50% speed again due to the bus sharing between the ethernet NIC and the disk controller. In fact, for bulk IO a box with 2 SATA drives would be just as fast as your monster RAID, lol. And for bulk imports from the network, a $500 box with a few SATA drives and gigabit ethernet, all via PCI Express (any recent Core 2 chipset), will be faster than your megabuck servers.

Let me repeat this: at the current state of SATA drives, just TWO of them are enough to saturate a PCI bus. I'm speaking of desktop SATA drives, not high-end SCSI! (Which is not necessarily faster for pure throughput anyway.) Adding more drives will help random reads/writes but do nothing for throughput, since the tiny PCI pipe is choking.

So, use PCIe, PCI-X, whatever, but get rid of the bottleneck. Your money is invested in disk drives... keep those, change your RAID controller (which sucks anyway), and change your motherboard...
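A rough sketch of the bus math in that argument; the 110 MB/s practical PCI figure comes from the message, while the per-drive throughput is an assumed value for illustration:

```python
PCI_MB_S = 110      # practical plain-PCI throughput (per the message)
DRIVES = 12
PER_DRIVE = 70      # assumed sequential MB/s for one 15K drive

array = DRIVES * PER_DRIVE            # ~840 MB/s the disks could deliver
wasted = 1 - PCI_MB_S / array         # fraction the bus throws away
print(f"array ~{array} MB/s, bus {PCI_MB_S} MB/s -> ~{wasted:.0%} wasted")
```

With any plausible per-drive figure the conclusion is the same: the bus, not the spindle count, is the ceiling.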
If you're limited by disk throughput (or disk <-> gigabit ethernet PCI bus contention), you'll get a huge boost by going PCIe or PCI-X. You might even need fewer servers.

> For future machines I plan to look into controllers with at least 512MB, which likely will be PCI-X/PCI-e; not seen anything with large caches for PCI.

That's because high performance != PCI.

> Whether a SAN or just an external enclosure, are 12 disks enough to sustain 5K inserts/updates per second on rows in the 30 to 90 byte territory? At 5K/second, inserting/updating 100 million records would take 5.5 hours. That is fairly reasonable if we can achieve it. Faster would be better, but it depends on what it would cost to achieve.

If you mean 5K transactions of the form begin / insert or update 1 row / commit, that's a lot, and you are going to need cache, a BBU, and 8.3 so fsync isn't a problem anymore. On your current setup with 15K drives, if you need 1 fsync per INSERT you won't do more than 250 per second, which is very limiting... PG 8.3's "one fsync per second instead of one at each commit" feature is a really cheap alternative to a BBU (not as good as a real BBU, but much better than nothing!).

If you mean doing large COPYs or inserting/updating lots of rows using one SQL statement, you are going to need disk bandwidth. For instance, your 100M x 90-byte rows + overhead come to about 11 GB. The amount of data to write is twice that because of the xlog, so 22 GB to write and 11 GB to read, 33 GB total. At your setup's rather low 110 MB/s throughput that would take about 5 minutes; with 800 MB/s of bandwidth it would take well under a minute (though I don't know if Postgres can process data this fast, although I'd say probably). Of course, if you have many indexes which need to be updated, this will add random IO and more WAL traffic to the mix. Checkpoints and bgwriter also need to be tuned so they don't kill your performance when writing lots of data.
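That bandwidth estimate as a sketch; the per-row overhead is an assumed value chosen to land near the ~11 GB figure in the message:

```python
rows = 100_000_000
BYTES_PER_ROW = 110     # ~90-byte row plus assumed tuple overhead

table_gb = rows * BYTES_PER_ROW / 1e9   # ~11 GB of table data to write
wal_gb = table_gb                       # writes roughly doubled by the xlog
read_gb = table_gb                      # source data read back in
total_gb = table_gb + wal_gb + read_gb  # ~33 GB through the bus

for bw in (110, 800):                   # MB/s: plain PCI vs. a fast array
    print(f"{bw} MB/s -> {total_gb * 1000 / bw / 60:.1f} minutes")
```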
For your next servers, as the others on the list will tell you, a good RAID card and lots of SATA drives is a good choice. SATA is cheap, so you can get more drives for the same price, which means more bandwidth:

http://tweakers.net/reviews/557/17/comparison-of-nine-serial-ata-raid-5-adapters-pagina-17.html

Of course, none of those uses PCI.

RAID5 is good for read speed and big sequential writes. So if the only thing you do is load up a multi-gigabyte dump and process it, it's good. If you do bulk UPDATEs (like updating all the rows in one of the partitions of your huge table), RAID5 is good too. However, RAID5 will choke and burn on small random writes, which will come from UPDATEing random rows in a large table, updating indexes, etc. Since you are apparently doing this, RAID5 is therefore NOT advised!

Also consider the usual advice, like CLUSTER, or, when you load a large amount of data into the database, COPY it to a temp table, then INSERT it into the main table with INSERT INTO table SELECT FROM temp_table ORDER BY (interesting_fields). If the "interesting_fields" are something like the date and you often select or update on a date range, for instance, you'll get more performance if all the rows from the same day are close together on disk.

Have you considered Bizgres?
PFC writes:

> You say that like you don't mind having PCI in a server whose job is to perform massive queries over large data sets.

I am in my 4th week at a new job, trying to figure out what I am working with. From what I see, I will likely get as much improvement from new hardware as from re-doing some of the database design. Can't get everything done at once, not to mention I have to redo one machine sooner rather than later, so I need to prioritize.

> In fact for bulk IO a box with 2 SATA drives would be just as fast as your monster RAID, lol.

I am working on setting up a standard test based on the type of operations that the company does. This will give me a better idea. In particular, I will work with the developers to make sure the queries I create for the benchmark are representative of the workload.

> Adding more drives will help random reads/writes but do nothing for throughput since the tiny PCI pipe is choking.

Understood, but right now I have to use the hardware they already have; just trying to make the most of it. I believe another server is due in some months, so then I can plan better. In your opinion, if we get a new machine with PCI-e, at how many spindles will the SCSI random-access superiority start to be less notable? Especially given the low number of connections we usually have running against these machines.

> If you mean doing large COPY or inserting/updating lots of rows using one SQL statement, you are going to need disk bandwidth.

We are using one single SQL statement.

> http://tweakers.net/reviews/557/17/comparison-of-nine-serial-ata-raid-5-adapters-pagina-17.html

I have heard great stories about Areca controllers. That is definitely one on my list to research and consider.

> However RAID5 will choke and burn on small random writes, which will come from UPDATing random rows in a large table, updating indexes, etc. Since you are doing this apparently, RAID5 is therefore NOT advised!
I thought I read a while back on this list that as the number of drives increased, RAID 5 was less bad. Say an external enclosure with 20+ drives.

> Have you considered Bizgres?

Yes, it's on my todo list to check further. I have also considered Greenplum's MPP DB offering that has clustering, but when I initially mentioned it there was some reluctance because of cost. I will also look into EnterpriseDB. Right now I am trying to learn usage patterns, which DBs need to be re-designed, and what hardware I have to work with. Not to mention learning what all these tables are. I also need to make time to research/get a good ER-diagram tool and document all these DBs. :(
On Tue, May 13, 2008 at 8:00 AM, Francisco Reyes <lists@stringsutils.com> wrote:
> PFC writes:
> [snip]
>> However RAID5 will choke and burn on small random writes, which will come from UPDATing random rows in a large table, updating indexes, etc. Since you are doing this apparently, RAID5 is therefore NOT advised!
>
> I thought I read a while back on this list that as the number of drives increased, RAID 5 was less bad. Say an external enclosure with 20+ drives.

Maybe, but I don't think very many people run that many drives in a RAID 5 configuration... too dangerous.
With 20 drives in a single volume, you need to be running RAID 10 or RAID 6. A 20-drive RAID 50 is pushing it as well... I'd at least want a hot spare.

merlin
On May 12, 2008, at 10:04 PM, Francisco Reyes wrote:

> Adaptec 2120 SCSI controller (64MB of cache).
> The servers mostly have 12 drives in RAID 10. We are going to redo one machine to compare RAID 10 vs RAID 50. Mostly to see if the performance is close; the space gain may be useful.

With only 64MB of cache, you will see degradation of performance. From my experience, the Adaptec controllers are not the best choice, but that's mostly FreeBSD experience. And if you don't have a BBU, you're not benefiting from the write-back cache at all, so it is kind of moot.

If you want to buy a couple of 2230SLP cards with 256MB of RAM, I have them for sale. They're identical to the 2130SLP but have two SCSI channels per card instead of one. They both have BBUs and are in working condition. I retired them in favor of an external RAID attached via Fibre Channel.
On May 12, 2008, at 11:24 PM, Francisco Reyes wrote:

> Any PCI controller you have had good experience with?
> How about any PCI-X/PCI-e controllers with which you have had good results?

The LSI controllers are top-notch, and always my first choice. They have PCI-X and PCI-e versions.
PFC wrote:

> PCI limits you to 133 MB/s (theoretical), actual speed being around 100-110 MB/s.

Many servers have more than one bus. You have to process that data too, so it's not going to be as much of a limit as you are suggesting. It may be possible to stream a compressed data file to the server and COPY in from that after decompression, which will free LAN bandwidth. Or even RPC blocks of compressed data and decompress and insert right there in the proc.

> On your current setup with 15K drives if you need 1 fsync per INSERT you won't do more than 250 per second, which is very limiting...

Well, that's 250 physical syncs. But if you have multiple insert streams (for group commit), or can batch the rows in each insert or COPY, it's not necessarily as much of a problem as you seem to be implying. Particularly if you are doing the holding-table trick.

James
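That batching point is easy to quantify (a sketch; the 250 fsyncs/s ceiling is the 15K-drive figure from PFC's message, and the batch sizes are illustrative):

```python
FSYNC_CEILING = 250   # commits/s when each commit waits on a 15K platter

# Effective row rate = fsync ceiling * rows committed per transaction,
# whether via group commit, multi-row INSERTs, or COPY into a holding table.
for batch in (1, 20, 100):
    print(f"{batch:3d} rows/commit -> {FSYNC_CEILING * batch:6d} rows/s")
```

Even a modest batch of 20 rows per commit already reaches the thread's 5K rows/s target without any write cache.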
>> You say that like you don't mind having PCI in a server whose job is to perform massive queries over large data sets.
>
> I am in my 4th week at a new job. Trying to figure out what I am working with.

LOOL, ok, hehe, not exactly the time to have a "let's change everything" fit ;)

> From what I see I will likely get as much improvement from new hardware as from re-doing some of the database design. Can't get everything done at once, not to mention I have to redo one machine sooner rather than later so I need to prioritize.
>
>> In fact for bulk IO a box with 2 SATA drives would be just as fast as your monster RAID, lol.
>
> I am working on setting up a standard test based on the type of operations that the company does. This will give me a better idea. In particular I will work with the developers to make sure the queries I create for the benchmark are representative of the workload.

Watching vmstat (or iostat) while running a very big seq scan query will give you information about the reading speed of your drives. Same for writes: during one of your big updates, watch vmstat, and you'll know if you are CPU bound or IO bound...

- one core at 100% -> CPU bound
- lots of free CPU but lots of iowait -> disk bound
- disk throughput decent (in your setup, 100 MB/s) -> PCI bus saturation
- disk throughput miserable (< 10 MB/s) -> random IO bound (either random reads or fsync() or random writes, depending on the case)

> In your opinion if we get a new machine with PCI-e, at how many spindles will the SCSI random-access superiority start to be less notable? Especially given the low number of connections we usually have running against these machines.

Sorting of random reads depends on multiple concurrent requests (which you don't have). Sorting of random writes does not depend on concurrent requests, so you'll benefit on your updates. About SCSI vs SATA vs number of spindles: can't answer this one.

> We are using one single SQL statement.
OK, so forget about fsync penalty, but do tune your checkpoints so they are not happening all the time... and bgwriter etc.
PFC schrieb:

> PCI limits you to 133 MB/s (theoretical), actual speed being around 100-110 MB/s.

"Current" PCI 2.1+ implementations have allowed 266MB/s (32-bit/66MHz) to 533MB/s (64-bit/66MHz) for the last 6-7 years or so, and PCI-X goes to 1066MB/s.

> For instance here I have a box with PCI, gigabit ethernet and a software RAID5; reading from the RAID5 goes at about 110 MB/s (actual disk bandwidth is closer to 250 but it's wasted); however when using the gigabit ethernet to copy a large file over a LAN, disk and ethernet have to share the PCI bus, so throughput falls to 50 MB/s. Crummy, eh?

Sounds like a slow gigabit ethernet NIC...

> Let me repeat this: at the current state of SATA drives, just TWO of them are enough to saturate a PCI bus. I'm speaking of desktop SATA drives, not high-end SCSI! (which is not necessarily faster for pure throughput anyway).
> Adding more drives will help random reads/writes but do nothing for throughput since the tiny PCI pipe is choking.

In my experience, SATA drives are very slow for typical database work (which is heavy on random writes). They often have very slow access times and bad or missing NCQ implementations (controllers / SANs as well), and while I am not very familiar with the protocol differences, they seem to add a hell of a lot more latency than even old U320 SCSI drives. Sequential transfer performance is a nice indicator but not very useful, since most serious RAID arrays will have bottlenecks other than the theoretical cumulative transfer rate of all the drives (from controller cache speed to SCSI bus to Fibre Channel). Thus, lower access times matter much more than raw sequential transfer rate.

> Whether a SAN or just an external enclosure, are 12 disks enough to sustain 5K inserts/updates per second on rows in the 30 to 90 byte territory? At 5K/second, inserting/updating 100 million records would take 5.5 hours. That is fairly reasonable if we can achieve it. Faster would be better, but it depends on what it would cost to achieve.
5K/s inserts (with no indexes) are easy with PostgreSQL and typical (current) hardware. We are copying about 175K rows/s with our current server (quad-core Xeon 2.93GHz, lots of RAM, meagre-performance SATA SAN with RAID-5 but 2GB of write-back cache). Rows are around 570 bytes each on average. Performance is CPU-bound with a typical number of indexes on the table and much lower than 175K/s though; for single-row updates we get about 9K/s per thread (~5MB/s), and that's 100% CPU-bound on the server. If we had to max this out, we'd thus use several clients in parallel and/or collect inserts in text files and make bulk updates using COPY. The slow SAN isn't a problem now. Our SATA SAN suffers greatly when reads are interspersed with writes; for that you want more spindles and faster disks.

To the OP I have one hearty recommendation: if you are using the RAID functionality of the 2120, get rid of it. If you can wipe the disks, try using Linux software RAID (yes, it's an admin's nightmare etc., but it should give much better performance, even though the 2120's plain SCSI won't be hot either) and then start tuning your PostgreSQL installation (there's much to gain here). Your setup otherwise looks decent for what you are trying to do (but you need a fast CPU), and your cheapest upgrade path would be a decent RAID controller, or at least a decent non-RAID SCSI controller for software RAID (at least 2 ports for 12 disks), although the plain PCI market is dead.

-mjy
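The COPY figure quoted above implies roughly 100 MB/s of raw row data, which is easy to verify (numbers taken from the message):

```python
rows_per_sec = 175_000   # bulk COPY rate reported
row_bytes = 570          # average row width reported

mb_s = rows_per_sec * row_bytes / 1e6
print(f"~{mb_s:.0f} MB/s of row data, before WAL and index traffic")
```

That is comfortably beyond what a plain PCI bus can move, which reinforces the thread's point that the original poster's bottleneck is the bus and controller, not PostgreSQL.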