Re: RAID controllers for Postgresql on large setups - Mailing list pgsql-performance

From PFC
Subject Re: RAID controllers for Postgresql on large setups
Date
Msg-id op.ua26u3iacigqcu@apollo13.peufeu.com
Whole thread Raw
In response to RAID controllers for Postgresql on large setups  (Francisco Reyes <lists@stringsutils.com>)
Responses Re: RAID controllers for Postgresql on large setups
Re: RAID controllers for Postgresql on large setups
List pgsql-performance
> Will it pay to go to a controller with higher memory for existing
> machines? The one machine I am about to redo has PCI which seems to
> somewhat limit our options.

    Urgh.

    You say that like you don't mind having PCI in a server whose job is to
perform massive query over large data sets.

    Your 12 high-end expensive SCSI drives will have a bandwidth of ... say
800 MB/s total (on reads), perhaps more.
    PCI limits you to 133 MB/s (theoretical), actual speed being around
100-110 MB/s.

    Conclusion : 85% of the power of your expensive drives is wasted by
hooking them up to the slow PCI bus ! (and hence your money is wasted too)

    For instance here I have a box with PCI, Giga Ethernet and a software
RAID5 ; reading from the RAID5 goes to about 110 MB/s (actual disk
bandwidth is closer to 250 but it's wasted) ; however when using the giga
ethernet to copy a large file over a LAN, disk and ethernet have to share
the PCI bus, so throughput falls to 50 MB/s.  Crummy, eh ?

    => If you do big data imports over the network, you lose 50% speed again
due to the bus sharing between ethernet nic and disk controller.

    In fact for bulk IO a box with 2 SATA drives would be just as fast as
your monster RAID, lol.

    And for bulk imports from network a $500 box with a few SATA drives and a
giga-ethernet, all via PCIexpress (any recent Core2 chipset) will be
faster than your megabuck servers.

    Let me repeat this : at the current state of SATA drives, just TWO of
them is enough to saturate a PCI bus. I'm speaking desktop SATA drives,
not high-end SCSI ! (which is not necessarily faster for pure throughput
anyway).
    Adding more drives will help random reads/writes but do nothing for
throughput since the tiny PCI pipe is choking.

    So, use PCIe, PCIx, whatever, but get rid of the bottleneck.
    Your money is invested in disk drives... keep those, change your RAID
controller which sucks anyway, and change your motherboard ...

    If you're limited by disk throughput (or disk <-> giga ethernet PCI bus
contention), you'll get a huge boost by going PCIe or PCIx. You might even
need less servers.

> For future machines I plan to look into controllers with at least 512MB,
> which likely will be PCI-X/PCI-e..

> not seen anything with large caches for PCI.

    That's because high performance != PCI

> Whether a SAN or just an external enclosure is 12disk enough to substain
> 5K inserts/updates per second on rows in the 30 to 90bytes territory? At
> 5K/second inserting/updating 100 Million records would take 5.5 hours.
> That is fairly reasonable if we can achieve. Faster would be better, but
> it depends on what it would cost to achieve.

    If you mean 5K transactions with begin / insert or update 1 row / commit,
that's a lot, and you are going to need cache, BBU, and 8.3 so fsync isn't
a problem anymore.
    On your current setup with 15K drives if you need 1 fsync per INSERT you
won't do more than 250 per second, which is very limiting... PG 8.3's "one
fsync per second instead of one at each commit" feature is a really cheap
alternative to a BBU (not as good as a real BBU, but much better than
nothing !)

    If you mean doing large COPY or inserting/updating lots of rows using one
SQL statement, you are going to need disk bandwidth.

    For instance if you have your 100M x 90 byte rows + overhead, that's
about 11 GB
    The amount of data to write is twice that because of the xlog, so 22 GB
to write, and 11 GB to read, total 33 GB.

    On your setup you have a rather low 110 MB/s throughput it would take a
bit more than 3 min 20 s. With 800 MB/s bandwidth it would take 45
seconds. (but I don't know if Postgres can process data this fast,
although I'd say probably).
    Of course if you have many indexes which need to be updated this will add
random IO and more WAL traffic to the mix.
    Checkpoints andbgwriter also need to be tuned so they don't kill your
performance when writing lots of data.

    For your next servers as the other on the list will tell you, a good RAID
card, and lots of SATA drives is a good choice. SATA is cheap, so you can
get more drives for the same price, which means more bandwidth :

http://tweakers.net/reviews/557/17/comparison-of-nine-serial-ata-raid-5-adapters-pagina-17.html

    Of course none of those uses PCI.
    RAID5 is good for read speed, and big sequential writes. So if the only
thing that you do is load up a multi-gigabyte dump and process it, it's
good.
    Now if you do bulk UPDATEs (like updating all the rows in one of the
partitions of your huge table) RAID5 is good too.
    However RAID5 will choke and burn on small random writes, which will come
 from UPDATing random rows in a large table, updating indexes, etc. Since
you are doing this apparently, RAID5 is therefore NOT advised !

    Also consider the usual advice, like CLUSTER, or when you load a large
amount of data in the database, COPY it to a temp table, then INSERT it in
the main table with INSERT INTO table SELECT FROM temp_table ORDER BY
(interesting_fields). If the "interesting_fields" are something like the
date and you often select or update on a date range, for instance, you'll
get more performance if all the rows from the same day are close on disk.

    Have you considered Bizgres ?






pgsql-performance by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: Installation Steps to migrate to Postgres 8.3.1
Next
From: Francisco Reyes
Date:
Subject: Re: RAID controllers for Postgresql on large setups