Re: PostgreSQL block size for SSD RAID setup? - Mailing list pgsql-performance

From PFC
Subject Re: PostgreSQL block size for SSD RAID setup?
Msg-id op.upw6xtwdcigqcu@soyouz
In response to PostgreSQL block size for SSD RAID setup?  (henk de wit <henk53602@hotmail.com>)
Responses Re: PostgreSQL block size for SSD RAID setup?  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
> Hi,
> I was reading a benchmark that sets out block sizes against raw IO
> performance for a number of different RAID configurations involving high
> end SSDs (the Mtron 7535) on a powerful RAID controller (the Areca
> 1680IX with 4GB RAM). See
> http://jdevelopment.nl/hardware/one-dvd-per-second/

    Lucky guys ;)

    Something that bothers me about SSDs is the interface... The latest flash
chips from Micron (32 Gbit = 4 GB per chip) have something like 25 us "access
time" (lol) and push data at 166 MB/s (yes, megabytes per second) per chip.
So two of these chips are enough to saturate a SATA 3 Gbps link (about
300 MB/s usable)... and there would be 8 of those chips in a 32 GB SSD.
Running the chips in parallel means striping each transfer across them,
which increases the effective block size, so in practice I don't know how
it's implemented; it probably depends on the make and model of SSD.
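
    As a rough sanity check of the arithmetic above, here is a small Python
sketch. The per-chip figures are the ones quoted in this message; the 8b/10b
encoding overhead for SATA and the 4 KB flash page size are assumptions added
for illustration (the page size in particular is hypothetical, not taken from
any datasheet).

```python
import math

# Figures quoted in the message.
CHIP_MB_PER_S = 166   # per-chip transfer rate
CHIP_GB = 4           # 32 Gbit per chip
SSD_GB = 32

# Assumption: SATA 3 Gbps uses 8b/10b encoding, so 10 line bits carry one
# payload byte, which leaves about 300 MB/s of usable bandwidth.
SATA_LINE_GBPS = 3.0
sata_mb_per_s = SATA_LINE_GBPS * 1000 / 10

chips_in_ssd = SSD_GB // CHIP_GB
chips_to_saturate = math.ceil(sata_mb_per_s / CHIP_MB_PER_S)
aggregate_mb_per_s = chips_in_ssd * CHIP_MB_PER_S

# Hypothetical flash page size, used only to show how striping one transfer
# over every chip multiplies the effective block size.
PAGE_KB = 4
effective_block_kb = chips_in_ssd * PAGE_KB

print(f"usable SATA bandwidth     : {sata_mb_per_s:.0f} MB/s")
print(f"chips in a {SSD_GB} GB SSD      : {chips_in_ssd}")
print(f"chips to saturate the link: {chips_to_saturate}")
print(f"raw flash bandwidth       : {aggregate_mb_per_s} MB/s")
print(f"effective block if striped: {effective_block_kb} KB")
```

    Under these assumptions the flash could deliver roughly four times what
the link can carry, which is the bottleneck being complained about.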

    And then RAIDing those (to get back the throughput lost by using SATA)
will again increase the block size, which is bad for random writes. So it's
a bit of a chicken-and-egg problem. Also, since hard disks have high
throughput but slow seeks, all the OSes, RAID cards, drivers, etc. are
probably optimized for throughput, not IOPS. You need a very different
strategy for 100K 8 KB IOs per second versus 1K 1 MB IOs per second: huge
queues, smarter hardware, etc.
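
    To put numbers on the two workloads just mentioned, here is a minimal
Python sketch. The throughput part is plain arithmetic on the figures above.
The queue-depth part applies Little's law (requests in flight = request rate
x time per request) with a hypothetical 1 ms service time, chosen only to
show how the required queue depth scales; it is not a measured latency.

```python
def throughput_mb_per_s(iops, io_kb):
    """Data rate for a given request rate and request size (1 MB = 1024 KB)."""
    return iops * io_kb / 1024


def outstanding_ios(iops, latency_us):
    """Little's law: average number of requests in flight."""
    return iops * latency_us / 1_000_000


# Hypothetical per-request service time, identical for both workloads.
LATENCY_US = 1000

workloads = {
    "100K/s x 8 KB": (100_000, 8),
    "1K/s x 1 MB": (1_000, 1024),
}

for name, (iops, io_kb) in workloads.items():
    mb_s = throughput_mb_per_s(iops, io_kb)
    depth = outstanding_ios(iops, LATENCY_US)
    print(f"{name:14s} {mb_s:8.2f} MB/s, about {depth:.0f} requests in flight")
```

    The two workloads move a similar amount of data per second, but at equal
latency the small-IO case needs a hundred times more requests in flight,
which is why the queues and the hardware have to be designed differently.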

    FusionIO has an interesting product that uses the PCI-e interface, which
brings lots of benefits like much higher throughput and the possibility of
using custom drivers optimized for handling many more IO requests per
second than the OS, RAID cards, and even the SATA protocol were designed
for.

    Intrigued by this, I looked at the FusionIO benchmarks: more than 100,000
IOPS, really mind-boggling, but in random access over a 10 MB file. A bit
of Google image searching reveals that the board contains a lot of flash
chips (expected), a fat FPGA (expected), probably a high-end chip from X or
A, and two DDR RAM chips from Samsung, probably acting as cache. So I wonder
whether the 10 MB file used to reach those humongous IOPS was actually in
the flash... or did they actually benchmark the device's onboard cache?

    It probably has a write-back cache, so on a random-write benchmark this
is an interesting question. A good RAID card with BBU cache would have the
same benchmarking gotcha (i.e. if you go crazy on random writes over a 10 MB
file, which is very small, and the device is smart, possibly nothing at all
has been written to the disks by the end of the benchmark!)

    Anyway, in a database use case, if random writes are going to be a pain,
they are probably not going to be confined to a tiny 10 MB zone that the
controller cache could absorb...
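
    The benchmarking gotcha can be illustrated with a toy model in Python.
Everything here is an assumption made for the sketch: 8 KB blocks, a
hypothetical 64 MB write-back cache, FIFO eviction, and no background
flushing. It is not a description of how the FusionIO board or any RAID
controller actually behaves.

```python
import random


def flushed_writes(file_blocks, cache_blocks, n_writes, seed=0):
    """Count blocks a toy write-back cache is forced to push to the medium.

    Random single-block writes land in the cache. Rewriting a block that is
    already dirty costs nothing; when the cache is full, the oldest dirty
    block is flushed to make room.
    """
    rng = random.Random(seed)
    dirty = {}  # dicts keep insertion order, so this doubles as a FIFO
    flushed = 0
    for _ in range(n_writes):
        block = rng.randrange(file_blocks)
        if block in dirty:
            continue  # overwrite is absorbed by the cache
        if len(dirty) >= cache_blocks:
            oldest = next(iter(dirty))
            del dirty[oldest]
            flushed += 1
        dirty[block] = True
    return flushed


BLOCK_KB = 8
cache_blocks = 64 * 1024 // BLOCK_KB      # hypothetical 64 MB cache
small_file = 10 * 1024 // BLOCK_KB        # the 10 MB benchmark file
large_file = 1024 * 1024 // BLOCK_KB      # a 1 GB working set

print("10 MB file:", flushed_writes(small_file, cache_blocks, 100_000))
print("1 GB file :", flushed_writes(large_file, cache_blocks, 100_000))
```

    In this model the 10 MB file never forces a single flush, because the
whole file fits in the cache, while the 1 GB working set spills to the
medium almost immediately. A benchmark on the small file would therefore
measure the cache, not the flash or the disks.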

    (just rambling XDD)
