Re: Raid 10 chunksize - Mailing list pgsql-performance

From david@lang.hm
Subject Re: Raid 10 chunksize
Msg-id alpine.DEB.1.10.0904031800220.28893@asgard.lang.hm
In response to Re: Raid 10 chunksize  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Raid 10 chunksize  (Greg Smith <gsmith@gregsmith.com>)
Re: Raid 10 chunksize  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
On Fri, 3 Apr 2009, Greg Smith wrote:

> Hannes sent this off-list, presumably via newsgroup, and it's certainly worth
> sharing.  I've always been scared off of using XFS because of the problems
> outlined at http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc , with more
> testing showing similar issues at http://pages.cs.wisc.edu/~vshree/xfs.pdf
> too
>
> (I'm finding that old message with Ted saying "Making sure you don't lose
> data is Job #1" hilarious right now, considering the recent ext4 data loss
> debacle)

also note that the message from Ted was back in 2004; there has been a
_lot_ of work done on XFS in the last 4 years.

as for the second link, that focuses on what happens to the filesystem if
the disk under it starts returning errors or garbage. with the _possible_
exception of ZFS, every filesystem around will do strange things under
those conditions. and in my opinion, the way to deal with this sort of
thing isn't to move to ZFS to detect the problem, it's to set up redundancy
in your storage so that you can not only detect the problem, but correct
it as well (it's a good thing to know that your database file is corrupt,
but that's not nearly as useful as having some way to recover the data
that was there)

David Lang
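
The difference between detecting corruption and being able to correct it can be illustrated with a small sketch. This is not David's setup or how any particular filesystem or RAID layer works; `checksum` and `read_with_repair` are hypothetical helpers, and the "copies" are in-memory stand-ins for mirrored blocks. The assumption is that a trusted checksum of the block is stored somewhere separate from the data.

```python
import hashlib
from typing import List


def checksum(data: bytes) -> str:
    """Hypothetical block checksum (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies: List[bytearray], expected: str) -> bytes:
    """Return a copy that matches the expected checksum and repair the rest.

    With a single copy, a mismatch can only be detected. With a second,
    intact copy, the bad one can be overwritten from the good one.
    """
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        # Detection without redundancy: we know the data is bad, nothing more.
        raise IOError("all copies fail the checksum; data cannot be recovered")
    for copy in copies:
        if checksum(bytes(copy)) != expected:
            copy[:] = good  # rewrite the damaged copy from the intact one
    return good
```

With one damaged copy out of two, the read succeeds and the damaged copy is rewritten; with only a damaged copy, the best the code can do is report the problem.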

> ---------- Forwarded message ----------
> Date: Fri, 3 Apr 2009 10:19:38 +0200
> From: Hannes Dorbath <light@theendofthetunnel.de>
> Newsgroups: pgsql.performance
> Subject: Re: [PERFORM] Raid 10 chunksize
>
> Ron Mayer wrote:
>> Greg Smith wrote:
>>> On Wed, 1 Apr 2009, Scott Carey wrote:
>>>
>>>> Write caching on SATA is totally fine.  There were some old ATA drives
>>>> that when paired with some file systems or OSes would not be safe.
>>>> There are some combinations that have unsafe write barriers.  But there
>>>> is a standard well supported ATA command to sync and only return after
>>>> the data is on disk.  If you are running an OS that is anything recent
>>>> at all, and any disks that are not really old, you're fine.
>>> While I would like to believe this, I don't trust any claims in this
>>> area that don't have matching tests that demonstrate things working as
>>> expected.  And I've never seen this work.
>>>
>>> My laptop has a 7200 RPM drive, which means that if fsync is being
>>> passed through to the disk correctly I can only fsync <120
>>> times/second.  Here's what I get when I run sysbench on it, starting
>>> with the default ext3 configuration:
>>
>> I believe it's ext3 who's cheating in this scenario.
>
> I assume so too. Here the same test using XFS, first with barriers (XFS
> default) and then without:
>
> Linux 2.6.28-gentoo-r2 #1 SMP Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
> GenuineIntel GNU/Linux
>
> /dev/sdb /data2 xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
>
> # sysbench --test=fileio --file-fsync-freq=1 --file-num=1
> --file-total-size=16384 --file-test-mode=rndwr run
> sysbench 0.4.10:  multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 16Kb each
> 16Kb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 1 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random write test
> Threads started!
> Done.
>
> Operations performed:  0 Read, 10000 Write, 10000 Other = 20000 Total
> Read 0b  Written 156.25Mb  Total transferred 156.25Mb  (463.9Kb/sec)
>   28.99 Requests/sec executed
>
> Test execution summary:
>    total time:                          344.9013s
>    total number of events:              10000
>    total time taken by event execution: 0.1453
>    per-request statistics:
>         min:                                  0.01ms
>         avg:                                  0.01ms
>         max:                                  0.07ms
>         approx.  95 percentile:               0.01ms
>
> Threads fairness:
>    events (avg/stddev):           10000.0000/0.00
>    execution time (avg/stddev):   0.1453/0.00
>
>
> And now without barriers:
>
> /dev/sdb /data2 xfs
> rw,noatime,attr2,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
>
> # sysbench --test=fileio --file-fsync-freq=1 --file-num=1
> --file-total-size=16384 --file-test-mode=rndwr run
> sysbench 0.4.10:  multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 16Kb each
> 16Kb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 1 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random write test
> Threads started!
> Done.
>
> Operations performed:  0 Read, 10000 Write, 10000 Other = 20000 Total
> Read 0b  Written 156.25Mb  Total transferred 156.25Mb  (62.872Mb/sec)
> 4023.81 Requests/sec executed
>
> Test execution summary:
>    total time:                          2.4852s
>    total number of events:              10000
>    total time taken by event execution: 0.1325
>    per-request statistics:
>         min:                                  0.01ms
>         avg:                                  0.01ms
>         max:                                  0.06ms
>         approx.  95 percentile:               0.01ms
>
> Threads fairness:
>    events (avg/stddev):           10000.0000/0.00
>    execution time (avg/stddev):   0.1325/0.00
>
>
>
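
The sanity check Greg describes above (a 7200 RPM drive can complete fewer than 120 honest fsyncs per second, since 7200 / 60 = 120 rotations per second) can be sketched as follows. This is a rough illustration, not a replacement for sysbench: the function names are made up for this example, the 7200 RPM figure is taken from Greg's laptop and is only an assumption for Hannes's /dev/sdb (its RPM is not stated), and `measure_fsync_rate` rewrites one 16 KB block in place, which is only loosely comparable to the runs quoted above.

```python
import os
import time


def rotational_fsync_ceiling(rpm: float) -> float:
    """Upper bound on fsyncs/second to one spot: one per platter rotation."""
    return rpm / 60.0


def looks_cached(observed_per_sec: float, rpm: float) -> bool:
    """True if the observed rate is faster than the platter could allow."""
    return observed_per_sec > rotational_fsync_ceiling(rpm)


def measure_fsync_rate(path: str, writes: int = 200, block_size: int = 16384) -> float:
    """Rewrite one block in place, calling fsync after every write."""
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Figures from the two XFS runs quoted above; 7200 RPM is an assumption.
    for label, rate in (("barriers", 28.99), ("nobarrier", 4023.81)):
        verdict = "write cache suspected" if looks_cached(rate, 7200) else "plausible"
        print(label, rate, verdict)
```

Under that assumption, the run with barriers falls below the rotational ceiling, while the nobarrier run is far above it, which suggests the writes were acknowledged from the drive's cache rather than from the platter.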
