Re: Raid 10 chunksize - Mailing list pgsql-performance

From Ron Mayer
Subject Re: Raid 10 chunksize
Date
Msg-id 49D553E2.8000405@cheapcomplexdevices.com
In response to Re: Raid 10 chunksize  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Raid 10 chunksize  (Hannes Dorbath <light@theendofthetunnel.de>)
List pgsql-performance
Greg Smith wrote:
> On Wed, 1 Apr 2009, Scott Carey wrote:
>
>> Write caching on SATA is totally fine.  There were some old ATA drives
>> that, when paired with some file systems or OSes, would not be safe.
>> There are some combinations that have unsafe write barriers.  But there
>> is a standard, well-supported ATA command to sync and only return after
>> the data is on disk.  If you are running an OS that is anything recent
>> at all, and any disks that are not really old, you're fine.
>
> While I would like to believe this, I don't trust any claims in this
> area that don't have matching tests that demonstrate things working as
> expected.  And I've never seen this work.
>
> My laptop has a 7200 RPM drive, which means that if fsync is being
> passed through to the disk correctly I can only fsync <120
> times/second.  Here's what I get when I run sysbench on it, starting
> with the default ext3 configuration:

I believe it's ext3 that's cheating in this scenario, not the drive.

Any chance you can test the program I posted here that
tweaks the inode before the fsync:
http://archives.postgresql.org//pgsql-general/2009-03/msg00703.php

On my system, with the fchmod calls in that program, I was getting one
fsync per disk revolution.  Without the fchmod calls, fsync() didn't
wait at all.
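
For anyone who wants to try the comparison without fetching the
program from the archive, here is a minimal sketch of the same idea in
Python.  It is not the program I posted; the function name
fsync_rate and its parameters are made up for illustration.  It
rewrites one small block, optionally changes the file mode with
fchmod so the inode is dirty, calls fsync, and reports calls per
second.

```python
import os
import time


def fsync_rate(path, iterations=200, touch_inode=False):
    """Return fsync() calls per second while rewriting one small block.

    Hypothetical helper for illustration only. With touch_inode=True the
    file mode is changed before every fsync so the inode is dirty too.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        block = b"x" * 512
        modes = (0o644, 0o664)
        start = time.perf_counter()
        for i in range(iterations):
            # Overwrite the same block at the start of the file.
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            if touch_inode:
                # Alternate the mode so the inode really changes each time.
                os.fchmod(fd, modes[i % 2])
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed if elapsed > 0 else float("inf")
```

Run it against a file on the filesystem under test (not a tmpfs),
once with touch_inode=False and once with touch_inode=True, and
compare the two rates.  If the behaviour I describe holds, the second
number should drop to roughly the drive's revolutions per second.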

This was the case on dozens of drives I tried, dating back to
old PATA drives from 2000.  Only drives from last century didn't
behave that way - but I can't accuse them of lying because
hdparm showed that they didn't claim to support FLUSH_CACHE.


I think this program shows that practically all hard drives are
physically capable of doing a proper fsync; but annoyingly
ext3 refuses to send the FLUSH_CACHE commands to the drive
unless the inode changed.
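
The ceiling Greg uses is simple arithmetic, sketched below under the
assumption that an honest fsync of the same block has to wait for one
full platter revolution.  The helper names are hypothetical; the two
observed rates are the sysbench figures quoted further down.

```python
def max_fsyncs_per_second(rpm):
    """Upper bound on honest fsyncs/sec if each one waits a full revolution."""
    return rpm / 60.0


def looks_cached(observed_per_second, rpm):
    """True when the observed rate exceeds the one-per-revolution ceiling."""
    return observed_per_second > max_fsyncs_per_second(rpm)


ceiling = max_fsyncs_per_second(7200)
print(ceiling)                       # 120.0
print(looks_cached(2507.29, 7200))   # True  (default ext3 mount)
print(looks_cached(2612.74, 7200))   # True  (mounted with barrier=1)
```

Both runs are about twenty times faster than the platter allows, so
the writes were being acknowledged before they reached the disk.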


> $ uname -a
> Linux gsmith-t500 2.6.28-11-generic #38-Ubuntu SMP Fri Mar 27 09:00:52
> UTC 2009 i686 GNU/Linux
>
> $ mount
> /dev/sda3 on / type ext3 (rw,relatime,errors=remount-ro)
>
> $ sudo hdparm -I /dev/sda | grep FLUSH
>        *    Mandatory FLUSH_CACHE
>        *    FLUSH_CACHE_EXT
>
> $ ~/sysbench-0.4.8/sysbench/sysbench --test=fileio --file-fsync-freq=1
> --file-num=1 --file-total-size=16384 --file-test-mode=rndwr run
> sysbench v0.4.8:  multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 16Kb each
> 16Kb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 1 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random write test
> Threads started!
> Done.
>
> Operations performed:  0 Read, 10000 Write, 10000 Other = 20000 Total
> Read 0b  Written 156.25Mb  Total transferred 156.25Mb  (39.176Mb/sec)
>  2507.29 Requests/sec executed
>
>
> OK, that's clearly cached writes where the drive is lying about fsync.
> The claim is that since my drive supports both the flush calls, I just
> need to turn on barrier support, right?
>
> [Edit /etc/fstab to remount with barriers]
>
> $ mount
> /dev/sda3 on / type ext3 (rw,relatime,errors=remount-ro,barrier=1)
>
> [sysbench again]
>
>  2612.74 Requests/sec executed
>
> -----
>
> This is basically how this always works for me:  somebody claims
> barriers and/or SATA disks work now, no really this time.  I test, they
> give answers that aren't possible if fsync were working properly, I
> conclude turning off the write cache is just as necessary as it always
> was.  If you can suggest something wrong with how I'm testing here, I'd
> love to hear about it.  I'd like to believe you but I can't seem to
> produce any evidence that supports your claims here.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>

