Re: Weird XFS WAL problem - Mailing list pgsql-performance

From Craig James
Subject Re: Weird XFS WAL problem
Date
Msg-id 4C07E103.9020909@emolecules.com
In response to Re: Weird XFS WAL problem  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: Weird XFS WAL problem
Re: Weird XFS WAL problem
List pgsql-performance
On 6/2/10 4:40 PM, Mark Kirkwood wrote:
> On 03/06/10 11:30, Craig James wrote:
>> I'm testing/tuning a new midsize server and ran into an inexplicable
>> problem. With a RAID10 drive, when I move the WAL to a separate RAID1
>> drive, TPS drops from over 1200 to less than 90! I've checked
>> everything and can't find a reason.
>
> Are the 2 new RAID1 disks the same make and model as the 12 RAID10 ones?

Yes.

> Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?

It was the barriers.  "barrier=1" isn't just a bad idea on ext4, it's a disaster.

pgbench -i -s 100 -U test
pgbench -c 10 -t 10000 -U test

Change WAL to barrier=0

     tps = 1463.264981 (including connections establishing)
     tps = 1463.725687 (excluding connections establishing)

Change WAL to noatime, nodiratime, barrier=0

     tps = 1479.331476 (including connections establishing)
     tps = 1479.810545 (excluding connections establishing)

Change WAL to barrier=1

     tps = 82.325446 (including connections establishing)
     tps = 82.326874 (excluding connections establishing)
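
To put the three runs in proportion, here is a small sketch that divides the barrier=0 figure by the barrier=1 figure. The two numbers are copied from the pgbench output above; the variable names and the slowdown helper are made up for this illustration.

```shell
# TPS figures copied from the pgbench runs above (including connection setup).
tps_barrier_off=1463.264981
tps_barrier_on=82.325446

# slowdown FAST SLOW: prints how many times faster FAST is than SLOW,
# rounded to one decimal place. (Hypothetical helper, not part of pgbench.)
slowdown() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

slowdown "$tps_barrier_off" "$tps_barrier_on"
```

With these inputs it prints 17.8, so turning barriers on for the WAL volume cost roughly a factor of 18 in this test.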

This is really hard to believe, because the bonnie++ numbers and dd(1) numbers look good (see my original post).  But
it's totally repeatable.  It must be some really unfortunate "just missed the next sector going by the write head"
problem.

So with ext4, bonnie++ and dd aren't the whole story.

BTW, I also learned that if you edit /etc/fstab and use "mount -oremount" it WON'T change "barrier=0/1" unless it is
explicit in the fstab file.  That is, if you put "barrier=0" into /etc/fstab and use the remount, it will change it to
nobarriers.  But if you then remove it from /etc/fstab, it won't change it back to the default.  You have to actually
put "barrier=1" if you want to get it back to the default.  This seems like a bug to me, and it made it really hard to
track this down. "mount -oremount" is not the same as umount/mount!
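
Since a remount can leave the live options out of step with /etc/fstab, one way to avoid guessing is to read what the kernel itself reports in /proc/mounts. The following is only a sketch: mount_opts and barrier_setting are hypothetical helper names, and how the barrier option is spelled (or whether it is listed at all when left at its default) depends on the filesystem and kernel version.

```shell
# mount_opts MOUNTPOINT [MOUNTS_FILE]
# Prints the option list reported for MOUNTPOINT. MOUNTS_FILE defaults to
# /proc/mounts; the second argument only exists so a saved copy can be
# inspected. If the mount point appears more than once, the last entry wins.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# barrier_setting MOUNTPOINT [MOUNTS_FILE]
# Prints the barrier-related option if one is listed, otherwise "not listed".
barrier_setting() {
    mount_opts "$@" | tr ',' '\n' \
        | grep -E '^(barrier(=[01])?|nobarrier)$' || echo "not listed"
}
```

For example, "barrier_setting /wal" (where /wal stands in for wherever the WAL volume is mounted) after each remount shows whether the change actually took effect. "not listed" means only that no barrier option appears in the output, not that barriers are off.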

Craig
