Re: Huge iowait during checkpoint finish - Mailing list pgsql-general

From Greg Smith
Subject Re: Huge iowait during checkpoint finish
Date
Msg-id 4B4B9F2F.4030504@2ndquadrant.com
In response to Re: Huge iowait during checkpoint finish  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: Huge iowait during checkpoint finish  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: Huge iowait during checkpoint finish  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-general
Scott Marlowe wrote:
> On Mon, Jan 11, 2010 at 3:53 AM, Anton Belyaev <anton.belyaev@gmail.com> wrote:
>> Old RAID-1 has "hardware" LSI controller.
>> I still have access to old server.
> The old RAID card likely had a battery backed cache, which would make
> the fsyncs much faster, as long as you hadn't run out of cache.

To be super clear here:  it's possible to see a 100:1 performance drop going from a system with a battery-backed write cache to one that doesn't have one.  This is one of the three main weak spots of software RAID that still keep hardware RAID vendors in business:  it can't do anything to speed up the type of writes done during transaction commit and at checkpoint time.  (The others are that it's hard to set up transparent failover after failure in software RAID so that it always works at boot time, and that motherboard chipsets can easily lose their minds and take down the whole system when one drive goes bad.)

> If you can shoehorn one more drive, you could run RAID-10 and get much
> better performance.
And throwing drives at the problem may not help.  I've seen a system with a 48 disk software RAID-10 that only got 100 TPS when running a commit-heavy test, because it didn't have any way to cache writes usefully for that purpose.

If the old system had a write caching card, and the new one doesn't, that's certainly your most likely suspect for the source of the slowdown.  As for testing that specifically, if you have the old system too you can look at the slides I've got for "Database Hardware Benchmarking" at http://www.westnet.com/~gsmith/content/postgresql/index.htm and use the sysbench example I show on P26 to measure commit fsync rate.  There's a video of that presentation where I explain a lot of the background in this area too.
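
If sysbench isn't handy, a rough stand-in for the same measurement can be sketched in Python.  This is not the sysbench example from the slides, just an illustration of the idea:  rewrite one small block, fsync it, and count how many of those complete per second.  The function name, the 5 second duration, and the 8192 byte block (chosen to match PostgreSQL's default page size) are all my own assumptions here.

```python
import os
import sys
import tempfile
import time


def measure_fsync_rate(directory, duration_s=5.0, block_size=8192):
    """Return the number of write+fsync cycles completed per second.

    Hypothetical helper for illustration; the scratch file is created in
    `directory`, so point it at the filesystem holding the database.
    """
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.perf_counter()
        deadline = start + duration_s
        while time.perf_counter() < deadline:
            # Rewrite the same block each time, roughly like a small commit record.
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the write to stable storage
            count += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{measure_fsync_rate(target):.0f} fsyncs/sec in {target}")
```

Run it against a directory on the old server's array and again on the new one.  If the old card really does have a battery-backed cache, the two numbers should differ by a large factor; if they come out about the same, the write cache probably isn't the explanation.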

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
