On 02/19/2013 09:51 AM, Josh Berkus wrote:
> On 02/18/2013 08:28 PM, Mark Kirkwood wrote:
>> Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio
>> and friends settings. We managed to choke up a system with 16x SSD by
>> leaving them at their defaults...
>
> Yeah? Any settings you'd recommend specifically? What did you use on
> the SSD system?
>
Never mind: I tested lowering dirty_background_ratio, and it didn't
help, because checkpoints are kicking in before pdflush ever gets there.
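
For a sense of scale, here is a rough sketch of how much dirty data
those ratios allow before writeback starts. It assumes the threshold is
simply a percentage of a memory figure you pass in; the kernel applies
the ratio to available (free plus reclaimable) memory rather than total
RAM, so feeding it total RAM gives an upper-bound estimate. The function
name and the 64 GiB / 10% figures are hypothetical, not measurements
from the system discussed here.

```python
def dirty_threshold_bytes(mem_bytes: int, ratio_percent: int) -> int:
    """Approximate bytes of dirty page cache allowed at a vm.dirty_*_ratio."""
    if not 0 <= ratio_percent <= 100:
        raise ValueError("ratio_percent must be between 0 and 100")
    # Integer math: the ratio is a whole-number percentage of memory.
    return mem_bytes * ratio_percent // 100


GIB = 2**30
# Hypothetical box: 64 GiB of RAM with dirty_background_ratio = 10.
threshold = dirty_threshold_bytes(64 * GIB, 10)
print(f"background writeback starts near {threshold / GIB:.1f} GiB dirty")
```

On a large-RAM machine even a modest ratio works out to several GiB,
which is why these settings are worth checking, even though lowering
them did not help in this case.
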
So the issue seems to be that if you have this combination of factors:
1. large RAM
2. many/fast CPUs
3. a database which fits in RAM but is larger than the RAID controller's
   WB (write-back) cache
4. pg_xlog on the same volume as pgdata
... then you'll see checkpoint "stalls", and spread checkpoints will
actually make them worse by making the stalls longer.
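
The checkable part of that list (factors 3 and 4) can be written down
as a simple predicate. This is only a sketch: the function name and all
sizes are made up for illustration, and "large RAM" and "many/fast CPUs"
are left out because no thresholds are given for them here.

```python
def checkpoint_stall_risk(db_bytes: int, ram_bytes: int,
                          wb_cache_bytes: int,
                          xlog_on_data_volume: bool) -> bool:
    """True when factors 3 and 4 from the list above both hold."""
    fits_in_ram = db_bytes <= ram_bytes           # factor 3, first half
    exceeds_wb_cache = db_bytes > wb_cache_bytes  # factor 3, second half
    return fits_in_ram and exceeds_wb_cache and xlog_on_data_volume


GIB = 2**30
# Hypothetical figures: 20 GiB database, 64 GiB RAM, 1 GiB controller cache.
print(checkpoint_stall_risk(20 * GIB, 64 * GIB, 1 * GIB, True))
```
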
Moving pg_xlog to a separate partition makes this better. Making
bgwriter more aggressive helps a bit more on top of that.
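
To check whether pg_xlog currently shares a filesystem with pgdata, one
option is to compare device IDs. A minimal sketch, assuming a
hypothetical data directory path (adjust it to your installation); the
helper name is mine. Note that this only compares filesystems: two
partitions carved from the same RAID array will show up as different
devices even though they share spindles.

```python
import os


def on_same_volume(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same device (os.stat follows symlinks)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Hypothetical data directory; adjust to your installation.
PGDATA = "/var/lib/postgresql/data"
xlog = os.path.join(PGDATA, "pg_xlog")
if os.path.isdir(xlog):
    print("pg_xlog shares a volume with pgdata:", on_same_volume(PGDATA, xlog))
```

Because os.stat follows symlinks, this also gives the right answer when
pg_xlog has been relocated and symlinked back into the data directory.
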
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com