On 1/11/12 4:33 AM, Florian Weimer wrote:
> Isn't this pretty much like tuning vm.dirty_bytes? We generally set it
> to pretty low values, and seems to help to smoothen the checkpoints.
When I experimented with dropping the effective size of the write cache,
checkpoint spikes improved, but things like VACUUM ran terribly slowly.
On a typical medium to large server nowadays (let's say 16GB+),
PostgreSQL needs to have gigabytes of write cache for good performance.
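For reference, the tuning Florian is describing uses the vm.dirty_*
sysctls; the values below are only illustrative, not recommendations:

    # /etc/sysctl.conf
    vm.dirty_background_bytes = 268435456  # background writeback starts at 256MB dirty
    vm.dirty_bytes = 1073741824            # writers block once 1GB is dirty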
What we're aiming to do here is to keep the benefits of having that much
write cache, while allowing checkpoint-related work to send increasingly
strong suggestions about ordering what it needs written soon. There are
three primary states on Linux to be concerned about here:
Dirty: in the cache via standard write
  |
  v  pdflush does writeback at 5 or 10% dirty || sync_file_range push
  |
Writeback
  |
  v  write happens in the background || fsync call
  |
Stored on disk
The systems with bad checkpoint problems will typically have gigabytes of
data sitting in "Dirty", which is necessary for good performance. The
kernel is very lazy about pushing those pages toward "Writeback", though.
Getting the oldest portions of the outstanding writes into the Writeback
queue more aggressively should make the eventual fsync less likely to block.
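To make the two transitions concrete, here's a minimal standalone sketch.
It is not the actual checkpoint code, and pushing the whole file as a
single range is a simplification:

    /*
     * Sketch: after writing a file's dirty blocks, ask the kernel to
     * start writeback on them now, so the fsync() at the end finds
     * less work still queued up.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int fd;

        if (argc < 2)
        {
            fprintf(stderr, "usage: %s filename\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_WRONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* ... write() some data here, leaving it Dirty in the cache ... */

        /* Nudge Dirty -> Writeback for the whole file, without blocking */
        if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range");

        /* Later, the fsync() that forces Writeback -> Stored on disk */
        if (fsync(fd) != 0)
            perror("fsync");

        close(fd);
        return 0;
    }

The SYNC_FILE_RANGE_WRITE call just initiates I/O and returns, which is
what makes it a "suggestion"; the fsync is still what guarantees the data
is stored on disk.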
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com