
Re: Improvement of checkpoint IO scheduler for stable transaction responses - Mailing list pgsql-hackers

From	didier
Subject	Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date	July 22, 2013 01:26:34
Msg-id	CAJRYxu+XT8EA+OMGq1GUXztGZSpkcX1+ZYAXNaNPXv5YyUeKOg@mail.gmail.com
In response to	Re: Improvement of checkpoint IO scheduler for stable transaction responses (Greg Smith <greg@2ndQuadrant.com>)
List	pgsql-hackers


On Sat, Jul 20, 2013 at 6:28 PM, Greg Smith <greg@2ndquadrant.com> wrote:

On 7/20/13 4:48 AM, didier wrote:
With your tests, did you try to write the hot buffers first? I.e., buffers
with a high refcount, either by sorting them on refcount or at least
sweeping the buffer list in reverse?
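
For concreteness, the two orderings asked about here could be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: `DirtyBuffer`, its `usage_count` field (standing in for whatever "hotness" measure is meant by refcount), and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DirtyBuffer:
    buf_id: int       # position in the (hypothetical) buffer list
    usage_count: int  # assumed hotness measure; higher means hotter


def hottest_first(buffers):
    """Order dirty buffers so the hottest are written first.

    Ties are broken by buffer id so the order is deterministic.
    """
    return sorted(buffers, key=lambda b: (-b.usage_count, b.buf_id))


def reverse_sweep(buffers):
    """The cheaper alternative: walk the buffer list back to front."""
    return list(reversed(buffers))


if __name__ == "__main__":
    bufs = [DirtyBuffer(0, 1), DirtyBuffer(1, 5), DirtyBuffer(2, 0), DirtyBuffer(3, 5)]
    print([b.buf_id for b in hottest_first(bufs)])
    print([b.buf_id for b in reverse_sweep(bufs)])
```

Sorting costs an extra pass over the dirty set; the reverse sweep only changes the starting end and makes no use of hotness at all.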

I never tried that version. After a few rounds of seeing that all changes I tried were just rearranging the good and bad cases, I got pretty bored with trying new changes in that same style.

By writing the buffers least likely to be recycled to the OS first, the
checkpoint may have less work to do at fsync time: hopefully they have been
written out by the OS background task during the spread and have not been
re-dirtied by other backends.

That is the theory. In practice, write caches are so large now that there is almost no pressure forcing writes to happen until the fsync calls show up. It's easily possible to enter the checkpoint fsync phase only to discover there are 4GB of dirty writes ahead of you, ones that have nothing to do with the checkpoint's I/O.

Backends are constantly pounding the write cache with new writes in situations with checkpoint spikes. The writes and fsync calls made by the checkpoint process are only a fraction of the real I/O going on. The volume of data being squeezed out by each fsync call is based on total writes to that relation since the checkpoint. That's connected to the writes to that relation happening during the checkpoint, but the checkpoint writes can easily be the minority there.
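
One way to see how much dirty data is sitting in the OS write cache on Linux is to read the `Dirty` and `Writeback` lines of `/proc/meminfo`. The sketch below assumes a Linux system; the function names are made up for illustration. The figures are system-wide, so they cannot separate checkpoint writes from backend writes, which is the difficulty described above.

```python
def dirty_page_cache_kb(meminfo_text):
    """Extract the Dirty and Writeback values (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    found = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        # Exact name match, so "WritebackTmp" is not mistaken for "Writeback".
        if sep and name in wanted:
            found[name] = int(rest.split()[0])
    return found


def read_dirty_page_cache_kb(path="/proc/meminfo"):
    """Read the live values; only meaningful on Linux."""
    with open(path) as f:
        return dirty_page_cache_kb(f.read())


if __name__ == "__main__":
    print(read_dirty_page_cache_kb())
```

Sampling this just before the checkpoint's fsync phase would show how much unrelated dirty data is queued ahead of it.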

It is not a coincidence that the next feature I'm working on attempts to quantify the total writes to each 1GB relation chunk. That's the most promising path forward on the checkpoint problem I've found.
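
To illustrate what quantifying writes per 1GB relation chunk could look like, here is a rough Python sketch. It is not the feature being described and not PostgreSQL code: the class and method names are hypothetical, and it assumes the default build settings of 8kB blocks and 1GB segment files.

```python
from collections import Counter

BLOCK_SIZE = 8192                                  # assumed default block size
SEGMENT_BYTES = 1024 ** 3                          # 1GB relation chunk
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE   # 131072 blocks


class SegmentWriteStats:
    """Count block writes per (relation, 1GB segment) between checkpoints."""

    def __init__(self):
        self.writes = Counter()

    def record_write(self, relation, block_number):
        # Map the block number to the 1GB segment that contains it.
        segment = block_number // BLOCKS_PER_SEGMENT
        self.writes[(relation, segment)] += 1

    def heaviest_first(self):
        """Segments ordered by write count, largest first."""
        return sorted(self.writes.items(), key=lambda kv: (-kv[1], kv[0]))

    def reset(self):
        """Clear the counters, e.g. once a checkpoint completes."""
        self.writes.clear()


if __name__ == "__main__":
    stats = SegmentWriteStats()
    for rel, blk in [("orders", 0), ("orders", 131071), ("orders", 131072), ("items", 5)]:
        stats.record_write(rel, blk)
    print(stats.heaviest_first())
```

Counters like these would show which segments are likely to have the most data behind their fsync call, whether it came from the checkpoint or from backends.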

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
