Re: Sorting writes during checkpoint - Mailing list pgsql-patches

From Greg Smith
Subject Re: Sorting writes during checkpoint
Date
Msg-id Pine.GSO.4.64.0805050118001.24473@westnet.com
In response to Re: Sorting writes during checkpoint  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
On Mon, 5 May 2008, Tom Lane wrote:

> It bothers me a bit that the patch forces writes to be done "all of file
> A in order, then all of file B in order, etc".  We don't know enough
> about the disk layout of the files to be sure that that's good. (This
> might also mean that whether there is a win is going to be platform and
> filesystem dependent ...)

I think that on most platforms and filesystems, disk location is
correlated closely enough with block order that this particular issue
isn't a large one.  If the writes are mainly going to one logical area (a
single partition or disk array), sorting should be a win as long as the
sorting step itself doesn't introduce a delay.  I am concerned that in a
more complicated case than pgbench, say one where the writes are spread
across multiple arrays, forcing writes in order may slow things down.
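
To make the ordering under discussion concrete, here is a minimal Python
sketch of "all of file A in order, then all of file B in order".  The
(file, block) tuples and the file names are made up for illustration; this
is not the patch's actual code.

```python
# Hypothetical dirty buffers, identified as (file name, block number).
dirty = [("fileB", 7), ("fileA", 12), ("fileB", 2), ("fileA", 3)]

# Tuples compare element by element, so this sorts by file first and
# then by block number within each file.
write_order = sorted(dirty)

for file_name, block in write_order:
    print(file_name, block)
```

Every write to fileA is issued before any write to fileB, which is the
property that matters once the two files live on different arrays.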

Example:  let's say there are two tablespaces mapped to two arrays, A and
B, and data is being written to both at checkpoint time.  With the current
code, that I/O might be AABAABABBBAB, which keeps both arrays busy
writing.  The sorted case would instead make that AAAAAABBBBBB, so only
one array is active at a time.  It may very well be that the improvement
from reducing seeks on the writes to A and B is smaller than the loss
from not keeping both arrays continuously busy.
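
The trade-off can be sketched with a toy model in Python.  Everything
here is an assumption made for illustration, not a measurement: the writer
submits writes in batches of `window`, the two arrays work in parallel
within a batch, each array handles its own writes one after another, and
every write costs a fixed number of time units (lower when sorting has
reduced seeks).

```python
from collections import Counter

def elapsed(pattern, cost_per_write, window=4):
    """Total time to write `pattern` under the toy model above.

    Each batch takes as long as its busiest array, because the arrays
    run in parallel but each serializes its own writes.
    """
    total = 0
    for start in range(0, len(pattern), window):
        batch = pattern[start:start + window]
        busiest = max(Counter(batch).values())
        total += busiest * cost_per_write
    return total

interleaved = "AABAABABBBAB"
sorted_order = "".join(sorted(interleaved))  # all A writes, then all B

# Assumed costs: 10 units per unsorted write, fewer when sorted.
print("interleaved, cost 10:", elapsed(interleaved, 10))
print("sorted, cost 6:      ", elapsed(sorted_order, 6))
print("sorted, cost 9:      ", elapsed(sorted_order, 9))
```

In this model the sorted order wins when sorting cuts the per-write cost
substantially (6 versus 10) and loses when the saving is small (9 versus
10), because most batches then leave one array idle.  Real hardware with
deeper queues and write caches will behave differently, which is why an
actual benchmark is needed.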

I think I can simulate this by using a modified pgbench script that works
against an accounts1 and an accounts2 table with equal frequency, where
the two tables are on different tablespaces on two separate disks.

> Right, that's in the ground rules for commitfests: if the submitter can
> respond to complaints before the fest is over, we'll reconsider the
> patch.

The small optimization I was trying to suggest was that you just bounce
this type of patch automatically to the "rejected for <x>" section of the
commitfest wiki page in cases like these.  The standard practice on this
sort of queue is to reclassify an item automatically once someone has made
a pass over it, leaving it to the original submitter to re-open it with
more information.  That keeps the unprocessed part of the queue always
shrinking, and as long as people know that they can get a patch
reconsidered by submitting new results, it's not unfair to them.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
