Re: Sorting writes during checkpoint - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Sorting writes during checkpoint
Msg-id 1215160630.4051.19.camel@ebony.site
In response to Re: Sorting writes during checkpoint  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Sorting writes during checkpoint  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Sorting writes during checkpoint  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-patches
On Mon, 2008-05-05 at 00:23 -0400, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
> > On Sun, 4 May 2008, Tom Lane wrote:
> >> Well, I tried a pgbench test similar to that one --- on smaller hardware
> >> than was reported, so it was a bit smaller test case, but it should have
> >> given similar results.
>
> > ... If
> > you're not offloading to another device like that, the OS-level elevator
> > sorting will handle sorting for you close enough to optimally that I doubt
> > this will help much (and in fact may just get in the way).
>
> Yeah.  It bothers me a bit that the patch forces writes to be done "all
> of file A in order, then all of file B in order, etc".  We don't know
> enough about the disk layout of the files to be sure that that's good.
> (This might also mean that whether there is a win is going to be
> platform and filesystem dependent ...)

I've seen no action on this since the last commitfest, but I think we
should do something with it rather than just ignore it.

I agree with all of the comments, so my proposed solution is to implement
this as an I/O elevator hook. The standard elevator would issue writes in
the order they arrive; an additional elevator in contrib would sort them
by file and block. That will make testing easier and will also give
Itagaki his benefit, while allowing ongoing research. If this solution's
good enough for Linux it ought to be good enough for us.
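
To make the proposal concrete, here is a minimal sketch of such a hook in
standalone C. It is not taken from any patch: the BulkWrite struct, the
hook signature, and the sorted_elevator() and schedule_bulk_writes() names
are all assumptions made for illustration. Only the name bulk_io_hook comes
from this message. An unset hook stands for the standard elevator (arrival
order); the contrib-style elevator sorts by file, then block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical description of one pending buffer write. */
typedef struct BulkWrite
{
    unsigned int file_id;    /* stand-in for the relation file */
    unsigned int block_num;  /* block number within that file */
} BulkWrite;

/* Hypothetical hook type: may reorder the n pending writes in place. */
typedef void (*bulk_io_hook_type) (BulkWrite *writes, size_t n);

/* NULL means "standard elevator": issue writes in arrival order. */
static bulk_io_hook_type bulk_io_hook = NULL;

/* Order by file first, then by block within the file. */
static int
cmp_file_block(const void *a, const void *b)
{
    const BulkWrite *x = a;
    const BulkWrite *y = b;

    if (x->file_id != y->file_id)
        return (x->file_id < y->file_id) ? -1 : 1;
    if (x->block_num != y->block_num)
        return (x->block_num < y->block_num) ? -1 : 1;
    return 0;
}

/* What a contrib module might install as the hook. */
static void
sorted_elevator(BulkWrite *writes, size_t n)
{
    assert(writes != NULL || n == 0);
    if (n > 1)
        qsort(writes, n, sizeof(BulkWrite), cmp_file_block);
}

/* Caller side: give the hook a chance to reorder before issuing writes. */
static void
schedule_bulk_writes(BulkWrite *writes, size_t n)
{
    if (bulk_io_hook != NULL)
        bulk_io_hook(writes, n);
    /* ... the caller would now issue the writes in array order ... */
}
```

Keeping the default as "no hook installed" means the existing behaviour is
unchanged unless a module opts in, which is what makes side-by-side testing
straightforward.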

Note that if we do this for checkpoint we should also do this for
FlushRelationBuffers(), used during heap_sync(), for exactly the same
reasons.

Would suggest calling it bulk_io_hook() or similar.

A further observation: if there is an effect, it would be at the
block-device level, i.e. the tablespace. Sorting the writes so that we
issue one tablespace at a time might at least help the I/O elevators and
disk caches work with the whole problem at once. We might see a benefit
on one tablespace but not on another.

Sorting by file might have inadvertently shown a benefit at the tablespace
level on a larger server with spread-out data, whereas on Tom's test
system I would guess just a single tablespace was used.

Anyway, I note that we don't have an easy way of sorting by tablespace,
but I'm sure it would be possible to look up the tablespace for a file
within a plugin.
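
As an illustration of the tablespace idea, the following standalone C
sketch shows a plugin-style elevator that first resolves each write's
tablespace and then sorts with the tablespace as the leading key. All
names here (TsWrite, lookup_tablespace(), tablespace_elevator()) are
hypothetical, and the lookup rule is a toy stand-in; how a real plugin
would map a file to its tablespace is not specified in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pending write, with room for the resolved tablespace. */
typedef struct TsWrite
{
    unsigned int tablespace_id;  /* filled in by the plugin's lookup */
    unsigned int file_id;        /* stand-in for the relation file */
    unsigned int block_num;      /* block number within that file */
} TsWrite;

/*
 * Hypothetical lookup. Toy rule for illustration only: files numbered
 * 1000-1999 live in tablespace 1, 2000-2999 in tablespace 2, and so on.
 */
static unsigned int
lookup_tablespace(unsigned int file_id)
{
    return file_id / 1000;
}

/* Order by tablespace, then file, then block. */
static int
cmp_tablespace_file_block(const void *a, const void *b)
{
    const TsWrite *x = a;
    const TsWrite *y = b;

    if (x->tablespace_id != y->tablespace_id)
        return (x->tablespace_id < y->tablespace_id) ? -1 : 1;
    if (x->file_id != y->file_id)
        return (x->file_id < y->file_id) ? -1 : 1;
    if (x->block_num != y->block_num)
        return (x->block_num < y->block_num) ? -1 : 1;
    return 0;
}

/* Resolve tablespaces, then sort so each tablespace is issued as one run. */
static void
tablespace_elevator(TsWrite *writes, size_t n)
{
    size_t i;

    assert(writes != NULL || n == 0);
    for (i = 0; i < n; i++)
        writes[i].tablespace_id = lookup_tablespace(writes[i].file_id);
    if (n > 1)
        qsort(writes, n, sizeof(TsWrite), cmp_tablespace_file_block);
}
```

With the tablespace as the leading sort key, all writes for one block
device are issued together, which is the grouping suggested above; whether
that helps in practice would still need measuring.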

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

