Re: Sorting writes during checkpoint - Mailing list pgsql-patches

From: Simon Riggs
Subject: Re: Sorting writes during checkpoint
Date: July 4, 2008 09:08:52
Msg-id: 1215160630.4051.19.camel@ebony.site
In response to: Re: Sorting writes during checkpoint (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Sorting writes during checkpoint (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: Sorting writes during checkpoint (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-patches

On Mon, 2008-05-05 at 00:23 -0400, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
> > On Sun, 4 May 2008, Tom Lane wrote:
> >> Well, I tried a pgbench test similar to that one --- on smaller hardware
> >> than was reported, so it was a bit smaller test case, but it should have
> >> given similar results.
>
> > ... If
> > you're not offloading to another device like that, the OS-level elevator
> > sorting will handle sorting for you close enough to optimally that I doubt
> > this will help much (and in fact may just get in the way).
>
> Yeah.  It bothers me a bit that the patch forces writes to be done "all
> of file A in order, then all of file B in order, etc".  We don't know
> enough about the disk layout of the files to be sure that that's good.
> (This might also mean that whether there is a win is going to be
> platform and filesystem dependent ...)

I've seen no action on this since the last commitfest, but I think we
should do something with it rather than just ignore it.

I agree with all of the comments above, so my proposed solution is to
implement this as an I/O elevator hook. The standard elevator would
issue writes in the order they arrive; an additional elevator in contrib
would sort them by file and block. That would make testing easier and
would also give Itagaki his benefit, while allowing ongoing research. If
this solution's good enough for Linux it ought to be good enough for us.

Note that if we do this for checkpoint we should also do this for
FlushRelationBuffers(), used during heap_sync(), for exactly the same
reasons.

Would suggest calling it bulk_io_hook() or similar.
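
To make the idea concrete, here is a rough sketch of how such a hook
might look. It is not taken from any patch: the BulkWrite struct,
bulk_io_hook_type, file_block_elevator() and issue_bulk_writes() are all
hypothetical names, and the actual write is stood in for by a printf.
With the hook unset the writes go out in arrival order; a plugin can
install an elevator that reorders them first.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical description of one pending buffer write. */
typedef struct BulkWrite
{
    unsigned file;   /* stand-in for a relation file identifier */
    unsigned block;  /* block number within that file */
} BulkWrite;

/* Hypothetical hook type: an elevator may reorder the array in place. */
typedef void (*bulk_io_hook_type) (BulkWrite *writes, size_t n);

/* NULL means "standard elevator": issue writes in arrival order. */
static bulk_io_hook_type bulk_io_hook = NULL;

/* Order by file first, then by block within the file. */
static int
cmp_file_block(const void *a, const void *b)
{
    const BulkWrite *x = a;
    const BulkWrite *y = b;

    if (x->file != y->file)
        return (x->file < y->file) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/* Example contrib-style elevator: file/block sorted. */
static void
file_block_elevator(BulkWrite *writes, size_t n)
{
    qsort(writes, n, sizeof(BulkWrite), cmp_file_block);
}

/* Caller side: let the hook reorder, then issue each write in turn. */
static void
issue_bulk_writes(BulkWrite *writes, size_t n)
{
    size_t i;

    if (bulk_io_hook != NULL)
        bulk_io_hook(writes, n);

    for (i = 0; i < n; i++)
        printf("write file %u block %u\n", writes[i].file, writes[i].block);
}
```

A plugin would set bulk_io_hook = file_block_elevator before the bulk
write starts; leaving it NULL keeps the in-order behaviour, which is
what makes the two easy to compare in testing.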

A further observation: if there is an effect, it would be at the
block-device level, i.e. the tablespace. Sorting the writes so that we
issue one tablespace at a time might at least help the I/O
elevators/disk caches to work with the whole problem at once. We might
get a benefit on one tablespace but not on another.

Sorting by file might have inadvertently shown a benefit at the
tablespace level on a larger server with spread-out data, whereas on
Tom's test system I would guess just a single tablespace was used.

Anyway, I note that we don't have an easy way of sorting by tablespace,
but I'm sure it would be possible to look up the tablespace for a file
within a plugin.
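
As an illustration only, a plugin that groups writes by tablespace might
look something like the sketch below. Everything here is assumed for the
example: PendingWrite, the file_to_tablespace table and
lookup_tablespace() are hypothetical, and the fixed table merely stands
in for whatever lookup a real plugin would do.

```c
#include <stdlib.h>

/* Hypothetical pending write, identified only by file and block. */
typedef struct PendingWrite
{
    unsigned file;
    unsigned block;
} PendingWrite;

/* Stand-in mapping from file to tablespace; purely illustrative. */
typedef struct FileTablespace
{
    unsigned file;
    unsigned tablespace;
} FileTablespace;

static const FileTablespace file_to_tablespace[] = {
    {1, 20},
    {2, 10},
    {3, 10},
};

/* Hypothetical lookup; unknown files fall into tablespace 0. */
static unsigned
lookup_tablespace(unsigned file)
{
    size_t i;
    size_t n = sizeof(file_to_tablespace) / sizeof(file_to_tablespace[0]);

    for (i = 0; i < n; i++)
        if (file_to_tablespace[i].file == file)
            return file_to_tablespace[i].tablespace;
    return 0;
}

/* Order by tablespace, then file, then block. */
static int
cmp_tablespace_first(const void *a, const void *b)
{
    const PendingWrite *x = a;
    const PendingWrite *y = b;
    unsigned tx = lookup_tablespace(x->file);
    unsigned ty = lookup_tablespace(y->file);

    if (tx != ty)
        return (tx < ty) ? -1 : 1;
    if (x->file != y->file)
        return (x->file < y->file) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/* Reorder in place so each tablespace's writes are issued together. */
static void
sort_by_tablespace(PendingWrite *writes, size_t n)
{
    qsort(writes, n, sizeof(PendingWrite), cmp_tablespace_first);
}
```

The lookup runs on every comparison here for brevity; a real plugin
would more likely resolve the tablespace once per write before sorting.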

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
