Re: Sorting writes during checkpoint - Mailing list pgsql-patches
From | Simon Riggs |
---|---|
Subject | Re: Sorting writes during checkpoint |
Date | |
Msg-id | 1215160630.4051.19.camel@ebony.site |
In response to | Re: Sorting writes during checkpoint (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Sorting writes during checkpoint (Tom Lane <tgl@sss.pgh.pa.us>); Re: Sorting writes during checkpoint (Greg Smith <gsmith@gregsmith.com>) |
List | pgsql-patches |

On Mon, 2008-05-05 at 00:23 -0400, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
> > On Sun, 4 May 2008, Tom Lane wrote:
> >> Well, I tried a pgbench test similar to that one --- on smaller hardware
> >> than was reported, so it was a bit smaller test case, but it should have
> >> given similar results.
>
> > ... If
> > you're not offloading to another device like that, the OS-level elevator
> > sorting will handle sorting for you close enough to optimally that I doubt
> > this will help much (and in fact may just get in the way).
>
> Yeah.  It bothers me a bit that the patch forces writes to be done "all
> of file A in order, then all of file B in order, etc".  We don't know
> enough about the disk layout of the files to be sure that that's good.
> (This might also mean that whether there is a win is going to be
> platform and filesystem dependent ...)

No action on this seen since last commitfest, but I think we should do
something with it, rather than just ignore it.

Agree with all comments myself, so proposed solution is to implement
this as an I/O elevator hook. Standard elevator is to issue them in
order as they come, additional elevator in contrib is file/block sorted.
That will make testing easier and will also give Itagaki his benefit,
while allowing on-going research. If this solution's good enough for
Linux it ought to be good enough for us.

Note that if we do this for checkpoint we should also do this for
FlushRelationBuffers(), used during heap_sync(), for exactly the same
reasons. Would suggest calling it bulk_io_hook() or similar.

Further observation would be that if there was an effect then it would
be at the block-device level, i.e. tablespace. Sorting the writes so
that we issued one tablespace at a time might at least help the I/O
elevators/disk caches to work with the whole problem at once. We might
get benefit on one tablespace but not on another.
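
To make the hook idea concrete, here is a minimal sketch in plain C of
how a pluggable elevator might look. Only the name bulk_io_hook comes
from the suggestion above. PendingWrite, sorted_elevator() and
issue_bulk_writes() are hypothetical names used for illustration, not
PostgreSQL APIs, and the sketch assumes the tablespace of each pending
write is already known. A NULL hook stands in for the standard
elevator (issue in arrival order); the sorted one orders by tablespace,
then file, then block.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical description of one pending write; not a PostgreSQL struct. */
typedef struct PendingWrite
{
    unsigned int tablespace;    /* assumed to be known for each write */
    unsigned int file;          /* identifies the relation file */
    unsigned int block;         /* block number within that file */
} PendingWrite;

/* A hook may reorder the pending writes in place before they are issued. */
typedef void (*bulk_io_hook_type) (PendingWrite *writes, size_t nwrites);

/* NULL means the standard elevator: issue writes in the order they come. */
bulk_io_hook_type bulk_io_hook = NULL;

/* Order by tablespace first, then file, then block. */
int
compare_writes(const void *a, const void *b)
{
    const PendingWrite *wa = (const PendingWrite *) a;
    const PendingWrite *wb = (const PendingWrite *) b;

    if (wa->tablespace != wb->tablespace)
        return (wa->tablespace < wb->tablespace) ? -1 : 1;
    if (wa->file != wb->file)
        return (wa->file < wb->file) ? -1 : 1;
    if (wa->block != wb->block)
        return (wa->block < wb->block) ? -1 : 1;
    return 0;
}

/* The additional elevator, as a contrib module might supply it. */
void
sorted_elevator(PendingWrite *writes, size_t nwrites)
{
    qsort(writes, nwrites, sizeof(PendingWrite), compare_writes);
}

/* Stand-in for the checkpoint write loop; printing replaces the real I/O. */
void
issue_bulk_writes(PendingWrite *writes, size_t nwrites)
{
    size_t      i;

    assert(writes != NULL || nwrites == 0);

    if (bulk_io_hook != NULL)
        bulk_io_hook(writes, nwrites);

    for (i = 0; i < nwrites; i++)
        printf("write tablespace=%u file=%u block=%u\n",
               writes[i].tablespace, writes[i].file, writes[i].block);
}
```

Setting bulk_io_hook = sorted_elevator before calling
issue_bulk_writes() gives the tablespace/file/block order; leaving it
NULL keeps arrival order, so the two behaviours could be compared on
the same workload.
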

Sorting by file might have inadvertently shown benefit at the tablespace
level on a larger server with spread out data whereas on Tom's test
system I would guess just a single tablespace was used.

Anyway, I note that we don't have an easy way of sorting by tablespace,
but I'm sure it would be possible to look up the tablespace for a file
within a plugin.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support