Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Spread checkpoint sync
Msg-id 201011212345.50499.andres@anarazel.de
In response to Re: Spread checkpoint sync  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Spread checkpoint sync
List pgsql-hackers
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
> For a similar problem we had (kernel buffering too much) we had success
> using the fadvise and madvise DONTNEED syscalls to force the data to
> exit the cache much sooner than it would otherwise. This was on Linux
> and it had the side-effect that the data was deleted from the kernel
> cache, which we wanted, but probably isn't appropriate here.
Yep, that works fine, although it has the drawback that the data will have to
be read from disk again if archiving/SR is enabled.
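For illustration, a minimal sketch of the fadvise approach, assuming Linux with
glibc. The helper name drop_range_from_cache is made up for this example and is
not PostgreSQL code. POSIX_FADV_DONTNEED does not drop pages that are still
dirty, so the sketch flushes the file first; a len of 0 means "to the end of
the file".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): flush the file's dirty data,
 * then ask the kernel to drop the given byte range from the page cache.
 * Returns 0 on success or an errno value on failure.
 */
int
drop_range_from_cache(int fd, off_t offset, off_t len)
{
    assert(fd >= 0);

    /* DONTNEED leaves dirty pages alone, so write them out first. */
    if (fdatasync(fd) != 0)
        return errno;

    /* posix_fadvise() returns the error number directly, not via errno. */
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```

Anything that needs those pages afterwards, such as WAL archiving or SR, has to
fetch them from disk again, which is the cost mentioned above.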

> There is also sync_file_range, but that's Linux-specific, although
> close to what you want I think. It would allow you to work with blocks
> smaller than 1GB.
Unfortunately that puts the data under quite high write-out pressure inside
the kernel, which is not what you actually want because it significantly
limits reordering and such.
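For reference, a rough sketch of what such a call looks like, assuming Linux
with glibc. The helper name start_range_writeback is invented for this example.
SYNC_FILE_RANGE_WRITE only starts writeback of the dirty pages in the range and
does not wait for it to finish, so on its own it is not a durability guarantee;
an nbytes of 0 means "to the end of the file".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): ask the kernel to start
 * writing out the dirty pages in [offset, offset + nbytes) without
 * waiting for the I/O to complete.
 * Returns 0 on success or an errno value on failure.
 */
int
start_range_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);

    /* Initiates writeback only; file metadata is not flushed. */
    if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
        return errno;

    return 0;
}
```

Calling this on, say, an 8MB range at a time is how one would work in units
smaller than a 1GB segment, at the price of the write-out pressure described
above.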

It would be nicer if you could get a mix of both semantics (from looking at
it, that seems to be about a 10-line patch to the kernel, depending on the
approach). I.e. indicate that you want the pages written soonish, without
putting them at the head of the writeout queue.

Andres

