Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: checkpointer continuous flushing
Date
Msg-id CAA4eK1K5yZJAQxyfz5BsUDDyTcic1UXdDegnCCLYFRLPGAsxQA@mail.gmail.com
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Tue, Aug 18, 2015 at 1:02 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Andres,

[...] posix_fadvise().

My current thinking is "maybe yes, maybe no":-), as it may depend on the OS
implementation of posix_fadvise, so it may differ between OS.

As long as fadvise has no 'undirty' option, I don't see how that
problem goes away. You're telling the OS to throw the buffer away, so
unless it ignores it that'll have consequences when you read the page
back in.

Yep, probably.

Note that we are talking about checkpoints, which "write" buffers out *but* keep them nevertheless. As the buffer is kept, the OS page is a duplicate, and freeing it should not harm, at least immediately.
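For readers following along, the kind of call sequence under discussion can be sketched roughly as below. This is only an illustration, not code from the patch: the helper name write_page_and_drop_cache and its parameters are hypothetical, and only pwrite, fdatasync and posix_fadvise are real POSIX calls. The sketch assumes, as is my understanding of Linux, that a page has to be written back before POSIX_FADV_DONTNEED can actually drop it.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one page, force it to storage, then hint
 * that the OS may drop its cached copy of that range.
 * Returns 0 on success, -1 on any failure.
 */
static int
write_page_and_drop_cache(int fd, const char *page, size_t len, off_t offset)
{
    /* Write the page at the given offset. */
    if (pwrite(fd, page, len, offset) != (ssize_t) len)
        return -1;

    /* Assumption: the page should be clean before it can be discarded. */
    if (fdatasync(fd) != 0)
        return -1;

    /* posix_fadvise returns an error number directly, 0 meaning success. */
    if (posix_fadvise(fd, offset, (off_t) len, POSIX_FADV_DONTNEED) != 0)
        return -1;

    return 0;
}
```

Whether the OS honors the hint, and what a later read of the same page then costs, is exactly the OS-dependent part debated above.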


This theory could make sense if we can predict in some way that
the data we are flushing out of the OS cache won't be needed soon.
After the flush, we can rely on the data being found in
shared_buffers only to the extent that its usage_count is high; otherwise it could be
replaced at any moment by a backend that needs a buffer when there is no
free buffer.  One way to think about this is that if the usage_count is
low, then it's okay to assume the page won't be needed in the near
future anyway; however, I don't think relying only on usage_count for such a thing
is a good idea.
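To make the usage_count point concrete, here is a minimal sketch of a clock-sweep style victim search. It is a simplification for illustration only, not PostgreSQL's actual buffer manager code: SketchBuffer, next_victim and clock_sweep are hypothetical names, and pinning and locking are ignored entirely.

```c
/* Hypothetical, simplified buffer descriptor; not PostgreSQL's BufferDesc. */
typedef struct
{
    int page_id;      /* which page this buffer currently holds */
    int usage_count;  /* bumped on access, decremented by the sweep */
} SketchBuffer;

/* Position of the clock hand, shared across calls. */
static int next_victim = 0;

/*
 * Advance the clock hand until a buffer with usage_count == 0 is found,
 * decrementing the count of every buffer passed over on the way.
 * Returns the index of the buffer chosen for replacement.
 */
static int
clock_sweep(SketchBuffer *buffers, int nbuffers)
{
    for (;;)
    {
        int candidate = next_victim;

        next_victim = (next_victim + 1) % nbuffers;
        if (buffers[candidate].usage_count == 0)
            return candidate;
        buffers[candidate].usage_count--;
    }
}
```

A page with a low usage_count is reached by the hand after few passes, so if its OS-cache copy has also been dropped, the next access goes to disk.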

To sum up, I agree that it is indeed possible that flushing with posix_fadvise could reduce read OS-memory hits on some systems for some workloads, although not on Linux, see below.

So the option is best kept "off" for now, absent further data; I'm fine with that.


One point to consider here is on what basis a user can decide to turn
this option on; is it predictable in any way?
I think one case could be when the data set fits in shared_buffers.

In general, providing an option is a good idea if the user can easily
decide when to use it, or if we can give a clear recommendation
for it. Otherwise, one has to recommend testing the workload
with the option and using it only if it helps, which might also
be okay in some cases, but it is better to be clear.


One minor point: while glancing through the patch, I noticed that a couple
of multiline comments are not written in the style usually used
in the code (keep the first line empty).

+/* Status of buffers to checkpoint for a particular tablespace,
+ * used internally in BufferSync.
+ * - space: oid of the tablespace
+ * - num_to_write: number of checkpoint pages counted for this tablespace
+ * - num_written: number of pages actually written out

+/* entry structure for table space to count hashtable,
+ * used internally in BufferSync.
+ */
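For illustration, the first comment rewritten with an empty first line might look like the sketch below. The struct name SketchTablespaceStatus and the Oid stand-in typedef are hypothetical, added only so the fragment compiles on its own. Only the three fields quoted above are shown, and the field types are guesses.

```c
typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid type */

/*
 * Status of buffers to checkpoint for a particular tablespace,
 * used internally in BufferSync.
 *
 * - space: oid of the tablespace
 * - num_to_write: number of checkpoint pages counted for this tablespace
 * - num_written: number of pages actually written out
 */
typedef struct SketchTablespaceStatus
{
    Oid space;
    int num_to_write;
    int num_written;
} SketchTablespaceStatus;
```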



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
