
Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: checkpointer continuous flushing
Date	August 18, 2015 06:46:34
Msg-id	CAA4eK1K5yZJAQxyfz5BsUDDyTcic1UXdDegnCCLYFRLPGAsxQA@mail.gmail.com
In response to	Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses	Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
List	pgsql-hackers


On Tue, Aug 18, 2015 at 1:02 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Andres,

[...] posix_fadvise().

My current thinking is "maybe yes, maybe no":-), as it may depend on the OS
implementation of posix_fadvise, so it may differ between OS.

As long as fadvise has no 'undirty' option, I don't see how that
problem goes away. You're telling the OS to throw the buffer away, so
unless it ignores it that'll have consequences when you read the page
back in.

Yep, probably.

Note that we are talking about checkpoints, which "write" buffers out *but* keep them nevertheless. As the buffer is kept, the OS page is a duplicate, and freeing it should do no harm, at least not immediately.

This theory could make sense if we could somehow predict that the data
we are flushing out of the OS cache won't be needed soon. After the
flush, we can rely on the data still being in shared_buffers only if
its usage_count is high; otherwise it could be replaced at any moment
by a backend that needs a buffer when no free buffer is available. One
way to think about it is that if the usage_count is low, it is okay to
assume the page won't be needed in the near future anyway; however, I
don't think relying only on usage_count for such a thing is a good idea.
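
To make the trade-off concrete, here is a minimal sketch of the kind of
usage_count-gated flush being discussed. It is not taken from the patch:
flush_page(), may_drop_os_copy() and the HOT_USAGE_COUNT threshold are
hypothetical names chosen for illustration, and the heuristic is exactly
the one I am doubtful about above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose pwrite(), fdatasync(), posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical threshold (an assumption, not a PostgreSQL constant):
 * at or above this usage_count we guess that the page will stay in
 * shared_buffers for a while.
 */
#define HOT_USAGE_COUNT 3

/*
 * Decide whether the OS copy of a page may be dropped after a flush.
 * A high usage_count suggests the shared_buffers copy will survive, so
 * the OS page is a duplicate.  A low usage_count gives no such guarantee.
 */
static bool
may_drop_os_copy(int usage_count)
{
    return usage_count >= HOT_USAGE_COUNT;
}

/*
 * Write one page, force it to stable storage, and optionally ask the OS
 * to drop its cached copy.  Returns 0 on success, -1 on failure.
 */
static int
flush_page(int fd, const char *page, size_t len, off_t offset, int usage_count)
{
    assert(page != NULL && len > 0);

    if (pwrite(fd, page, len, offset) != (ssize_t) len)
        return -1;

    /* Sync first, so the advice below is applied to clean pages. */
    if (fdatasync(fd) != 0)
        return -1;

    if (may_drop_os_copy(usage_count))
    {
        /* posix_fadvise() returns an error number directly, not -1. */
        if (posix_fadvise(fd, offset, (off_t) len, POSIX_FADV_DONTNEED) != 0)
            return -1;
    }
    return 0;
}
```

Whether the advice actually frees the page, and what a later read then
costs, depends on the OS, which is the uncertainty raised earlier in the
thread.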

To sum up, I agree that it is indeed possible that flushing with posix_fadvise could reduce read OS-memory hits on some systems for some workloads, although not on Linux, see below.

So the option is best kept as "off" for now, without further data, I'm fine with that.

One point to think about here is on what basis a user can decide to
turn this option on; is it predictable in any way? I think one case
could be when the data set fits in shared_buffers.

In general, providing an option is a good idea if the user can easily
decide when to use it, or if we can give a clear recommendation for it.
Otherwise one has to recommend "test your workload with this option,
and if it works then great, else don't use it", which might also be
okay in some cases, but it is better to be clear.

One minor point: while glancing through the patch, I noticed that a
couple of multiline comments are not written in the way usually used
in the code (keep the first line empty).

+/* Status of buffers to checkpoint for a particular tablespace,
+ * used internally in BufferSync.
+ * - space: oid of the tablespace
+ * - num_to_write: number of checkpoint pages counted for this tablespace
+ * - num_written: number of pages actually written out

+/* entry structure for table space to count hashtable,
+ * used internally in BufferSync.
+ */

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
