Re: Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Linux kernel impact on PostgreSQL performance
Msg-id CAMkU=1zsiMJ5nxbTG-YTpRFehB1-A+nUL0dYMjEqepxvFNj9bw@mail.gmail.com
In response to Re: Linux kernel impact on PostgreSQL performance  (Jim Nasby <jim@nasby.net>)
Responses Re: Linux kernel impact on PostgreSQL performance
List pgsql-hackers
On Mon, Jan 13, 2014 at 12:32 PM, Jim Nasby <jim@nasby.net> wrote:
On 1/13/14, 2:27 PM, Claudio Freire wrote:
On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby <jim@nasby.net> wrote:
On 1/13/14, 2:19 PM, Claudio Freire wrote:

On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On a related note, there's also the problem of double-buffering.  When
we read a page into shared_buffers, we leave a copy behind in the OS
buffers, and similarly on write-out.  It's very unclear what to do
about this, since the kernel and PostgreSQL don't have intimate
knowledge of what each other are doing, but it would be nice to solve
somehow.



There you have a much harder algorithmic problem.

You can basically control duplication with fadvise and DONTNEED. The
problem here is not the kernel and whether or not it allows postgres
to be smart about it. The problem is... what kind of smarts
(algorithm) to use.


Isn't this a fairly simple matter of: when we read a page into shared_buffers,
tell the kernel to forget that page? And a corollary to that for when we
dump a page out of shared_buffers ("here, kernel, please put this back into
your cache").
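
For illustration only, the "forget it after the read" half of this idea can be sketched with the existing advisory interface. This is a minimal Python sketch, not PostgreSQL code: the helper name read_block_and_forget and the 8192-byte block size are assumptions made for the example, and it relies on os.posix_fadvise, which is available on Linux. The hint is advisory, so the kernel is free to ignore it.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed page size for this sketch


def read_block_and_forget(fd, block_no, block_size=BLOCK_SIZE):
    """Read one block, then hint the kernel to drop its cached copy.

    Hypothetical helper: POSIX_FADV_DONTNEED is only advice, and the
    kernel may keep the page cached anyway.
    """
    offset = block_no * block_size
    data = os.pread(fd, block_size, offset)
    # The caller now holds the data; ask the kernel to evict this range.
    os.posix_fadvise(fd, offset, block_size, os.POSIX_FADV_DONTNEED)
    return data


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE)
        f.flush()
        page = read_block_and_forget(f.fileno(), 1)
        print(len(page), page[:1])
```

Note that this covers only the read side. There is no corresponding call for handing a clean page back to the kernel on eviction.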


That's my point. In terms of kernel-postgres interaction, it's fairly simple.

What's not so simple is figuring out what policy to use.

I think the above is pretty simple for both interaction (allow us to inject a clean page into the file page cache) and policy (forget it after you hand it to us, then remember it again when we hand it back to you clean).  And I think it would pretty likely be an improvement over what we currently do.  But I think it is probably the wrong way to get the improvement.  I think the real problem is that we don't trust ourselves to manage more of the memory ourselves.  

As far as I know, we still don't have a publicly disclosable and readily reproducible test case for the reports of performance degradation when we have more than 8GB in shared_buffers.   If we had one of those, we could likely reduce the double buffering problem by fixing our own scalability issues and therefore taking responsibility for more of the data ourselves.



Remember,
you cannot tell the kernel to put some page in its page cache without
reading it or writing it. So, once you make the kernel forget a page,
evicting it from shared buffers becomes quite expensive.
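
To make that cost concrete, here is a hedged Python sketch of the only two ways to get a page back into the kernel's cache with today's interfaces. The function names and the 8192-byte block size are hypothetical, chosen for the example; os.posix_fadvise and os.pwrite are assumed to be available (Linux).

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed page size for this sketch


def recache_by_reading(fd, block_no, block_size=BLOCK_SIZE):
    """Ask the kernel to read the range back in.

    If the page was dropped earlier, this can cost a real disk read.
    """
    os.posix_fadvise(fd, block_no * block_size, block_size,
                     os.POSIX_FADV_WILLNEED)


def recache_by_writing(fd, block_no, page):
    """Rewrite the page through the kernel.

    The page ends up in the page cache, but marked dirty, so the kernel
    will write it out again even if its contents did not change.
    """
    return os.pwrite(fd, page, block_no * len(page))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * (2 * BLOCK_SIZE))
        f.flush()
        recache_by_writing(f.fileno(), 1, b"x" * BLOCK_SIZE)
        recache_by_reading(f.fileno(), 1)
```

Neither option is a cheap "here is a clean page, please cache it" handoff: one pays for a read, the other pays for a redundant write.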

Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of "here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this."

Which, in the case of it being clean, amounts to "Here is data we don't want in memory any more because we think it is cold.  But we don't trust ourselves, so please hold on to it anyway."  That might be a tough sell to the kernel people.

 Cheers,

Jeff
