Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Msg-id 52D47EC6.4080007@2ndQuadrant.com
In response to Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Trond Myklebust <trondmy@gmail.com>)
Responses Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
List pgsql-hackers
On 01/13/2014 09:53 PM, Trond Myklebust wrote:
> On Jan 13, 2014, at 15:40, Andres Freund <andres@2ndquadrant.com> wrote:
>
>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
>>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>>> I notice, Josh, that you didn't mention the problems many people
>>>> have run into with Transparent Huge Page defrag and with NUMA
>>>> access.
>>> Amen to that.  Actually, I think NUMA can be (mostly?) fixed by
>>> setting zone_reclaim_mode; is there some other problem besides that?
>> I think that fixes some of the worst instances, but I've seen machines
>> spending horrible amounts of CPU (& BUS) time in page reclaim
>> nonetheless. If I analyzed it correctly it's in RAM << working set
>> workloads where RAM is pretty large and most of it is used as page
>> cache. The kernel ends up spending a huge percentage of time finding and
>> potentially defragmenting pages when looking for victim buffers.
>>
>>> On a related note, there's also the problem of double-buffering.  When
>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>> buffers, and similarly on write-out.  It's very unclear what to do
>>> about this, since the kernel and PostgreSQL don't have intimate
>>> knowledge of what each other are doing, but it would be nice to solve
>>> somehow.
>> I've wondered before if there wouldn't be a chance for postgres to say
>> "my dear OS, that the file range 0-8192 of file x contains y, no need to
>> reread" and do that when we evict a page from s_b but I never dared to
>> actually propose that to kernel people...
> O_DIRECT was specifically designed to solve the problem of double buffering 
> between applications and the kernel. Why are you not able to use that in these situations?
What is being asked for is the opposite of O_DIRECT: writing from a
buffer inside PostgreSQL into the Linux *buffer cache* while telling
Linux that the data is identical to what is currently on disk, so it
never needs to be written back.

This would avoid the current double-buffering between the PostgreSQL
and Linux buffer caches while still making use of the Linux cache when
possible.

The use case is pages that PostgreSQL has read into its buffer cache
but has not modified. They will at some point be evicted from the
PostgreSQL cache, but they are likely to be needed again soon. So what
is required is a way to "write them back" to the original file without
them actually being written (or marked dirty to be written later) to
any level below the Linux cache, since they *already* are on disk.
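Linux has no call that means "cache this range as clean, it already
matches the disk". As a rough sketch of what an application can do
today (not the interface proposed above), the hypothetical hooks below
show the two existing options when a clean block leaves the
application's cache. `os.posix_fadvise` with `POSIX_FADV_WILLNEED` only
hints that the range will be needed and may cost a fresh disk read,
while `os.pwrite` repopulates the page cache but dirties the pages, so
identical bytes get written back later. The function names and the
8192-byte block size are illustrative assumptions; the code is Linux/Unix
only.

```python
import os
import tempfile

BLCKSZ = 8192  # assumed block size (PostgreSQL's default)


def on_clean_evict(fd, blockno):
    """Hypothetical hook: a clean block is leaving the application's cache.

    POSIX_FADV_WILLNEED is only a hint that the range will be needed soon.
    If the kernel has already dropped those pages, it reads them back from
    disk; there is no flag meaning "this data already matches the disk,
    cache it without marking it dirty".
    """
    os.posix_fadvise(fd, blockno * BLCKSZ, BLCKSZ, os.POSIX_FADV_WILLNEED)


def on_clean_evict_by_rewrite(fd, blockno, data):
    """Hypothetical naive alternative: copy the block back with pwrite().

    This does put the data into the page cache, but the pages become dirty,
    so the kernel will later write identical bytes back to disk.
    Returns the number of bytes written.
    """
    return os.pwrite(fd, data, blockno * BLCKSZ)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.write(fd, b"\0" * (2 * BLCKSZ))
        os.fsync(fd)                   # both blocks are now clean on disk
        on_clean_evict(fd, 1)          # hint only; may trigger a re-read
        on_clean_evict_by_rewrite(fd, 0, b"\0" * BLCKSZ)  # dirties block 0
```

Neither option expresses "populate the cache, but keep it clean", which
is the gap the request is about.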

It is probably OK to give them an LRU position based on when they are
"written" out from PostgreSQL, though it may be better to get more
control over where in the LRU order they are placed. It may make sense
to position them based on when they were last read while residing in
the PostgreSQL cache.
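The "place by last read time" idea can be illustrated with a toy cache
ordered by last-access timestamp. This is a user-space sketch under
stated assumptions, not a model of the actual Linux page-cache reclaim
code; the class name `TimestampedLRU` and the integer timestamps are
hypothetical.

```python
import bisect


class TimestampedLRU:
    """Hypothetical toy cache ordered by last-access time.

    Index 0 holds the least recently used page, i.e. the next eviction victim.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._order = []   # sorted list of (last_access, page_id)
        self._stamp = {}   # page_id -> last_access

    def _remove(self, page_id):
        ts = self._stamp.pop(page_id)
        i = bisect.bisect_left(self._order, (ts, page_id))
        del self._order[i]

    def insert(self, page_id, last_access):
        """Place page_id by last_access; return the evicted page or None."""
        if page_id in self._stamp:
            self._remove(page_id)
        bisect.insort(self._order, (last_access, page_id))
        self._stamp[page_id] = last_access
        if len(self._order) > self.capacity:
            _, victim = self._order.pop(0)
            del self._stamp[victim]
            return victim
        return None

    def order(self):
        """Page ids from least to most recently used."""
        return [page_id for _, page_id in self._order]


cache = TimestampedLRU(capacity=3)
for page, t in [("a", 10), ("b", 20), ("c", 30)]:
    cache.insert(page, t)
# "x" is handed back by the application; it was last read at t=15,
# so it lands between "a" and "b" instead of at the most-recent end.
evicted = cache.insert("x", 15)
print(evicted, cache.order())  # a ['x', 'b', 'c']
```

A page handed back with an old last-read time becomes the next eviction
candidate instead of pushing out pages that were used more recently;
with plain insertion at the most-recently-used end it would outlive them.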

Cheers


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



