Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Trond Myklebust
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date
Msg-id 190E6EA3-7B06-4315-9E4C-33FBEC961531@gmail.com
In response to Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (James Bottomley <James.Bottomley@HansenPartnership.com>)
List pgsql-hackers
On Jan 13, 2014, at 19:03, Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> On 01/13/2014 09:53 PM, Trond Myklebust wrote:
>> On Jan 13, 2014, at 15:40, Andres Freund <andres@2ndquadrant.com> wrote:
>>
>>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
>>>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>>>> I notice, Josh, that you didn't mention the problems many people
>>>>> have run into with Transparent Huge Page defrag and with NUMA
>>>>> access.
>>>> Amen to that.  Actually, I think NUMA can be (mostly?) fixed by
>>>> setting zone_reclaim_mode; is there some other problem besides that?
>>> I think that fixes some of the worst instances, but I've seen machines
>>> spending horrible amounts of CPU (& BUS) time in page reclaim
>>> nonetheless. If I analyzed it correctly it's in RAM << working set
>>> workloads where RAM is pretty large and most of it is used as page
>>> cache. The kernel ends up spending a huge percentage of time finding and
>>> potentially defragmenting pages when looking for victim buffers.
>>>
>>>> On a related note, there's also the problem of double-buffering.  When
>>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>>> buffers, and similarly on write-out.  It's very unclear what to do
>>>> about this, since the kernel and PostgreSQL don't have intimate
>>>> knowledge of what each other are doing, but it would be nice to solve
>>>> somehow.
>>> I've wondered before if there wouldn't be a chance for postgres to say
>>> "my dear OS, that the file range 0-8192 of file x contains y, no need to
>>> reread" and do that when we evict a page from s_b but I never dared to
>>> actually propose that to kernel people...
>> O_DIRECT was specifically designed to solve the problem of double buffering
>> between applications and the kernel. Why are you not able to use that in these situations?
> What is asked is the opposite of O_DIRECT - the write from a buffer inside
> postgresql to linux *buffercache* and telling linux that it is the same
> as what is currently on disk, so don't bother to write it back ever.

I don’t understand. Are we talking about mmap()ed files here? Why would the kernel be trying to write back pages that
aren’t dirty?

> This would avoid current double-buffering between postgresql and linux
> buffer caches while still making use of linux cache when possible.
>
> The use case is pages that postgresql has moved into its buffer cache
> but which it has not modified. They will at some point be evicted from the
> postgresql cache, but it is likely that they will still be needed
> sometime soon, so what is required is "writing them back" to the
> original file, only they should not really be written - or marked dirty
> to be written later - more levels than just to the linux cache, as they
> *already* are on the disk.
>
> It is probably ok to put them in the LRU position as they are "written"
> out from postgresql, though it may be better if we get some more control
> over where in the LRU order they would be placed. It may make sense to put
> them there based on when they were last read while residing inside
> postgresql cache
>
> Cheers
>
>
> --
> Hannu Krosing
> PostgreSQL Consultant
> Performance, Scalability and High Availability
> 2ndQuadrant Nordic OÜ
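
For reference, a minimal sketch of the direction that existing interfaces already cover: the application keeps its own copy of a block and hints the kernel that the OS-cached copy of that range can be dropped, using posix_fadvise() with POSIX_FADV_DONTNEED. This is not the request made in the thread (handing a clean page back to the page cache when it is evicted from shared_buffers), and nothing below is PostgreSQL code. The helper names, the throwaway demo file, and the 8192-byte BLOCK_SIZE are assumptions chosen for illustration, and the advice is only a hint that the kernel may ignore.

```python
import os
import tempfile

# Assumed block size for illustration (matches the "0-8192" range in the thread).
BLOCK_SIZE = 8192


def read_block_and_drop_os_copy(path, block_no):
    """Read one block into an application buffer, then hint the kernel
    that its cached copy of that byte range is no longer needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = block_no * BLOCK_SIZE
        data = os.pread(fd, BLOCK_SIZE, offset)
        # Only a hint: the kernel is free to keep the pages cached.
        # posix_fadvise is not available on every platform, hence the guard.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, BLOCK_SIZE, os.POSIX_FADV_DONTNEED)
        return data
    finally:
        os.close(fd)


def make_demo_file(num_blocks=4):
    """Create a throwaway file whose block i is filled with the byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_blocks):
            f.write(bytes([i]) * BLOCK_SIZE)
    return path
```

Calling read_block_and_drop_os_copy(make_demo_file(), 2) returns the third block of the demo file. The sketch only removes the duplicate on the read side; it does nothing to keep a recently evicted, unmodified block warm in the OS cache, which is the gap described above.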



