Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers
From | Trond Myklebust |
---|---|
Subject | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date | |
Msg-id | 190E6EA3-7B06-4315-9E4C-33FBEC961531@gmail.com |
In response to | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (Hannu Krosing <hannu@2ndQuadrant.com>) |
Responses | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
List | pgsql-hackers |
On Jan 13, 2014, at 19:03, Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> On 01/13/2014 09:53 PM, Trond Myklebust wrote:
>> On Jan 13, 2014, at 15:40, Andres Freund <andres@2ndquadrant.com> wrote:
>>
>>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
>>>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>>>> I notice, Josh, that you didn't mention the problems many people
>>>>> have run into with Transparent Huge Page defrag and with NUMA
>>>>> access.
>>>> Amen to that. Actually, I think NUMA can be (mostly?) fixed by
>>>> setting zone_reclaim_mode; is there some other problem besides that?
>>> I think that fixes some of the worst instances, but I've seen machines
>>> spending horrible amounts of CPU (& BUS) time in page reclaim
>>> nonetheless. If I analyzed it correctly it's in RAM << working set
>>> workloads where RAM is pretty large and most of it is used as page
>>> cache. The kernel ends up spending a huge percentage of time finding and
>>> potentially defragmenting pages when looking for victim buffers.
>>>
>>>> On a related note, there's also the problem of double-buffering. When
>>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>>> buffers, and similarly on write-out. It's very unclear what to do
>>>> about this, since the kernel and PostgreSQL don't have intimate
>>>> knowledge of what each other are doing, but it would be nice to solve
>>>> somehow.
>>> I've wondered before if there wouldn't be a chance for postgres to say
>>> "my dear OS, that the file range 0-8192 of file x contains y, no need to
>>> reread" and do that when we evict a page from s_b but I never dared to
>>> actually propose that to kernel people...
>> O_DIRECT was specifically designed to solve the problem of double buffering
>> between applications and the kernel. Why are you not able to use that in these situations?
> What is asked is the opposite of O_DIRECT - the write from a buffer inside
> postgresql to linux *buffercache* and telling linux that it is the same
> as what is currently on disk, so don't bother to write it back ever.

I don’t understand. Are we talking about mmap()ed files here? Why would the kernel be trying to write back pages that aren’t dirty?

> This would avoid current double-buffering between postgresql and linux
> buffer caches while still making use of linux cache when possible.
>
> The use case is pages that postgresql has moved into its buffer cache
> but which it has not modified. They will at some point be evicted from the
> postgresql cache, but it is likely that they will still be needed
> sometime soon, so what is required is "writing them back" to the original
> file, only they should not really be written - or marked dirty to be
> written later - more levels than just to the linux cache, as they
> *already* are on the disk.
>
> It is probably ok to put them in the LRU position as they are "written"
> out from postgresql, though it may be better if we get some more control
> over where in the LRU order they would be placed. It may make sense to put
> them there based on when they were last read while residing inside
> postgresql cache
>
> Cheers
>
> --
> Hannu Krosing
> PostgreSQL Consultant
> Performance, Scalability and High Availability
> 2ndQuadrant Nordic OÜ
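
For reference, a minimal sketch of what reading a block with O_DIRECT looks like from user space, assuming Linux and Python 3. This is not code from the thread or from PostgreSQL: the function name `read_block_direct` and the constant `BLOCK_SIZE` are made up for illustration, and the 8192-byte size is simply the "file range 0-8192" mentioned above. It only shows the cache-bypassing side of the discussion; it does not implement the "this clean page is already on disk" hint being asked for, which has no equivalent call here.

```python
import errno
import mmap
import os
import tempfile

BLOCK_SIZE = 8192  # hypothetical constant: the "0-8192" range quoted in the thread


def read_block_direct(path, block_no, block_size=BLOCK_SIZE):
    """Read one block with O_DIRECT so no copy is left in the kernel page cache.

    Returns (data, used_direct). Falls back to an ordinary buffered read when
    the filesystem rejects O_DIRECT with EINVAL (some filesystems do).
    """
    offset = block_no * block_size
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise
        fd = None

    if fd is not None:
        # O_DIRECT needs an aligned buffer; anonymous mmap memory is page-aligned.
        buf = mmap.mmap(-1, block_size)
        try:
            os.lseek(fd, offset, os.SEEK_SET)
            n = os.readv(fd, [buf])
            return buf[:n], True
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
        finally:
            os.close(fd)
            buf.close()

    # Buffered fallback: this path does go through the page cache.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(block_size), False


if __name__ == "__main__":
    # Write two blocks to a scratch file, then read the second one back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
        tmp.flush()
        os.fsync(tmp.fileno())
    try:
        data, used_direct = read_block_direct(tmp.name, 1)
        print(len(data), used_direct)
    finally:
        os.unlink(tmp.name)
```

With O_DIRECT the application's buffer is the only cached copy, which removes the double buffering but also gives up the kernel cache entirely. That trade-off is the reason the request above is described as the opposite of O_DIRECT.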