Re: Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Linux kernel impact on PostgreSQL performance
Date
Msg-id CAMkU=1ywpD7e0N04_5A0davq1OjKTYmR-s_18KYeUnDVejgQBg@mail.gmail.com
In response to Re: Linux kernel impact on PostgreSQL performance  (Mel Gorman <mgorman@suse.de>)
Responses Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Mel Gorman <mgorman@suse.de>)
List pgsql-hackers
On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman <mgorman@suse.de> wrote:
On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
> >
> > That could be something we look at. There are cases buried deep in the
> > VM where pages get shuffled to the end of the LRU and get tagged for
> > reclaim as soon as possible. Maybe you need access to something like
> > that via posix_fadvise to say "reclaim this page if you need memory but
> > leave it resident if there is no memory pressure" or something similar.
> > Not exactly sure what that interface would look like or offhand how it
> > could be reliably implemented.
> >
>
> I think the "reclaim this page if you need memory but leave it resident if
> there is no memory pressure" hint would be more useful for temporary
> working files than for what was being discussed above (shared buffers).
>  When I do work that needs large temporary files, I often see physical
> write IO spike but physical read IO does not.  I interpret that to mean
> that the temporary data is being written to disk to satisfy either
> dirty_expire_centisecs or dirty_*bytes, but the data remains in the FS
> cache and so disk reads are not needed to satisfy it.  So a hint that says
> "this file will never be fsynced so please ignore dirty_*bytes and
> dirty_expire_centisecs.

It would be good to know if dirty_expire_centisecs or dirty ratio|bytes
were the problem here.

Is there an easy way to tell?  I would guess it has to be at least dirty_expire_centisecs, if not both, as a very large sort operation takes a lot more than 30 seconds to complete.
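
One rough way to start answering that question is to sample the kernel's writeback tunables alongside the Dirty and Writeback counters in /proc/meminfo while the sort is running. The sketch below is only an illustration, not a diagnostic tool from this thread: the helper names (`parse_meminfo`, `read_tunables`, `snapshot`) are made up for the example, and it assumes a Linux system exposing the usual /proc/sys/vm/dirty_* files.

```python
from pathlib import Path

# Assumed location of the writeback tunables on a Linux system.
VM_DIR = Path("/proc/sys/vm")
TUNABLES = (
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
    "dirty_background_ratio",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_bytes",
)


def parse_meminfo(text):
    """Return {field: kilobytes} for the 'kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Skip counters that carry no kB unit (e.g. HugePages_Total).
        if len(parts) == 2 and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def read_tunables(vm_dir=VM_DIR):
    """Read whichever dirty_* tunables exist under vm_dir."""
    values = {}
    for name in TUNABLES:
        path = vm_dir / name
        if path.exists():
            values[name] = int(path.read_text().split()[0])
    return values


def snapshot(meminfo_text, tunables):
    """Combine the current dirty-page counters with the expiry interval."""
    mem = parse_meminfo(meminfo_text)
    return {
        "dirty_kb": mem.get("Dirty", 0),
        "writeback_kb": mem.get("Writeback", 0),
        "expire_seconds": tunables.get("dirty_expire_centisecs", 0) / 100,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(snapshot(meminfo.read_text(), read_tunables()))
```

Sampling this once a second during a large sort would show whether the amount of dirty data ever approaches the ratio/bytes limits. If it stays far below them and write IO still appears after roughly `expire_seconds`, that points toward the expiry timer, though this is a hint rather than proof.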
 
An interface that forces a dirty page to stay dirty
regardless of the global system would be a major hazard. It potentially
allows the creator of the temporary file to stall all other processes
dirtying pages for an unbounded period of time.

Are the dirty ratio/bytes limits the mechanisms by which adequate clean memory is maintained?  I thought those were there just to put a limit on how long it would take to execute a sync call should one be issued, and that there were other settings which said how much clean memory to maintain.  It should definitely write out the pages if it needs the memory for other things, just not write them out for fear of how long a sync would take if one were called.  (And if it needs the memory, it should be able to write it out quickly, as the writes would be mostly sequential, not random, although how the kernel can trust me that that will always be the case could be a problem.)

 
I proposed in another part
of the thread a hint for open inodes to have the background writer thread
ignore dirty pages belonging to that inode. Dirty limits and fsync would
still be obeyed. It might also be workable for temporary files but the
proposal could be full of holes.

If calling fsync would fail with an error, would that lower the risk of DoS?
 

Your alternative here is to create a private anonymous mapping as they
are not subject to dirty limits. This is only a sensible option if the
temporary data is guaranteed to be relatively small. If the shared
buffers, page cache and your temporary data exceed the size of RAM then
data will get discarded or your temporary data will get pushed to swap
and performance will hit the floor.
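
For readers unfamiliar with the term, a private anonymous mapping is memory obtained from mmap() with no backing file. The sketch below shows one from Python on a Unix system. It is illustrative only: `make_scratch_buffer` is a hypothetical helper name, and nothing here is taken from PostgreSQL's own code.

```python
import mmap


def make_scratch_buffer(size_bytes):
    """Return a private anonymous mapping of size_bytes (Unix only).

    The pages have no backing file, so they are never written back as
    dirty page cache; under memory pressure they can only go to swap.
    """
    return mmap.mmap(
        -1,  # no file descriptor: anonymous memory
        size_bytes,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


if __name__ == "__main__":
    buf = make_scratch_buffer(1 << 20)  # 1 MiB of scratch space
    buf[:5] = b"tuple"
    print(bytes(buf[:5]))
    buf.close()
```

The size has to be chosen up front, which is exactly the limitation raised above: this only helps when the temporary data is known to be small.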

PostgreSQL mainly uses temp files precisely when that guarantee is hard to make.  There is a pretty big margin where the data is too big for us to be certain it will fit in memory, so we have to switch to a disk-friendly, mostly-sequential algorithm.  Yet it would still be nice to avoid the actual disk writes until we have observed that it actually is growing too big.
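
To make the "avoid disk writes until it has actually grown too big" idea concrete, here is an application-level analogue: a buffer that stays in memory and only spills to a temporary file once a limit is crossed. This is a sketch under stated assumptions, not how PostgreSQL manages its temp files and not the kernel hint being discussed. The `SpillBuffer` class and its `memory_limit` parameter are invented for the example.

```python
import io
import tempfile


class SpillBuffer:
    """Hold data in memory; move it to a temp file once it outgrows a limit."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit
        self._store = io.BytesIO()
        self.spilled = False

    def write(self, data):
        # Spill before the write that would push us past the limit.
        if not self.spilled and self._store.tell() + len(data) > self.memory_limit:
            self._spill()
        return self._store.write(data)

    def _spill(self):
        # Copy what has been buffered so far into an unnamed temporary file.
        disk = tempfile.TemporaryFile()
        disk.write(self._store.getvalue())
        self._store = disk
        self.spilled = True

    def read_all(self):
        # Read everything back, then return to the end for further appends.
        self._store.seek(0)
        data = self._store.read()
        self._store.seek(0, io.SEEK_END)
        return data

    def close(self):
        self._store.close()


if __name__ == "__main__":
    buf = SpillBuffer(memory_limit=8)
    buf.write(b"1234")
    buf.write(b"56789")  # crosses the 8-byte limit and triggers the spill
    print(buf.spilled, buf.read_all())
    buf.close()
```

Python's standard library offers `tempfile.SpooledTemporaryFile` with similar behaviour. The difference from the request in this thread is that here the application decides when to spill, whereas the hint under discussion would let the kernel defer the physical writes of an ordinary temp file.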


Cheers,

Jeff
