Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Date
Msg-id CAHyXU0yvKy2jgqPWO1ZdYSdVuH6YS_H==RDZ0jxuGCEOk-q1cw@mail.gmail.com
In response to Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
On Fri, Sep 13, 2013 at 4:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
>> Absolutely not claiming the contrary. I think it sucks that we
>> couldn't fully figure out what's happening in detail. I'd love to
>> get my hands on a setup where it can be reliably reproduced.
>
> I have seen two completely different causes for symptoms like this,
> and I suspect that these aren't the only two.
>
> (1)  The dirty page avalanche: PostgreSQL hangs on to a large
> number of dirty buffers and then dumps a lot of them at once.  The
> OS does the same.  When PostgreSQL dumps its buffers to the OS it
> pushes the OS over a "tipping point" where it is writing dirty
> buffers too fast for the controller's BBU cache to absorb them.
> Everything freezes until the controller writes and accepts OS
> writes for a lot of data.  This can take several minutes, during
> which time the database seems "frozen".  Cure is some combination
> of these: reduce shared_buffers, make the background writer more
> aggressive, checkpoint more often, make the OS dirty page writing
> more aggressive, add more BBU RAM to the controller.

Yeah -- I've seen this too, and it's a well-understood problem.
Getting the OS to flush dirty pages out faster is the name of the game, I
think.  Storage is getting so fast that it's (mostly) moot anyway.
Also, this falls under the umbrella of 'high I/O' -- the stuff I've been
seeing is low- or no-I/O.
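
For anyone trying to tell the two cases apart, here is a minimal sketch (not something from this thread) that sums the Dirty and Writeback counters from /proc/meminfo on Linux. The helper names parse_meminfo and dirty_backlog_kb are made up for illustration, and the sketch assumes the usual "Name:  value kB" layout of that file. A backlog that balloons right before a stall would point at the dirty page avalanche; a flat one would suggest looking elsewhere.

```python
import re

# Linux-specific path; the field layout below is assumed, not guaranteed.
MEMINFO_PATH = "/proc/meminfo"


def parse_meminfo(text):
    """Return {field: kilobytes} for lines shaped like 'Dirty:   1234 kB'.

    Lines without a kB unit (e.g. HugePages_Total) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def dirty_backlog_kb(text):
    """Dirty pages plus pages under writeback, in kB (0 if a field is missing)."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)
```

To use it, read MEMINFO_PATH every second or so and log dirty_backlog_kb(contents) alongside query latency, then compare the two during an episode.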

> (2)  Transparent huge page support goes haywire on its defrag work.
> Clues on this include very high "system" CPU time during an
> episode, and `perf top` shows more time in kernel spinlock
> functions than anywhere else.  The database doesn't completely lock
> up like with the dirty page avalanche, but it is slow enough that
> users often describe it that way.  So far I have only seen this
> cured by disabling THP support (in spite of some people urging that
> just the defrag be disabled).  It does make me wonder whether there
> is something we could do in PostgreSQL to interact better with
> THPs.

Ah, that's a useful tip; need to research that, thanks.  Maybe Josh
might be able to give it a whirl...
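
As a starting point for that research, a rough sketch for checking what THP is currently set to. It assumes the mainline sysfs location /sys/kernel/mm/transparent_hugepage (some distribution kernels put it elsewhere, so treat the path as an assumption) and the usual format where the active choice is shown in square brackets, e.g. "[always] madvise never". The function names are hypothetical.

```python
import re

# Assumed mainline location; some vendor kernels use a different directory.
THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def active_thp_mode(text):
    """Return the bracketed (active) option from a THP sysfs file, or None."""
    m = re.search(r"\[([^\]]+)\]", text)
    return m.group(1) if m else None


def read_thp_modes(base=THP_DIR):
    """Return the active 'enabled' and 'defrag' settings; None if unreadable."""
    modes = {}
    for name in ("enabled", "defrag"):
        try:
            with open(f"{base}/{name}") as f:
                modes[name] = active_thp_mode(f.read())
        except OSError:
            modes[name] = None
    return modes
```

Calling read_thp_modes() on an affected box and recording the result next to the `perf top` output would at least show whether THP and its defrag were active during an episode.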

merlin


