Re: Two Necessary Kernel Tweaks for Linux Systems - Mailing list pgsql-performance

From AJ Weber
Subject Re: Two Necessary Kernel Tweaks for Linux Systems
Date
Msg-id 50EC7C24.2020203@comcast.net
In response to Re: Two Necessary Kernel Tweaks for Linux Systems  (Shaun Thomas <sthomas@optionshouse.com>)
Responses Re: Two Necessary Kernel Tweaks for Linux Systems  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-performance
When I checked, both of these settings exist on my CentOS 6.x host
(2.6.32-279.5.1.el6.x86_64).

However, autogroup_enabled was already set to 0.  (The
migration_cost was set to the 0.5ms default noted in the OP.)  So I
don't know if this is strictly limited to kernel 3.0.
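
For anyone who wants to repeat the check, here is a minimal Python
sketch that reads both settings from /proc/sys/kernel. The file names
are assumed to mirror the sysctl names sched_autogroup_enabled and
sched_migration_cost; the "_ns" spelling is an assumption for kernels
that name the knob differently, and read_sched_settings is a
hypothetical helper. The migration cost is reported in nanoseconds, so
a 0.5ms setting reads as 500000.

```python
from pathlib import Path

# Candidate file names under /proc/sys/kernel. The "_ns" spelling is an
# assumption for kernels that name the migration-cost knob differently.
CANDIDATES = {
    "autogroup": ["sched_autogroup_enabled"],
    "migration_cost": ["sched_migration_cost", "sched_migration_cost_ns"],
}


def read_sched_settings(root="/proc/sys/kernel"):
    """Return {label: (file_name, value)}; (None, None) if nothing is readable."""
    found = {}
    for label, names in CANDIDATES.items():
        found[label] = (None, None)
        for name in names:
            try:
                value = int((Path(root) / name).read_text().strip())
            except (OSError, ValueError):
                continue  # file missing, unreadable, or not an integer
            found[label] = (name, value)
            break
    return found


if __name__ == "__main__":
    for label, (name, value) in read_sched_settings().items():
        print(label, name, value)
```

A label that comes back as (None, None) means none of the candidate
files could be read on that kernel.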

Is there an "easy" way to tell what scheduler my OS is using?
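
For the I/O scheduler half of that question, sysfs lists the available
schedulers per block device, with the active one in square brackets
(for example "noop anticipatory deadline [cfq]"). The sketch below just
parses that line; the helper names are hypothetical, and it says
nothing about the process scheduler, which has no comparable selector
according to the quoted message below.

```python
import glob
import os
import re


def active_io_scheduler(line):
    """Return the bracketed entry from a sysfs scheduler line, else None."""
    match = re.search(r"\[([^\]]+)\]", line)
    return match.group(1) if match else None


def io_schedulers(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for path in glob.glob(pattern):
        # .../<device>/queue/scheduler -> <device>
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as handle:
                result[device] = active_io_scheduler(handle.read())
        except OSError:
            result[device] = None
    return result


if __name__ == "__main__":
    for device, scheduler in sorted(io_schedulers().items()):
        print(device, scheduler)
```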

-AJ


On 1/8/2013 2:32 PM, Shaun Thomas wrote:
> On 01/08/2013 01:04 PM, Scott Marlowe wrote:
>
>> Assembly language on the brain.  Of course I meant NOOP.
>
> Ok, in that case, these are completely separate things. For IO
> scheduling, there's Completely Fair Queuing (CFQ), NOOP, Deadline,
> and so on.
>
> For process scheduling, at least recently, there's the Completely
> Fair Scheduler (CFS) or nothing. So far as I can tell, there is no
> alternative process scheduler. Just as I can't find an alternative
> memory manager that I can tell to stop flushing my freaking active
> file cache due to phantom memory pressure. ;)
>
> The tweaks I was discussing in this thread effectively do two things:
>
> 1. Stop process grouping by TTY.
>
> On servers, grouping really is a net performance loss, especially for
> heavily forked apps like PG. With grouping enabled, system % is about
> 5% lower since the scheduler is doing less work, but at the cost of
> less spreading across available CPUs. Our systems see a 30%
> performance hit with grouping enabled; others may see more or less.
>
> 2. Less aggressive process scheduling.
>
> The O(log N) scheduler heuristics collapse at high process counts for
> some reason, causing the scheduler to spend more and more time
> planning CPU assignments until it spirals completely out of control.
> I've seen this behavior from 3.0 kernels straight through 3.5, so it looks
> like an inherent weakness of CFS. By increasing migration cost, we
> make the scheduler do less work less often, so that weird 70+% system
> CPU spike vanishes.
>
> My guess is the increased migration cost basically offsets the point
> at which the scheduler would freak out. I've tested up to 2000
> connections, and it responds fine, whereas before we were seeing flaky
> results as early as 700 connections.
>
> My guess as to why this is? I think it's due to VSZ as perceived by
> the scheduler. To swap processes, it also has to preload L2 and L3
> cache for the assigned process. As the number of PG connections
> increases, all with their own VSZ/RSS allocations, the scheduler has
> more thinking to do. At a point when the sum of VSZ/RSS eclipses the
> amount of available RAM, the scheduler loses nearly all
> decision-making ability and craps its pants.
>
> This would also explain why I'm seeing something similar with memory.
> At high connection counts, even though %used is fine and we have over
> 40GB free for caching, VSZ/RSS are both way bigger than available
> cache, so memory pressure causes kswapd to continuously purge the
> active cache pool into inactive, and inactive into free, all while the
> device attempts to fill the active pool. It's an IO feedback loop, and
> it kicks in around the same number of connections that used to make
> the process scheduler die. Too much of a coincidence, in my opinion.
>
> But unlike the process scheduler, there are no good knobs to turn that
> will fix the memory manager's behavior. At least, not in 3.0, 3.2, or
> 3.4 kernels.
>
> But I freely admit I'm just speculating based on observed behavior. I
> know neither jack, nor squat about internal kernel mechanics. Anyone
> who actually *isn't* talking out of his ass is free to interject. :)
>
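
To put numbers on the "sum of VSZ/RSS" idea in the quoted text, here is
a rough Python sketch that totals VmSize and VmRSS from
/proc/<pid>/status for PostgreSQL processes. The process names
("postgres", "postmaster") and the helper names are assumptions. Each
backend's RSS includes the shared memory pages it has touched, so the
RSS total overstates real memory use; treat it as an upper bound, not
as evidence for or against the theory.

```python
import glob
import re

# Matches the three fields of interest; memory values are reported in kB.
FIELD = re.compile(r"^(Name|VmSize|VmRSS):\s+(\S+)", re.MULTILINE)


def parse_status(text):
    """Extract (Name, VmSize kB, VmRSS kB) from /proc/<pid>/status text."""
    fields = dict(FIELD.findall(text))
    # Kernel threads have no Vm* lines, so default those to 0.
    return (
        fields.get("Name"),
        int(fields.get("VmSize", 0)),
        int(fields.get("VmRSS", 0)),
    )


def sum_memory(names=("postgres", "postmaster"), pattern="/proc/[0-9]*/status"):
    """Sum VmSize and VmRSS in kB over processes whose Name is in names."""
    total_vsz = total_rss = 0
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                name, vsz, rss = parse_status(handle.read())
        except OSError:
            continue  # process exited while we were scanning
        if name in names:
            total_vsz += vsz
            total_rss += rss
    return total_vsz, total_rss


if __name__ == "__main__":
    vsz_kb, rss_kb = sum_memory()
    print("total VSZ kB:", vsz_kb, "total RSS kB:", rss_kb)
```

Comparing the two totals against MemTotal in /proc/meminfo at
different connection counts would show whether the crossover point
lines up with the connection count where the problems start.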

