Re: OS scheduler bugs affecting high-concurrency contention - Mailing list pgsql-hackers

From Andrea Suisani
Subject Re: OS scheduler bugs affecting high-concurrency contention
Date
Msg-id 5715F99C.4080000@opinioni.net
In response to OS scheduler bugs affecting high-concurrency contention  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-hackers
Hi,

On 04/16/2016 04:15 PM, Kevin Grittner wrote:
> There is a paper that anyone interested in performance at high
> concurrency, especially in Linux, should read[1].  While doing
> other work, a group of researchers saw behavior that they suspected
> was due to scheduler bugs in Linux.  There were no tools that made
> proving that practical, so they developed such a tool set and used
> it to find four bugs in the Linux kernel which were introduced in
> these releases, have not yet been fixed, and have the following
> maximum impact when running NAS benchmarks, based on running with
> and without the researchers' fixes for the bugs:
>
> 2.6.32:  22%
> 2.6.38:  13x
> 3.9:     27x
> 3.19:   138x
>
> That's right -- one of these OS scheduler bugs in production
> versions of Linux can make one of NASA's benchmarks run for 138
> times as long as it does without the bug.  I don't feel that I can
> interpret the results of any high-concurrency benchmarks in a
> meaningful way without knowing which of these bugs were present in
> the OS used for the benchmark.  Just as an example, it is helpful
> to know that the benchmarks Andres presented were run on 3.16, so
> it would have three of these OS bugs affecting results, but not the
> most severe one.  I encourage you to read the paper and draw your
> own conclusions.
>
> Anyway, please don't confuse this thread with the one on the
> "snapshot too old" patch -- I am still working on that and will
> post results there when they are ready.


Thanks for the link, appreciated.

On a slightly related topic, Jens Axboe proposed a patchset [1]
to improve the performance of background buffered writeback.

LWN.net recently published an article about that issue [2].

Maybe this work could mitigate the problem PostgreSQL users experience
while the checkpoint process flushes all pending changes to disk and recycles
the transaction logs.

-- 
Andrea Suisani
suisani@opinioni.net
Demetra opinioni.net srl

[1] "[PATCHSET v3][RFC] Make background writeback not suck"
http://thread.gmane.org/gmane.linux.kernel/2186732

[2] "Toward less-annoying background writeback"
https://lwn.net/SubscriberLink/682582/93d9e5b6bed03a32/




