Thread: contrib/start-scripts/linux failing on RHEL 6 with ~9.3 because of missing oom_score_adj

Hi all,

Support for oom_score_adj has been added in kernel 2.6.36, a version
newer than 2.6.32-71 which is embedded in RHEL6, so the startup script
contrib/start-scripts/linux would fail because it does not check for
the presence of oom_score_adj before adding the value of OOM_SCORE_ADJ
to it.

We could argue that this is not directly a problem of PG itself as one
could just copy the startup script from the code tree to create its
own service script and fix it, still I am getting the feeling that we
should show a better logic with something like the patch attached.
Thoughts?
--
Michael

Attachment
Michael Paquier <michael.paquier@gmail.com> writes:
> Support for oom_score_adj has been added in kernel 2.6.36, a version
> newer than 2.6.32-71 which is embedded in RHEL6, so the startup script
> contrib/start-scripts/linux would fail because it does not check for
> the presence of oom_score_adj before adding the value of OOM_SCORE_ADJ
> to it.

I'm not sure where you draw the conclusion that RHEL6 doesn't have
oom_score_adj.  Mine certainly does:

[tgl@sss1 pgsql]$ uname -a
Linux sss1.sss.pgh.pa.us 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

[tgl@sss1 pgsql]$ ls /proc/self/
attr/       coredump_filter  io         mountstats     pagemap      stack
autogroup   cpuset           limits     net/           personality  stat
auxv        cwd@             loginuid   ns/            root@        statm
cgroup      environ          maps       numa_maps      sched        status
clear_refs  exe@             mem        oom_adj        schedstat    syscall
cmdline     fd/              mountinfo  oom_score      sessionid    task/
comm        fdinfo/          mounts     oom_score_adj  smaps        wchan

I'm prepared to believe that the original upstream 2.6.32 didn't have
oom_score_adj, but Red Hat often back-ports kernel features.

> We could argue that this is not directly a problem of PG itself as one
> could just copy the startup script from the code tree to create its
> own service script and fix it, still I am getting the feeling that we
> should show a better logic with something like the patch attached.

This patch seems completely wrong, as it is forcing use of the oom
adjustment, whereas the intention is that the user should choose
if they want to use it; cf /contrib/start-scripts/linux lines 43-55.

Also, it kind of looks like you are working with something older
than what is in HEAD, anyway.

            regards, tom lane
On Sun, Apr 26, 2015 at 8:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Support for oom_score_adj has been added in kernel 2.6.36, a version
>> newer than 2.6.32-71 which is embedded in RHEL6, so the startup script
>> contrib/start-scripts/linux would fail because it does not check for
>> the presence of oom_score_adj before adding the value of OOM_SCORE_ADJ
>> to it.
>
> I'm not sure where you draw the conclusion that RHEL6 doesn't have
> oom_score_adj.  Mine certainly does:
>
> [tgl@sss1 pgsql]$ uname -a
> Linux sss1.sss.pgh.pa.us 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64
GNU/Linux
>
> [tgl@sss1 pgsql]$ ls /proc/self/
> attr/       coredump_filter  io         mountstats     pagemap      stack
> autogroup   cpuset           limits     net/           personality  stat
> auxv        cwd@             loginuid   ns/            root@        statm
> cgroup      environ          maps       numa_maps      sched        status
> clear_refs  exe@             mem        oom_adj        schedstat    syscall
> cmdline     fd/              mountinfo  oom_score      sessionid    task/
> comm        fdinfo/          mounts     oom_score_adj  smaps        wchan
>
> I'm prepared to believe that the original upstream 2.6.32 didn't have
> oom_score_adj, but Red Hat often back-ports kernel features.

Well, it may have been a different version then because I was reported
by a colleague that this was failing because of oom_score_adj missing
;)

>> We could argue that this is not directly a problem of PG itself as one
>> could just copy the startup script from the code tree to create its
>> own service script and fix it, still I am getting the feeling that we
>> should show a better logic with something like the patch attached.
>
> This patch seems completely wrong, as it is forcing use of the oom
> adjustment, whereas the intention is that the user should choose
> if they want to use it; cf /contrib/start-scripts/linux lines 43-55.

On REL9_3_STABLE, the condition the startup script has is that:
test x"$OOM_SCORE_ADJ" != x && echo "$OOM_SCORE_ADJ" > /proc/self/oom_score_adj
test x"$OOM_ADJ" != x && echo "$OOM_ADJ" > /proc/self/oom_adj

My patch was adding a condition to check the existence of
/proc/self/oom_score_adj and /proc/self/oom_adj, leaving OOM_ADJ and
OOM_SCORE_ADJ uncommented, so the value will not be set if user does
not uncomment it. But perhaps I am missing something.

> Also, it kind of looks like you are working with something older
> than what is in HEAD, anyway.

The previous patch applies to REL9_3_STABLE and older, not newer
versions FWIW. Sorry for not being clearer about that.
Regards,
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sun, Apr 26, 2015 at 8:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This patch seems completely wrong, as it is forcing use of the oom
>> adjustment, whereas the intention is that the user should choose
>> if they want to use it; cf /contrib/start-scripts/linux lines 43-55.

> On REL9_3_STABLE, the condition the startup script has is that:
> test x"$OOM_SCORE_ADJ" != x && echo "$OOM_SCORE_ADJ" > /proc/self/oom_score_adj
> test x"$OOM_ADJ" != x && echo "$OOM_ADJ" > /proc/self/oom_adj

Yeah, but by default $OOM_SCORE_ADJ is unset so that the echo is not
executed.  What you are proposing is that if the user has accidentally
uncommented the wrong one of the two variables, the script should silently
do nothing instead of failing.  I think that's an actively bad idea,
because then it would be far too hard to notice that you'd misconfigured
it.

In any case, I think changing the behavior of the script in stable
branches would be unwise.  People generally will use these things as
templates, they won't just blindly copy them (at least I hope not);
and they're unlikely to overwrite an existing live script file when
installing a minor update release.

We have changed the script, hopefully for the better, in HEAD --- if
you think the behavior still leaves something to be desired, debating
that would certainly be fair game.  (I notice that HEAD *is* using a
"test -e", which according to my argument above maybe isn't a good thing.)

            regards, tom lane
On Sun, Apr 26, 2015 at 10:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sun, Apr 26, 2015 at 8:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This patch seems completely wrong, as it is forcing use of the oom
>>> adjustment, whereas the intention is that the user should choose
>>> if they want to use it; cf /contrib/start-scripts/linux lines 43-55.
>
>> On REL9_3_STABLE, the condition the startup script has is that:
>> test x"$OOM_SCORE_ADJ" != x && echo "$OOM_SCORE_ADJ" > /proc/self/oom_score_adj
>> test x"$OOM_ADJ" != x && echo "$OOM_ADJ" > /proc/self/oom_adj
>
> Yeah, but by default $OOM_SCORE_ADJ is unset so that the echo is not
> executed.  What you are proposing is that if the user has accidentally
> uncommented the wrong one of the two variables, the script should silently
> do nothing instead of failing.  I think that's an actively bad idea,
> because then it would be far too hard to notice that you'd misconfigured
> it.

Hm. OK. That sounds like a fair argument as well. The user I got the
complaint from was just complaining about the opposite as service
startup refused to start up with the default values, but in this case
a copy of the startup script is maintained... So that's not a big
deal.

> In any case, I think changing the behavior of the script in stable
> branches would be unwise.  People generally will use these things as
> templates, they won't just blindly copy them (at least I hope not);
> and they're unlikely to overwrite an existing live script file when
> installing a minor update release.

OK.

> We have changed the script, hopefully for the better, in HEAD --- if
> you think the behavior still leaves something to be desired, debating
> that would certainly be fair game.  (I notice that HEAD *is* using a
> "test -e", which according to my argument above maybe isn't a good thing.)

The behavior in HEAD is far better, and nothing to say about that ;)
Regards,
--
Michael