Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication - Mailing list pgsql-bugs

From chenhj
Subject Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication
Date
Msg-id 293babf8.1089.16360f109b0.Coremail.chjischj@163.com
In response to Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication  (Andres Freund <andres@anarazel.de>)
Responses Re:Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication  (chenhj <chjischj@163.com>)
List pgsql-bugs
At 2018-05-07 02:57:12, "Andres Freund" <andres@anarazel.de> wrote:
>On 2018-05-06 23:45:17 +0800, chenhj wrote:
>> >>
>> >>Chen, have you disabled transparent hugepages and zone reclaim?
>> >>
>> >>Greetings,
>> >>
>> >>Andres Freund
>> >
>> >c) Depend on huge page
>> >huge_page=on, happen (no matter transparent_hugepage is [always] or [never])
>> >huge_page=off, not happen
>> >
>> >The problem also occurs when transparent hugepages are disabled.
>> >About zone reclaim, I will check it later.
>> >What puzzles me is that this problem does not occur on PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine).
>> >d) Depend on PostgreSQL Version
>> >PostgreSQL 10.2 happen
>> >PostgreSQL 9.6 not happen
>> >Chen Huajun
>> The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.
>> 
>> In addition, I need to correct my earlier statement: the problem occurs even with huge_pages=off.
>> 
>> With huge_pages=on, SQL execution is very slow, and there are hung connections in the startup and authentication states.
>> 
>
>You'd probably need to provide a few perf profiles to get further
>insight.
>
>Greetings,
>
>Andres Freund

According to my tests, this problem is related to commit "ecb0d20a9d2e09b7112d3b192047f711f9ff7e59", which switched Linux from SysV semaphores to unnamed POSIX semaphores.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ecb0d20a9d2e09b7112d3b192047f711f9ff7e59
Use unnamed POSIX semaphores, if available, on Linux and FreeBSD.
We've had support for using unnamed POSIX semaphores instead of System V semaphores for quite some time, but it was not used by default on any platform.  Since many systems have rather small limits on the number of SysV semaphores allowed, it seems desirable to switch to POSIX semaphores where they're available and don't create performance or kernel resource problems.  Experimentation by me shows that unnamed POSIX semaphores are at least as good as SysV semaphores on Linux, and we previously had a report from Maksym Sobolyev that FreeBSD is significantly worse with SysV semaphores than POSIX ones.  So adjust those two platforms to use unnamed POSIX semaphores, if configure can find the necessary library functions.  If this goes well, we may switch other platforms as well, but it would be advisable to test them individually first.
It's not currently contemplated that we'd encourage users to select a semaphore API for themselves, but anyone who wants to experiment can add PREFERRED_SEMAPHORES=UNNAMED_POSIX (or NAMED_POSIX, or SYSV) to their configure command line to do so.
I also tweaked configure to report which API it's selected, mainly so that we can tell that from buildfarm reports.
I did not touch the user documentation's discussion about semaphores; that will need some adjustment once the dust settles.
Discussion: <8536.1475704230@sss.pgh.pa.us>

This is why the problem does not occur on 9.6.2 but does occur on 10.2.

As to why: perhaps this is a bug in the Linux kernel. However, it is not clear in which kernel version the problem was fixed (if it was). The problem still occurs after upgrading the CentOS 6.5 kernel from 2.6.32-431 to 2.6.32-504.23.4.
To avoid this problem, the only way may be to upgrade CentOS to a newer version (such as CentOS 7.3).
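
For anyone who cannot upgrade the OS, a possible alternative (a sketch only, not tested in this message) is to rebuild PostgreSQL 10 with SysV semaphores, using the PREFERRED_SEMAPHORES option described in the commit message above. The SEMAPHORE_API variable name and the assumption that the commands are run from the top of a PostgreSQL source tree are illustrative and not part of the original report.

```shell
# Assumption: run from the top of a PostgreSQL 10.x source tree.
# SEMAPHORE_API is a name introduced here for illustration only.
SEMAPHORE_API=SYSV   # the commit message also lists UNNAMED_POSIX and NAMED_POSIX

if [ -x ./configure ]; then
    # Per the commit message, configure reports which semaphore API it selected.
    ./configure PREFERRED_SEMAPHORES="$SEMAPHORE_API" && make
else
    # Not inside a source tree: only show the command that would be run.
    echo "would run: ./configure PREFERRED_SEMAPHORES=$SEMAPHORE_API"
fi
```

Whether this avoids the hang on the CentOS 6.5 kernels mentioned above would still need to be confirmed by testing.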

Regards,
Chen Huajun
