Thread: BUG #15187: When using huge pages, there may be a lot of hung connections with status startup or authentication

The following bug has been logged on the website:

Bug reference:      15187
Logged by:          Chen huajun
Email address:      chjischj@163.com
PostgreSQL version: 10.2
Operating system:   CentOS release 6.5
Description:

◆Phenomenon
When huge pages are used, many connections may hang in the startup or
authentication state while multiple concurrent queries are being
performed.

◆Env
Hardware: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 16C / 128G RAM / 300GB SSD
OS: CentOS release 6.5 x64
PostgreSQL: 10.2

◆How to reproduce
1) Set huge_pages to on in postgresql.conf (the OS-side huge page setup is
   sketched after the script below).
2) Run the following script:

count=0
while [ $count -le 2000 ]; do
    psql -c 'select count(*) from test20180502;' &
    count=$((count + 1))
done
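
For huge_pages=on to take effect, the OS must reserve enough huge pages for
the shared memory segment; roughly like this (a sketch, the page count is
illustrative and must be sized to shared_buffers):

# reserve 2MB huge pages for the PostgreSQL shared memory segment
# (the count below is illustrative; size it to shared_buffers)
sysctl -w vm.nr_hugepages=2200
# in postgresql.conf:
#   huge_pages = on
# then restart PostgreSQL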

table define
----------------------------
CREATE TABLE test20180502  (
    c1 int not null,
    c2 character(10000) DEFAULT 'AAAAA',
    PRIMARY KEY(c1)
);
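
The data-loading step is not shown above; a plausible way to populate the
table to the row counts discussed later is generate_series (a sketch, the
row count is illustrative):

psql -c "insert into test20180502(c1) select generate_series(1, 100000);"
psql -c "vacuum analyze test20180502;"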


◆The investigation of the cause
1. There are a lot of stuck connections in the startup, authentication, or
SELECT state:
# ps -eaf | grep postgres |grep SELECT |wc
      2      24     204
# ps -eaf | grep postgres |grep authen |wc
    410    4920   45097
# ps -eaf | grep postgres |grep startup |wc
     78     936    8034
2. According to the stack information (*), the function PGSemaphoreLock has
been stuck waiting, apparently because of an incorrect semaphore state, while
performing the following operation (see also the futex-level check sketched
after the stack traces below):
errStatus = sem_wait(PG_SEM_REF(sema));

3. Looking at the system view pg_stat_activity, I found that the
wait_event_type and wait_event of some query operations are in the following
state:
postgres=# select pid,wait_event_type,wait_event,state,backend_xid,query from pg_stat_activity;
 pid  | wait_event_type |     wait_event      | state  | backend_xid |                            query
------+-----------------+---------------------+--------+-------------+-------------------------------------------------------------------
 6333 | Activity        | AutoVacuumMain      |        |             |
 6337 | Activity        | LogicalLauncherMain |        |             |
 6408 | LWLock          | buffer_mapping      | active |             | select count(*) from test20180502;
 6414 | LWLock          | buffer_mapping      | active |             | select count(*) from test20180502;
 7509 |                 |                     | active |             | select pid,wait_event_type,wait_event,state,backend_xid,query from pg_stat_activity;
 6331 | Activity        | BgWriterMain        |        |             |
 6330 | Activity        | CheckpointerMain    |        |             |
 6332 | Activity        | WalWriterMain       |        |             |
 
postgres=# select locktype,relation::regclass as table,a.pid,mode,granted,substring(query,1,50)
           from pg_locks a, pg_stat_activity b where a.pid = b.pid;
  locktype  |       table       | pid  |      mode       | granted |             substring
------------+-------------------+------+-----------------+---------+-----------------------------------
 relation   | test20180502_pkey | 6408 | AccessShareLock | t       | select count(*) from test20180502;
 relation   | test20180502      | 6408 | AccessShareLock | t       | select count(*) from test20180502;
 virtualxid |                   | 6408 | ExclusiveLock   | t       | select count(*) from test20180502;
 relation   | test20180502_pkey | 6414 | AccessShareLock | t       | select count(*) from test20180502;
 relation   | test20180502      | 6414 | AccessShareLock | t       | select count(*) from test20180502;
 virtualxid |                   | 6414 | ExclusiveLock   | t       | select count(*) from test20180502;
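
None of the locks above are ungranted, so this does not look like ordinary
heavyweight-lock contention; pg_blocking_pids() (available since 9.6) can
help confirm that, since it only reports heavyweight-lock waiters and should
come back empty for these LWLock/semaphore waits (a sketch):

psql -c "select pid, pg_blocking_pids(pid) as blocked_by, wait_event_type, wait_event
         from pg_stat_activity where wait_event is not null;"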
 
* When there are a large number of stuck connections in the startup,
authentication, or SELECT state, the stack information is as follows:
◇authentication state
#0  0x000000321dc0d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000006a5240 in PGSemaphoreLock ()
#2  0x00000000007152e5 in LWLockAcquire ()
#3  0x000000000070fc8f in LockRelease ()
#4  0x000000000070c323 in UnlockRelationId ()
#5  0x00000000004ad2d2 in relation_close ()
#6  0x000000000080ed21 in CatalogCacheInitializeCache ()
#7  0x0000000000810ee0 in SearchCatCache ()
#8  0x000000000081dc08 in GetSysCacheOid ()
#9  0x00000000007391a7 in get_role_oid ()
#10 0x000000000062d102 in hba_getauthmethod ()
#11 0x000000000062b729 in ClientAuthentication ()
#12 0x0000000000833fb3 in InitPostgres ()
#13 0x0000000000724b52 in PostgresMain ()
#14 0x00000000006b8c8a in PostmasterMain ()
#15 0x00000000006395e0 in main ()
◇startup state
#0  0x000000321dc0d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000006a5240 in PGSemaphoreLock ()
#2  0x00000000007152e5 in LWLockAcquire ()
#3  0x0000000000710508 in LockAcquireExtended ()
#4  0x000000000070cce8 in LockRelationOid ()
#5  0x00000000004ad405 in relation_open ()
#6  0x00000000004ad436 in heap_open ()
#7  0x000000000080ecce in CatalogCacheInitializeCache ()
#8  0x0000000000810ee0 in SearchCatCache ()
#9  0x0000000000845efd in superuser_arg ()
#10 0x0000000000833b17 in InitPostgres ()
#11 0x0000000000724b52 in PostgresMain ()
#12 0x00000000006b8c8a in PostmasterMain ()
#13 0x00000000006395e0 in main ()
◇SELECT state
#0  0x000000321dc0d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000006a5240 in PGSemaphoreLock ()
#2  0x00000000007152e5 in LWLockAcquire ()
#3  0x0000000000710508 in LockAcquireExtended ()
#4  0x000000000070cce8 in LockRelationOid ()
#5  0x00000000004ad405 in relation_open ()
#6  0x00000000004ad436 in heap_open ()
#7  0x000000000080ecce in CatalogCacheInitializeCache ()
#8  0x0000000000810ee0 in SearchCatCache ()
#9  0x000000000081d4e8 in get_tablespace ()
#10 0x000000000081d5c1 in get_tablespace_page_costs ()
#11 0x000000000066ba15 in cost_seqscan ()
#12 0x000000000069cc23 in create_seqscan_path ()
#13 0x0000000000665423 in set_rel_pathlist ()
#14 0x0000000000666716 in make_one_rel ()
#15 0x0000000000682e91 in query_planner ()
#16 0x0000000000687055 in grouping_planner ()
#17 0x00000000006887c9 in subquery_planner ()
#18 0x00000000006895bd in standard_planner ()
#19 0x00000000007238ca in pg_plan_query ()
#20 0x000000000072399e in pg_plan_queries ()
#21 0x000000000072406a in exec_simple_query ()
#22 0x0000000000724f09 in PostgresMain ()
#23 0x00000000006b8c8a in PostmasterMain ()
#24 0x00000000006395e0 in main ()
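
Since the backends above are blocked in sem_wait(), and unnamed POSIX
semaphores on Linux are implemented with futexes, the wait can also be
confirmed at the futex level by attaching to one of the stuck backends
(a sketch; replace <pid> with a stuck backend's PID):

strace -f -e trace=futex -p <pid>
# as root, and if available, the kernel-side stack of the same process:
cat /proc/<pid>/stack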

◆Additional occurrence conditions
a) Depends on the number of table records.
The problem does not happen when the table contains few records.
 
tablename      record       result
test20180502    10000       not happen
                50000       not happen
                57500       happen
                65000       happen
                80000       happen
               100000       happen
              1000000       happen
test20180503    10000       not happen
                50000       not happen
                57500       not happen
                65000       not happen
                80000       not happen
               100000       happen
              1000000       happen

table define
----------------------------
CREATE TABLE test20180502  (
    c1 int not null,
    c2 character(10000) DEFAULT 'AAAAA',
    PRIMARY KEY(c1)
);

CREATE TABLE test20180503  (
    c1 int not null,
    PRIMARY KEY(c1)
);


b) Depends on OS
CentOS release 6.5 (Final), 2.6.32-431.el6.x86_64: happen
CentOS Linux release 7.3.1611 (Core), 3.10.0-514.el7.x86_64: not happen

c) Depends on huge pages (OS-side checks are sketched after item d)
huge_pages=on: happen (regardless of whether transparent_hugepage is [always] or [never])
huge_pages=off: not happen

d) Depends on PostgreSQL version
PostgreSQL 10.2: happen
PostgreSQL 9.6: not happen
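
For each of the configurations above, whether the server is really running on
huge pages (and whether transparent huge pages are active) can be checked from
the OS side (a sketch):

grep -i hugepages /proc/meminfo                   # explicit huge pages reserved/in use
grep -i anonhugepages /proc/meminfo               # transparent huge pages in use
cat /sys/kernel/mm/transparent_hugepage/enabled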


PG Bug reporting form <noreply@postgresql.org> writes:
> Operating system:   CentOS release 6.5

> When use  huge page, there may be a lot of hanged connections with status
> startup or authentication state when multiple concurrent query operations
> are performed.

So don't do that.  RHEL6 kernels are pretty ancient at this point,
and are known to have performance issues with huge pages.  If you can
replicate the problem on a modern kernel, somebody might be motivated
to look into it; but not on that platform.  In any case it's very
unlikely that this is a Postgres problem rather than a kernel problem.

            regards, tom lane


On 2018-05-04 12:46:40 -0400, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > Operating system:   CentOS release 6.5
> 
> > When use  huge page, there may be a lot of hanged connections with status
> > startup or authentication state when multiple concurrent query operations
> > are performed.
> 
> So don't do that.  RHEL6 kernels are pretty ancient at this point,
> and are known to have performance issues with huge pages.

They're known to have perf issues with *transparent* huge pages, not
huge pages, right?

Chen, have you disabled transparent hugepages and zone reclaim?

Greetings,

Andres Freund



At 2018-05-04 23:54:34, "Andres Freund" <andres@anarazel.de> wrote:
>On 2018-05-04 12:46:40 -0400, Tom Lane wrote:
>> PG Bug reporting form <noreply@postgresql.org> writes:
>> > Operating system: CentOS release 6.5
>>
>> > When use huge page, there may be a lot of hanged connections with status
>> > startup or authentication state when multiple concurrent query operations
>> > are performed.
>>
>> So don't do that.  RHEL6 kernels are pretty ancient at this point,
>> and are known to have performance issues with huge pages.
>
>They're known to have perf issues with *transparent* huge pages, not
>huge pages, right?
>
>Chen, have you disabled transparent hugepages and zone reclaim?
>
>Greetings,
>
>Andres Freund

c) Depends on huge pages
huge_pages=on: happen (regardless of whether transparent_hugepage is [always] or [never])
huge_pages=off: not happen

The problem also occurs when transparent hugepages are disabled.
About zone reclaim, I will check it later.
What puzzles me is that this problem does not occur on PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine).
d) Depends on PostgreSQL version
PostgreSQL 10.2: happen
PostgreSQL 9.6: not happen
Chen Huajun

>>
>>Chen, have you disabled transparent hugepages and zone reclaim?
>>
>>Greetings,
>>
>>Andres Freund
>
>c) Depends on huge pages
>huge_pages=on: happen (regardless of whether transparent_hugepage is [always] or [never])
>huge_pages=off: not happen
>
>When disable transparent hugepages ,this problem also occurs.
>Aboud zone reclaim,I will see it later.
>What I doubt is that this problem does not occurs at PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine)
>d) Depend on PostgreSQL Version
>PostgreSQL 10.2 happen
>PostgreSQL 9.6 not happen
>Chen Huajun
The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.

In addition, I need to correct my earlier report: huge_pages=off is also problematic.

With huge_pages=on, SQL execution is very slow, and there are hung connections in the startup and authentication states.

This is the stack while the SQL is executing:

$ pstack 31175
#0  0x00000036d76e15c3 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000862cca in pg_usleep ()
#2  0x0000000000715b19 in perform_spin_delay ()
#3  0x0000000000715c18 in s_lock ()
#4  0x000000000085b157 in GetOldSnapshotThresholdTimestamp ()
#5  0x00000000006f577e in TestForOldSnapshot_impl ()
#6  0x00000000004b4b2d in heap_getnext ()
#7  0x0000000000616e71 in SeqNext ()
#8  0x00000000005fc677 in ExecScan ()
#9  0x000000000060454e in fetch_input_tuple ()
#10 0x00000000006062e9 in ExecAgg ()
#11 0x00000000005f9a9f in standard_ExecutorRun ()
#12 0x00007f8fcff5d6f8 in pgss_ExecutorRun () from /usr/pgsql-10/lib/pg_stat_statements.so
#13 0x000000000072673b in PortalRunSelect ()
#14 0x00000000007278d1 in PortalRun ()
#15 0x0000000000723f51 in exec_simple_query ()
#16 0x0000000000724f09 in PostgresMain ()
#17 0x00000000006b8c8a in PostmasterMain ()
#18 0x00000000006395e0 in main ()
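
Incidentally, the stack above shows the backend spinning on a spinlock inside
GetOldSnapshotThresholdTimestamp (s_lock/perform_spin_delay), so the
old_snapshot_threshold setting may also be relevant to the slowness; it would
be worth recording these settings alongside the timings (a sketch):

psql -c "show old_snapshot_threshold;"
psql -c "show shared_buffers;"
psql -c "show huge_pages;"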

This is the execution time of the SQL with huge_pages=on:

postgres=# select * from pg_stat_statements where query ~ 'test20180503';
-[ RECORD 1 ]-------+----------------------------------
userid              | 16388
dbid                | 13451
queryid             | 1902655528
query               | select count(*) from test20180503
calls               | 608
total_time          | 217341245.157586992
min_time            | 291100.481972999987
max_time            | 378225.054139000014
mean_time           | 357469.153219715168
stddev_time         | 24984.333161207207
rows                | 608
shared_blks_hit     | 2689946
shared_blks_read    | 2278
shared_blks_dirtied | 4425
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 33.5047190000000015
blk_write_time      | 0


With huge_pages=off there are no connections hung in the startup or authentication states, but SQL execution is still very slow, taking several hundred seconds, during which CPU utilization is 100%.

This is the SQL execution time when huge_pages=off

postgres=# select * from pg_stat_statements where query ~ 'test20180503';
-[ RECORD 1 ]-------+----------------------------------
userid              | 16388
dbid                | 13451
queryid             | 1902655528
query               | select count(*) from test20180503
calls               | 1001
total_time          | 650132764.661145091
min_time            | 600196.529144000029
max_time            | 672296.767421000055
mean_time           | 649483.281379765598
stddev_time         | 19423.3311969555834
rows                | 1001
shared_blks_hit     | 4432428
shared_blks_read    | 0
shared_blks_dirtied | 4425
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0


By contrast, on PostgreSQL 9.6 @ CentOS 6.5 and PostgreSQL 10.2 @ CentOS 7.3, the same SQL executed the same way takes only about 2 seconds.

postgres=# select * from pg_stat_statements where query ~ 'test20180503';
-[ RECORD 1 ]-------+-----------------------------------
userid              | 16389
dbid                | 13269
queryid             | 3703798644
query               | select count(*) from test20180503;
calls               | 1001
total_time          | 2751692.53700000001
min_time            | 1108.32999999999993
max_time            | 4031.02199999999993
mean_time           | 2748.94359340659184
stddev_time         | 642.984323244542566
rows                | 1001
shared_blks_hit     | 4428003
shared_blks_read    | 4425
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 90.8639999999999759
blk_write_time      | 0
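
To keep the numbers from the different configurations comparable, the
statistics can be reset between runs (a sketch; pg_stat_statements is
evidently installed here):

psql -c "select pg_stat_statements_reset();"
# rerun the concurrent-query script, then:
psql -c "select calls, round(mean_time::numeric, 1) as mean_ms
         from pg_stat_statements where query ~ 'test20180503';"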
-------
Regards,
Chen Huajun

On 2018-05-06 23:45:17 +0800, chenhj wrote:
> >>
> >>Chen, have you disabled transparent hugepages and zone reclaim?
> >>
> >>Greetings,
> >>
> >>Andres Freund
> 
> >c) Depends on huge pages
> >huge_pages=on: happen (regardless of whether transparent_hugepage is [always] or [never])
> >huge_pages=off: not happen
> >
> >When disable transparent hugepages ,this problem also occurs.
> >Aboud zone reclaim,I will see it later.
> >What I doubt is that this problem does not occurs at PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine)
> >d) Depend on PostgreSQL Version
> >PostgreSQL 10.2 happen
> >PostgreSQL 9.6 not happen
> >Chen Huajun
> The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.
> 
> In addition, what needs to be corrected is that even huge_pages=off is problematic.
> 
> Huge_pages = on SQL execution is a very slow , and with hangd connections in startup and auth state.
> 

You'd probably need to provide a few perf profiles to get further
insight.

Greetings,

Andres Freund


At 2018-05-07 02:57:12, "Andres Freund" <andres@anarazel.de> wrote:
>On 2018-05-06 23:45:17 +0800, chenhj wrote:
>> >>
>> >>Chen, have you disabled transparent hugepages and zone reclaim?
>> >>
>> >>Greetings,
>> >>
>> >>Andres Freund
>> 
>> >c) Depends on huge pages
>> >huge_pages=on: happen (regardless of whether transparent_hugepage is [always] or [never])
>> >huge_pages=off: not happen
>> >
>> >When disable transparent hugepages ,this problem also occurs.
>> >Aboud zone reclaim,I will see it later.
>> >What I doubt is that this problem does not occurs at PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine)
>> >d) Depend on PostgreSQL Version
>> >PostgreSQL 10.2 happen
>> >PostgreSQL 9.6 not happen
>> >Chen Huajun
>> The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.
>> 
>> In addition, what needs to be corrected is that even huge_pages=off is problematic.
>> 
>> Huge_pages = on SQL execution is a very slow , and with hangd connections in startup and auth state.
>> 
>
>You'd probably need to provide a few perf profiles to get further
>insight.
>
>Greetings,
>
>Andres Freund

According to my tests, this problem is related to commit ecb0d20a9d2e09b7112d3b192047f711f9ff7e59, which switched PostgreSQL on Linux from SysV semaphores to unnamed POSIX semaphores.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ecb0d20a9d2e09b7112d3b192047f711f9ff7e59
Use unnamed POSIX semaphores, if available, on Linux and FreeBSD.
We've had support for using unnamed POSIX semaphores instead of System V semaphores for quite some time, but it was not used by default on any platform.  Since many systems have rather small limits on the number of SysV semaphores allowed, it seems desirable to switch to POSIX semaphores where they're available and don't create performance or kernel resource problems.  Experimentation by me shows that unnamed POSIX semaphores are at least as good as SysV semaphores on Linux, and we previously had a report from Maksym Sobolyev that FreeBSD is significantly worse with SysV semaphores than POSIX ones.  So adjust those two platforms to use unnamed POSIX semaphores, if configure can find the necessary library functions.  If this goes well, we may switch other platforms as well, but it would be advisable to test them individually first.
It's not currently contemplated that we'd encourage users to select a semaphore API for themselves, but anyone who wants to experiment can add PREFERRED_SEMAPHORES=UNNAMED_POSIX (or NAMED_POSIX, or SYSV) to their configure command line to do so.
I also tweaked configure to report which API it's selected, mainly so that we can tell that from buildfarm reports.
I did not touch the user documentation's discussion about semaphores; that will need some adjustment once the dust settles.
Discussion: <8536.1475704230@sss.pgh.pa.us>

This is why the problem does not occur on 9.6.2 but does occur on 10.2.

As to why, perhaps this is a bug in the Linux kernel. However, it is not clear which Linux kernel version fixed the problem. It still occurs after upgrading the CentOS 6.5 kernel from 2.6.32-431 to 2.6.32-504.23.4.
To avoid this problem, the only way may be to upgrade CentOS to a higher version (such as CentOS 7.3).

Regards,
Chen Huajun

At 2018-05-15 06:16:39, "chenhj" <chjischj@163.com> wrote:
>According to my tests, this problem is related to commit ecb0d20a9d2e09b7112d3b192047f711f9ff7e59, which switched PostgreSQL on Linux from SysV semaphores to unnamed POSIX semaphores.
>[...]
>As to why, perhaps this is a bug in the Linux kernel. However, it is not clear which Linux kernel version fixed the problem.

We have confirmed this to be a known Linux kernel bug, fixed by the following commit. (Unnamed POSIX semaphores on Linux are implemented with futexes in shared memory, which is presumably why the problem appeared with huge pages only after the switch away from SysV semaphores.) Thanks for all the help.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=13d60f4b6ab5b702dc8d2ee20999f98a93728aec  
  futex: Take hugepages into account when generating futex_key

Regards,
Chen Huajun