BUG #15187: When use huge page,there may be a lot of hanged connections with status startup orauthentication - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15187: When use huge page,there may be a lot of hanged connections with status startup orauthentication |
Date | |
Msg-id | 152545168641.9460.10767825715065371428@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15187 Logged by: Chen huajun Email address: chjischj@163.com PostgreSQL version: 10.2 Operating system: CentOS release 6.5 Description: ◆phenomenon When use huge page, there may be a lot of hanged connections with status startup or authentication state when multiple concurrent query operations are performed. ◆Env hardware:Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 16C/128G/300GB SSD OS:CentOS release 6.5 x64 PostgreSQL:10.2 ◆How to reproduct? 1) set huge_pages to on in postgresql.conf. 2) run the following script count=0 while [ $count -le 2000 ]; do psql -c 'select count(*) from test20180502;' & count=$((count + 1)) done table define ---------------------------- CREATE TABLE test20180502 ( c1 int not null, c2 character(10000) DEFAULT 'AAAAA', PRIMARY KEY(c1) ); ◆The investigation of the cause 1、there will be a lot of improper connections with status startup or authentication or SELECT state: # ps -eaf | grep postgres |grep SELECT |wc 2 24 204 # ps -eaf | grep postgres |grep authen |wc 410 4920 45097 # ps -eaf | grep postgres |grep startup |wc 78 936 8034 2、According to the stack information (*), it can be seen that the function PGSemaphoreLock has been in a waiting state because of the incorrect SEMA intelligence when performing the following operation. errStatus = sem_wait(PG_SEM_REF(sema)); 3、By looking at the system table pg_stat_activity, I've found that the wait_event_type and wait_event of some query operations are in the following state. postgres=# select pid,wait_event_type,wait_event,state,backend_xid,query from pg_stat_activity; pid | wait_event_type | wait_event | state | backend_xid | query ------+-----------------+---------------------+--------+-------------+-------------------------------------------------------------------------------------- 6333 | Activity | AutoVacuumMain | | | 6337 | Activity | LogicalLauncherMain | | | 6408 | LWLock | buffer_mapping | active | | select count(*) from test20180502; 6414 | LWLock | buffer_mapping | active | | select count(*) from test20180502; 7509 | | | active | | select pid,wait_event_type,wait_event,state,backend_xid,query from pg_stat_activity; 6331 | Activity | BgWriterMain | | | 6330 | Activity | CheckpointerMain | | | 6332 | Activity | WalWriterMain | | | postgres=# select locktype,relation::regclass as table,a.pid,mode,granted,substring(query,1,50) from pg_locks a, pg_stat_activity b where a.pid = b.pid; locktype | table | pid | mode | granted | substring ------------+---------------------------+------+-----------------+---------+---------------------------------------------------- relation | test20180502_pkey | 6408 | AccessShareLock | t | select count(*) from test20180502; relation | test20180502 | 6408 | AccessShareLock | t | select count(*) from test20180502; virtualxid | | 6408 | ExclusiveLock | t | select count(*) from test20180502; relation | test20180502_pkey | 6414 | AccessShareLock | t | select count(*) from test20180502; relation | test20180502 | 6414 | AccessShareLock | t | select count(*) from test20180502; virtualxid | | 6414 | ExclusiveLock | t | select count(*) from test20180502; *When there is a large number of improper connections in the startup or authentication or SELECT state, the stack information is as follows: ◇authentication state #0 0x000000321dc0d930 in sem_wait () from /lib64/libpthread.so.0 #1 0x00000000006a5240 in PGSemaphoreLock () #2 0x00000000007152e5 in LWLockAcquire () #3 0x000000000070fc8f in LockRelease () #4 0x000000000070c323 in UnlockRelationId () #5 0x00000000004ad2d2 in relation_close () #6 0x000000000080ed21 in CatalogCacheInitializeCache () #7 0x0000000000810ee0 in SearchCatCache () #8 0x000000000081dc08 in GetSysCacheOid () #9 0x00000000007391a7 in get_role_oid () #10 0x000000000062d102 in hba_getauthmethod () #11 0x000000000062b729 in ClientAuthentication () #12 0x0000000000833fb3 in InitPostgres () #13 0x0000000000724b52 in PostgresMain () #14 0x00000000006b8c8a in PostmasterMain () #15 0x00000000006395e0 in main () ◇startup state #0 0x000000321dc0d930 in sem_wait () from /lib64/libpthread.so.0 #1 0x00000000006a5240 in PGSemaphoreLock () #2 0x00000000007152e5 in LWLockAcquire () #3 0x0000000000710508 in LockAcquireExtended () #4 0x000000000070cce8 in LockRelationOid () #5 0x00000000004ad405 in relation_open () #6 0x00000000004ad436 in heap_open () #7 0x000000000080ecce in CatalogCacheInitializeCache () #8 0x0000000000810ee0 in SearchCatCache () #9 0x0000000000845efd in superuser_arg () #10 0x0000000000833b17 in InitPostgres () #11 0x0000000000724b52 in PostgresMain () #12 0x00000000006b8c8a in PostmasterMain () #13 0x00000000006395e0 in main () ◇SELECT state #0 0x000000321dc0d930 in sem_wait () from /lib64/libpthread.so.0 #1 0x00000000006a5240 in PGSemaphoreLock () #2 0x00000000007152e5 in LWLockAcquire () #3 0x0000000000710508 in LockAcquireExtended () #4 0x000000000070cce8 in LockRelationOid () #5 0x00000000004ad405 in relation_open () #6 0x00000000004ad436 in heap_open () #7 0x000000000080ecce in CatalogCacheInitializeCache () #8 0x0000000000810ee0 in SearchCatCache () #9 0x000000000081d4e8 in get_tablespace () #10 0x000000000081d5c1 in get_tablespace_page_costs () #11 0x000000000066ba15 in cost_seqscan () #12 0x000000000069cc23 in create_seqscan_path () #13 0x0000000000665423 in set_rel_pathlist () #14 0x0000000000666716 in make_one_rel () #15 0x0000000000682e91 in query_planner () #16 0x0000000000687055 in grouping_planner () #17 0x00000000006887c9 in subquery_planner () #18 0x00000000006895bd in standard_planner () #19 0x00000000007238ca in pg_plan_query () #20 0x000000000072399e in pg_plan_queries () #21 0x000000000072406a in exec_simple_query () #22 0x0000000000724f09 in PostgresMain () #23 0x00000000006b8c8a in PostmasterMain () #24 0x00000000006395e0 in main () ◆The additional occurrence condition a)Depend on the number of table records. The probleam do not happen when table contains few records tablename record result test20180502 10000 not happen 50000 not happen 57500 happen 65000 happen 80000 happen 100000 happen 1000000 happen test20180503 10000 not happen 50000 not happen 57500 not happen 65000 not happen 80000 not happen 100000 happen 1000000 happen table define ---------------------------- CREATE TABLE test20180502 ( c1 int not null, c2 character(10000) DEFAULT 'AAAAA', PRIMARY KEY(c1) ); CREATE TABLE test20180503 ( c1 int not null, PRIMARY KEY(c1) ); b) Depend on OS CentOS release 6.5 (Final) 2.6.32-431.el6.x86_64, happen CentOS Linux release 7.3.1611 (Core) 3.10.0-514.el7.x86_64, not happen c) Depend on huge page huge_page=on, happen(no matter transparent_hugepage is [always] or [never]) huge_page=off, not happen d) Depend on PostgreSQL Version PostgreSQL 10.2 happen PostgreSQL 9.6 not happen
pgsql-bugs by date: