Slave server sometimes locks up - Mailing list pgsql-general

From Boris Sagadin
Subject Slave server sometimes locks up
Date
Msg-id CAEzn=HQm1wx97Aaq0zY8=9X8yrHBeRpOEOvz44Xm3wa3NHxtTw@mail.gmail.com
Whole thread Raw
Responses Re: Slave server sometimes locks up  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-general
PgSQL 10.7, Ubuntu 16.04 LTS

Symptoms:

- server accepts new queries until connections exhausted (all queries are SELECT)
- queries are active, never end, but no disk IO
- queries can't be killed with kill -TERM or pg_terminate_backend()
- system load is minimal (vmstat shows 100% idle)
- perf top shows nothing
- statement_timeout is ignored
- no locks with SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
- server exits only on kill -9
- strace on SELECT process indefinitely shows:

futex(0x7f00fe94c938, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff^Cstrace: Process 121319 detached
 <detached ...>

GDB backtrace:

(gdb) bt
#0  0x00007f05256f1827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f00fe94ba38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f00fe94ba38, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f05256f18d4 in __new_sem_wait_slow (sem=0x7f00fe94ba38, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f05256f197a in __new_sem_wait (sem=sem@entry=0x7f00fe94ba38) at sem_wait.c:29
#4  0x000055c9b95eb792 in PGSemaphoreLock (sema=0x7f00fe94ba38) at pg_sema.c:316
#5  0x000055c9b965eaec in LWLockAcquire (lock=0x7f00fe96f880, mode=mode@entry=LW_EXCLUSIVE) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/lmgr/lwlock.c:1233
#6  0x000055c9b96497f7 in dsm_create (size=size@entry=105544, flags=flags@entry=1) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/ipc/dsm.c:493
#7  0x000055c9b94139ff in InitializeParallelDSM (pcxt=pcxt@entry=0x55c9bb8d9d58) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/access/transam/parallel.c:268
#8  0x000055c9b9538187 in ExecInitParallelPlan (planstate=0x55c9bb949178, estate=0x55c9bb948568, nworkers=1) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execParallel.c:470
#9  0x000055c9b9547d20 in ExecGather (pstate=0x55c9bb948e58) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeGather.c:158
#10 0x000055c9b95543d5 in ExecProcNode (node=0x55c9bb948e58) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#11 ExecNestLoop (pstate=0x55c9bb948c68) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeNestloop.c:109
#12 0x000055c9b9556b56 in ExecProcNode (node=0x55c9bb948c68) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#13 ExecSort (pstate=0x55c9bb9489f8) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeSort.c:106
#14 0x000055c9b9556e57 in ExecProcNode (node=0x55c9bb9489f8) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#15 ExecUnique (pstate=0x55c9bb948808) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeUnique.c:73
#16 0x000055c9b9533fcb in ExecProcNode (node=0x55c9bb948808) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#17 ExecutePlan (execute_once=<optimized out>, dest=0x55c9bb8a9328, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55c9bb948808, estate=0x55c9bb948568)
    at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execMain.c:1723
#18 standard_ExecutorRun (queryDesc=0x55c9bb98ba78, direction=<optimized out>, count=0, execute_once=<optimized out>) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execMain.c:364
#19 0x000055c9b9670d6b in PortalRunSelect (portal=portal@entry=0x55c9bb8f1d68, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x55c9bb8a9328)
    at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/pquery.c:932
#20 0x000055c9b9672380 in PortalRun (portal=portal@entry=0x55c9bb8f1d68, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x55c9bb8a9328, altdest=altdest@entry=0x55c9bb8a9328,
    completionTag=0x7ffdde0387c0 "") at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/pquery.c:773
#21 0x000055c9b966eef9 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x55c9bb8a8f18 "") at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/postgres.c:2007
#22 PostgresMain (argc=<optimized out>, argv=argv@entry=0x55c9bb845328, dbname=0x55c9bb7ec9e8 "prod", username=<optimized out>) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/postgres.c:4180
#23 0x000055c9b93a46b9 in BackendRun (port=0x55c9bb840ce0) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:4405
#24 BackendStartup (port=0x55c9bb840ce0) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:4077
#25 ServerLoop () at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:1755
#26 0x000055c9b95fd6fb in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:1363
#27 0x000055c9b93a5ab5 in main (argc=5, argv=0x55c9bb7ea850) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/main/main.c:228


It looks like it might be a semaphore issue(?):

cat /proc/sys/kernel/sem
32000    1024000000    500    32000


Thanks,
Boris

pgsql-general by date:

Previous
From: Mohit Kumar Sahni
Date:
Subject: Connection Drop from PostgreSQL Database Server
Next
From: Nicola Contu
Date:
Subject: Re: Performance comparison between Pgsql 10.5 and Pgsql 11.2