Hello all,
Consider the following stack trace:
(gdb) bt
#0 0x00007f77c81fae87 in semop () syscall-template.S:81
#1 0x000000000063b721 in PGSemaphoreLock pg_sema.c:387
#2 0x0000000000692e1b in LWLockAcquire lwlock.c:1026
#3 0x000000000068d4c5 in LockAcquireExtended lock.c:881
#4 0x000000000068dcc1 in LockAcquire lock.c:672
#5 0x000000000068b7a8 in LockRelationOid lmgr.c:112
#6 0x0000000000501d18 in find_inheritance_children pg_inherits.c:120
#7 0x0000000000501d80 in find_all_inheritors pg_inherits.c:182
#8 0x000000000062db8d in expand_inherited_rtentry prepunion.c:1285
#9 expand_inherited_tables prepunion.c:1212
#10 0x0000000000622705 in subquery_planner planner.c:501
#11 0x0000000000622d31 in standard_planner planner.c:285
#12 0x000000000069ef0c in pg_plan_query postgres.c:809
#13 0x000000000069f004 in pg_plan_queries postgres.c:868
#14 0x00000000006a0fc2 in exec_simple_query postgres.c:1033
#15 PostgresMain postgres.c:4032
#16 0x0000000000467479 in BackendRun postmaster.c:4237
#17 BackendStartup postmaster.c:3913
#18 ServerLoop () postmaster.c:1684
#19 0x000000000064c828 in PostmasterMain postmaster.c:1292
#20 0x0000000000467f3e in main main.c:223
It turns out PostgreSQL can spend a lot of time waiting for a lock in this
particular place, especially when running on a 60-core server, which is
obviously a pretty bad sign.
You can easily reproduce this issue on a regular Core i7 machine. Use the
attached schema.sql file to create a database and run:
pgbench -j 8 -c 8 -f pgbench.sql -T 300 my_database 2>/dev/null &
While this benchmark is running, attach gdb to one of the PostgreSQL
backend processes and run bt/c from time to time. You will see that
PostgreSQL waits on this particular lock quite often.
The problem is that the code between LWLockAcquire (lock.c:881) and
LWLockRelease (lock.c:1020) can _sometimes_ take 3-5 ms to run. Using the
good old gettimeofday-and-logging method I managed to find the bottleneck:
-- proclock = SetupLockInTable [lock.c:892]
 `-- proclock = (PROCLOCK *) hash_search_with_hash_value [lock.c:1105]
   `-- currBucket = get_hash_entry(hashp) [dynahash.c:985]
     `-- SpinLockAcquire(&hctl->mutex) [dynahash.c:1187]
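
For the record, the instrumentation was essentially this (a simplified
sketch: the probe placement around the SetupLockInTable call, the 1 ms
threshold and the log level are ad-hoc choices, not an exact copy of what
I ran):

/* timing probe around the SetupLockInTable() call in
 * LockAcquireExtended(), lock.c; gettimeofday() needs <sys/time.h> */
struct timeval tv_start, tv_stop;
double elapsed_ms;

gettimeofday(&tv_start, NULL);
proclock = SetupLockInTable(lockMethodTable, MyProc, locktag,
                            hashcode, lockmode);
gettimeofday(&tv_stop, NULL);

elapsed_ms = (tv_stop.tv_sec - tv_start.tv_sec) * 1000.0 +
             (tv_stop.tv_usec - tv_start.tv_usec) / 1000.0;
if (elapsed_ms > 1.0)           /* log only suspiciously slow calls */
    elog(LOG, "SetupLockInTable took %.3f ms", elapsed_ms);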
If my measurements are not wrong (I didn't place gettimeofday between
SpinLockAcquire/SpinLockRelease, etc.) we sometimes spend about 3 ms here
waiting for a spinlock, which doesn't seem right.
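
For context, every entry taken from this shared hash table goes through a
single spinlock protecting the freelist. In outline, get_hash_entry looks
like this (paraphrased and simplified from dynahash.c; the IS_PARTITIONED
checks are elided, since the lock table is partitioned anyway):

static HASHBUCKET
get_hash_entry(HTAB *hashp)
{
    HASHHDR    *hctl = hashp->hctl;
    HASHBUCKET  newElement;

    for (;;)
    {
        /* all backends taking entries compete for this one spinlock */
        SpinLockAcquire(&hctl->mutex);

        newElement = hctl->freeList;
        if (newElement != NULL)
            break;

        /* freelist is empty: allocate another batch of nelem_alloc
         * entries, then retry */
        SpinLockRelease(&hctl->mutex);
        if (!element_alloc(hashp, hctl->nelem_alloc))
            return NULL;        /* out of memory */
    }

    /* unlink the entry while still holding the spinlock */
    hctl->freeList = newElement->link;
    hctl->nentries++;
    SpinLockRelease(&hctl->mutex);

    return newElement;
}

Note that element_alloc itself takes the same mutex to link the new batch
into the freelist, so every refill adds more time spent under the lock.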
I managed to fix this behaviour by modifying the choose_nelem_alloc
procedure in dynahash.c (see attached patch). The idea is to double the
number of items we allocate when the hash table runs out of free items,
so we need half as many allocations, which reduces contention.
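
To show where the change lives: choose_nelem_alloc is what picks the
batch size that element_alloc hands to the freelist. A sketch of the
doubling idea (the function body is paraphrased from dynahash.c; the
final doubling merely stands in for the attached patch, which is the
authoritative version):

static int
choose_nelem_alloc(Size entrysize)
{
    int         nelem_alloc;
    Size        elementSize;
    Size        allocSize;

    /* each element has a HASHELEMENT header plus user data */
    elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(entrysize);

    /* choose nelem_alloc of at least 32, rounded up so that the
     * allocation request is a power of 2 or just less */
    allocSize = 32 * 4;         /* assume elementSize at least 8 */
    do
    {
        allocSize <<= 1;
        nelem_alloc = allocSize / elementSize;
    } while (nelem_alloc < 32);

    /* the change, in spirit: allocate twice as many items per batch,
     * so half as many element_alloc() calls (and freelist refills)
     * are needed */
    return nelem_alloc * 2;
}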
This patch doesn't cause any performance degradation according to
pgbench, and `make check` still passes.
Best regards,
Aleksander