Re: BUG #18732: Segfault in pgbench on max_connections starvation - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #18732: Segfault in pgbench on max_connections starvation
Msg-id 54bbc27e-73d8-4e56-9fcd-99f2de52ca97@iki.fi
In response to BUG #18732: Segfault in pgbench on max_connections starvation  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #18732: Segfault in pgbench on max_connections starvation
List pgsql-bugs
On 03/12/2024 14:23, PG Bug reporting form wrote:
> When --client connections in pgbench exceed max_connections in postgres,
> pgbench 16 sometimes exits with segfault when a (presumably) ssl
> certificate
> validation error occurs.
> 
> ...
> 
> Steps to reproduce:
> 1. Launch a postgres server with max_connections=900
> 2. Launch pgbench a couple of times with -c 2000
> 
> I was also able to reproduce this error by running multiple pgbench
> instances
> with same launch parameters. This error doesn't reproduce on pgbench 17.2 or
> 15.10
> I can provide the coredump upon request.

I was able to reproduce this on both REL_16_STABLE and REL_17_STABLE. 
Didn't try v15, but I presume this issue is present in all branches (see 
analysis below).

Backtrace from thread 1:

#0  0x00007f19dfa55516 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#1  0x00007f19dfa55bce in OPENSSL_LH_retrieve () from 
/lib/x86_64-linux-gnu/libcrypto.so.3
#2  0x00007f19dfb456d5 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#3  0x00007f19dfa2e943 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#4  0x00007f19dfa2edc1 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#5  0x00007f19dfa17eee in EVP_MD_fetch () from 
/lib/x86_64-linux-gnu/libcrypto.so.3
#6  0x00007f19dfa1855b in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#7  0x00007f19dfa4c22a in HMAC_Init_ex () from 
/lib/x86_64-linux-gnu/libcrypto.so.3
#8  0x00007f19e00a9296 in pg_hmac_init (ctx=ctx@entry=0x7f19cc51bb90, 
key=key@entry=0x7f19cc50d560 "foo", len=len@entry=3) at 
../src/common/hmac_openssl.c:180
#9  0x00007f19e00a62b0 in scram_SaltedPassword (password=0x7f19cc50d560 
"foo", hash_type=<optimized out>, key_length=32, salt=<optimized out>, 
saltlen=<optimized out>, iterations=4096,
     result=0x7f19cc51bb08 
"w\351אI\256\035\330\003y\021ւ\205\327ƿ\217Q\332\362}\a\0364\243^\324\321a\034H0\250P\314\031\177", 
errstr=0x7f19dd4bb928) at ../src/common/scram-common.c:87
#10 0x00007f19e0089bcd in calculate_client_proof (state=0x7f19cc51bae0,
     client_final_message_without_proof=0x7f19cc50b040
"c=cD10bHMtc2VydmVyLWVuZC1wb2ludCwsvkIO06ZPSH1cmElOgC2DbPafilVET0yej6RhzH30Rzw=,r=Wkk2fofG+RP23HT1tBMqx0ijin6taf2xdjPuJBYqBqw2853/",
     result=<optimized out>, errstr=<optimized out>) at
../src/interfaces/libpq/fe-auth-scram.c:788
#11 build_client_final_message (state=0x7f19cc51bae0) at 
../src/interfaces/libpq/fe-auth-scram.c:565
#12 scram_exchange (opaq=0x7f19cc51bae0, input=<optimized out>, 
inputlen=<optimized out>, output=0x7f19dd4bba28, outputlen=<optimized 
out>, done=<optimized out>, success=<optimized out>)
     at ../src/interfaces/libpq/fe-auth-scram.c:255
#13 0x00007f19e008a642 in pg_SASL_continue (conn=0x7f19cc4ff1f0, 
payloadlen=84, final=<optimized out>) at 
../src/interfaces/libpq/fe-auth.c:654
#14 pg_fe_sendauth (areq=11, payloadlen=84, 
conn=conn@entry=0x7f19cc4ff1f0) at ../src/interfaces/libpq/fe-auth.c:1139
#15 0x00007f19e008f756 in PQconnectPoll (conn=conn@entry=0x7f19cc4ff1f0) 
at ../src/interfaces/libpq/fe-connect.c:3802
#16 0x00007f19e008bae8 in connectDBComplete 
(conn=conn@entry=0x7f19cc4ff1f0) at 
../src/interfaces/libpq/fe-connect.c:2511
#17 0x00007f19e008b2bf in PQconnectdbParams 
(keywords=keywords@entry=0x7f19dd4bc1f0, 
values=values@entry=0x7f19dd4bc1b0, expand_dbname=expand_dbname@entry=1)
     at ../src/interfaces/libpq/fe-connect.c:685
#18 0x000056350c35efa5 in doConnect () at ../src/bin/pgbench/pgbench.c:1560
#19 0x000056350c35f2c5 in threadRun (arg=0x56350d1184a0) at 
../src/bin/pgbench/pgbench.c:7396
#20 0x00007f19dfe1b112 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:447
#21 0x00007f19dfe998f8 in __GI___clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 2:

#0  0x00007f19dfe28a04 in _int_free_merge_chunk 
(av=av@entry=0x7f19dff70ac0 <main_arena>, p=0x56350d126280, size=144) at 
./malloc/malloc.c:4675
#1  0x00007f19dfe28d31 in _int_free (av=0x7f19dff70ac0 <main_arena>, 
p=<optimized out>, have_lock=<optimized out>, have_lock@entry=0) at 
./malloc/malloc.c:4646
#2  0x00007f19dfe2b4ff in __GI___libc_free (mem=<optimized out>) at 
./malloc/malloc.c:3398
#3  0x00007f19dfa5580e in OPENSSL_LH_free () from 
/lib/x86_64-linux-gnu/libcrypto.so.3
#4  0x00007f19dfb4489f in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#5  0x00007f19dfa6e0e7 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#6  0x00007f19dfb44c35 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#7  0x00007f19dfa565a5 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#8  0x00007f19dfa56aa0 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#9  0x00007f19dfa5ac32 in OPENSSL_cleanup () from 
/lib/x86_64-linux-gnu/libcrypto.so.3
#10 0x00007f19dfdcb1e1 in __run_exit_handlers (status=status@entry=1, 
listp=0x7f19dff70680 <__exit_funcs>, 
run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
     at ./stdlib/exit.c:108
#11 0x00007f19dfdcb29a in __GI_exit (status=status@entry=1) at 
./stdlib/exit.c:138
#12 0x000056350c362ae6 in threadRun (arg=<optimized out>) at 
../src/bin/pgbench/pgbench.c:7399
#13 0x00007f19dfe1b112 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:447
#14 0x00007f19dfe998f8 in __GI___clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Sometimes you also get this error instead of a crash, which is 
presumably another symptom of the same race condition:

pgbench (16.6, server 18devel)
starting vacuum...end.
pgbench: error: connection to server at "localhost" (::1), port 5432 
failed: FATAL:  sorry, too many clients already
pgbench: error: could not create connection for client 1145
pgbench: error: connection to server at "localhost" (::1), port 5432 
failed: could not verify server signature: OpenSSL failure

Once I also got this:

pgbench (17.2, server 18devel)
starting vacuum...end.
pgbench: error: connection to server at "localhost" (::1), port 5432 
failed: FATAL:  sorry, too many clients already
pgbench: error: could not create connection for client 1045
k5_mutex_lock: Received error 22 (Invalid argument)
*** %n in writable segment detected ***
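
The pattern in the two backtraces above (one thread inside exit() running a
cleanup handler while another thread is still working) can be shown without
OpenSSL. The following is only a sketch under stated assumptions:
library_alive, fake_library_cleanup(), worker() and run_exit_race_demo() are
made-up names standing in for OpenSSL's global state, OPENSSL_cleanup(), a
pgbench connection thread and a test driver. It runs in a forked child so the
caller survives, and the worker reports via the exit code that the "library"
state was torn down while it was still in use.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for library-global state (e.g. OpenSSL's tables). */
static atomic_int library_alive = 1;

/*
 * Stand-in for OPENSSL_cleanup(): tear the state down, then linger for a
 * while so the other thread has time to notice before the process ends.
 */
static void
fake_library_cleanup(void)
{
	atomic_store(&library_alive, 0);
	sleep(5);
}

/* Stand-in for a thread that is still busy connecting. */
static void *
worker(void *arg)
{
	(void) arg;
	while (atomic_load(&library_alive))
		;						/* keep "using" the library */
	_exit(42);					/* the state vanished underneath us */
	return NULL;				/* not reached */
}

/* Runs the race in a child process and returns the child's exit status. */
int
run_exit_race_demo(void)
{
	pid_t		pid = fork();
	int			status;

	if (pid < 0)
		return -1;
	if (pid == 0)
	{
		pthread_t	tid;

		atexit(fake_library_cleanup);
		if (pthread_create(&tid, NULL, worker, NULL) != 0)
			_exit(2);
		/* Like threadRun() giving up after a failed connection attempt. */
		exit(1);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The child ends with status 42 rather than 1: exit() runs the handler in the
calling thread, and the worker keeps running until the process is actually
gone. In the real case the worker does not get a clean flag to check; it just
dereferences freed memory.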

It looks like a race condition between OpenSSL's atexit handler, which
runs OPENSSL_cleanup() in one thread, and the HMAC_Init_ex() call in
another thread. I think we could use the OPENSSL_INIT_NO_ATEXIT option to
prevent the atexit handler from running. The OpenSSL man page on
OPENSSL_init_crypto says:

> OPENSSL_INIT_NO_ATEXIT
> 
> By default OpenSSL will attempt to clean itself up when the process
> exits via an "atexit" handler. Using this option suppresses that
> behaviour. This means that the application will have to clean up
> OpenSSL explicitly using OPENSSL_cleanup().

I don't understand why that cleanup would be needed. When the program 
exits, all resources are gone anyway.

-- 
Heikki Linnakangas
Neon (https://neon.tech)


