Re: BUG #18732: Segfault in pgbench on max_connections starvation - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #18732: Segfault in pgbench on max_connections starvation |
Date | |
Msg-id | 54bbc27e-73d8-4e56-9fcd-99f2de52ca97@iki.fi |
In response to | BUG #18732: Segfault in pgbench on max_connections starvation (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #18732: Segfault in pgbench on max_connections starvation |
List | pgsql-bugs |
On 03/12/2024 14:23, PG Bug reporting form wrote:
> When --client connections in pgbench exceed max_connections in postgres,
> pgbench 16 sometimes exits with segfault when a (presumably) ssl
> certificate
> validation error occurs.
>
> ...
>
> Steps to reproduce:
> 1. Launch a postgres server with max_connections=900
> 2. Launch pgbench a couple of times with -c 2000
>
> I was also able to reproduce this error by running multiple pgbench
> instances
> with same launch parameters. This error doesn't reproduce on pgbench 17.2 or
> 15.10
> I can provide the coredump upon request.

I was able to reproduce this on both REL_16_STABLE and REL_17_STABLE. I didn't try v15, but I presume this issue is present in all branches (see analysis below).

Backtrace from thread 1:

#0 0x00007f19dfa55516 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#1 0x00007f19dfa55bce in OPENSSL_LH_retrieve () from /lib/x86_64-linux-gnu/libcrypto.so.3
#2 0x00007f19dfb456d5 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#3 0x00007f19dfa2e943 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#4 0x00007f19dfa2edc1 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#5 0x00007f19dfa17eee in EVP_MD_fetch () from /lib/x86_64-linux-gnu/libcrypto.so.3
#6 0x00007f19dfa1855b in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#7 0x00007f19dfa4c22a in HMAC_Init_ex () from /lib/x86_64-linux-gnu/libcrypto.so.3
#8 0x00007f19e00a9296 in pg_hmac_init (ctx=ctx@entry=0x7f19cc51bb90, key=key@entry=0x7f19cc50d560 "foo", len=len@entry=3) at ../src/common/hmac_openssl.c:180
#9 0x00007f19e00a62b0 in scram_SaltedPassword (password=0x7f19cc50d560 "foo", hash_type=<optimized out>, key_length=32, salt=<optimized out>, saltlen=<optimized out>, iterations=4096, result=0x7f19cc51bb08 "w\351אI\256\035\330\003y\021ւ\205\327ƿ\217Q\332\362}\a\0364\243^\324\321a\034H0\250P\314\031\177", errstr=0x7f19dd4bb928) at ../src/common/scram-common.c:87
#10 0x00007f19e0089bcd in calculate_client_proof (state=0x7f19cc51bae0, client_final_message_without_proof=0x7f19cc50b040 "c=cD10bHMtc2VydmVyLWVuZC1wb2ludCwsvkIO06ZPSH1cmElOgC2DbPafilVET0yej6RhzH30Rzw=,r=Wkk2fofG+RP23HT1tBMqx0ijin6taf2xdjPuJBYqBqw2853/", result=<optimized out>, errstr=<optimized out>) at ../src/interfaces/libpq/fe-auth-scram.c:788
#11 build_client_final_message (state=0x7f19cc51bae0) at ../src/interfaces/libpq/fe-auth-scram.c:565
#12 scram_exchange (opaq=0x7f19cc51bae0, input=<optimized out>, inputlen=<optimized out>, output=0x7f19dd4bba28, outputlen=<optimized out>, done=<optimized out>, success=<optimized out>) at ../src/interfaces/libpq/fe-auth-scram.c:255
#13 0x00007f19e008a642 in pg_SASL_continue (conn=0x7f19cc4ff1f0, payloadlen=84, final=<optimized out>) at ../src/interfaces/libpq/fe-auth.c:654
#14 pg_fe_sendauth (areq=11, payloadlen=84, conn=conn@entry=0x7f19cc4ff1f0) at ../src/interfaces/libpq/fe-auth.c:1139
#15 0x00007f19e008f756 in PQconnectPoll (conn=conn@entry=0x7f19cc4ff1f0) at ../src/interfaces/libpq/fe-connect.c:3802
#16 0x00007f19e008bae8 in connectDBComplete (conn=conn@entry=0x7f19cc4ff1f0) at ../src/interfaces/libpq/fe-connect.c:2511
#17 0x00007f19e008b2bf in PQconnectdbParams (keywords=keywords@entry=0x7f19dd4bc1f0, values=values@entry=0x7f19dd4bc1b0, expand_dbname=expand_dbname@entry=1) at ../src/interfaces/libpq/fe-connect.c:685
#18 0x000056350c35efa5 in doConnect () at ../src/bin/pgbench/pgbench.c:1560
#19 0x000056350c35f2c5 in threadRun (arg=0x56350d1184a0) at ../src/bin/pgbench/pgbench.c:7396
#20 0x00007f19dfe1b112 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#21 0x00007f19dfe998f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 2:

#0 0x00007f19dfe28a04 in _int_free_merge_chunk (av=av@entry=0x7f19dff70ac0 <main_arena>, p=0x56350d126280, size=144) at ./malloc/malloc.c:4675
#1 0x00007f19dfe28d31 in _int_free (av=0x7f19dff70ac0 <main_arena>, p=<optimized out>, have_lock=<optimized out>, have_lock@entry=0) at ./malloc/malloc.c:4646
#2 0x00007f19dfe2b4ff in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3398
#3 0x00007f19dfa5580e in OPENSSL_LH_free () from /lib/x86_64-linux-gnu/libcrypto.so.3
#4 0x00007f19dfb4489f in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#5 0x00007f19dfa6e0e7 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#6 0x00007f19dfb44c35 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#7 0x00007f19dfa565a5 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#8 0x00007f19dfa56aa0 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#9 0x00007f19dfa5ac32 in OPENSSL_cleanup () from /lib/x86_64-linux-gnu/libcrypto.so.3
#10 0x00007f19dfdcb1e1 in __run_exit_handlers (status=status@entry=1, listp=0x7f19dff70680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:108
#11 0x00007f19dfdcb29a in __GI_exit (status=status@entry=1) at ./stdlib/exit.c:138
#12 0x000056350c362ae6 in threadRun (arg=<optimized out>) at ../src/bin/pgbench/pgbench.c:7399
#13 0x00007f19dfe1b112 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#14 0x00007f19dfe998f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Sometimes you also get this error instead of a crash, which is presumably another symptom of the same race condition:

pgbench (16.6, server 18devel)
starting vacuum...end.
pgbench: error: connection to server at "localhost" (::1), port 5432 failed: FATAL: sorry, too many clients already
pgbench: error: could not create connection for client 1145
pgbench: error: connection to server at "localhost" (::1), port 5432 failed: could not verify server signature: OpenSSL failure

Once I also got this:

pgbench (17.2, server 18devel)
starting vacuum...end.
pgbench: error: connection to server at "localhost" (::1), port 5432 failed: FATAL: sorry, too many clients already
pgbench: error: could not create connection for client 1045
k5_mutex_lock: Received error 22 (Invalid argument)
*** %n in writable segment detected ***

It looks like a race condition between OpenSSL's exit handler (OPENSSL_cleanup(), run from exit() in thread 2) and the HMAC_Init_ex() call in another thread (thread 1). I think we could use the OPENSSL_INIT_NO_ATEXIT option to prevent the atexit handler from running. The OpenSSL man page on OPENSSL_init_crypto says:

> OPENSSL_INIT_NO_ATEXIT
>
> By default OpenSSL will attempt to clean itself up when the process
> exits via an "atexit" handler. Using this option suppresses that
> behaviour. This means that the application will have to clean up
> OpenSSL explicitly using OPENSSL_cleanup().

I don't understand why that cleanup would be needed. When the program exits, all resources are gone anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)