Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Refactoring postmaster's code to cleanup after child exit |
Date | |
Msg-id | 574340ca-c90e-4086-9ae2-cd2f1359e989@iki.fi Whole thread Raw |
In response to | Refactoring postmaster's code to cleanup after child exit (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Refactoring postmaster's code to cleanup after child exit
|
List | pgsql-hackers |
On 09/12/2024 14:47, Tomas Vondra wrote: > On 12/9/24 13:30, Heikki Linnakangas wrote: >> On 09/12/2024 01:12, Tomas Vondra wrote: >>> On 11/14/24 15:13, Heikki Linnakangas wrote: >>>> On 09/10/2024 23:40, Heikki Linnakangas wrote: >>>>> I pushed the first three patches, with the new test and one of the >>>>> small >>>>> refactoring patches. Thanks for all the comments so far! Here is a new >>>>> version of the remaining patches. >>>>> >>> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84 >>> seems to have problems with valgrind :-( I reliably get this failure: >> >> How exactly do you run the test with valgrind? What platform? >> > > It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian). > >> It works for me, with this: >> >> (cd build && ninja && rm -rf tmp_install && meson test --suite setup && >> valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/ >> heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes >> --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/ >> pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster ) >> > > I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I > just do > > ./configure --enable-debug --prefix=/home/user/builds/master > --enable-depend --enable-cassert --enable-tap-tests CPPFLAGS="-O0 -ggdb3 > -DUSE_VALGRIND" > > and then the usual "make check" or whatever. > > The patch has a hardcoded path to the .supp file, and places the > valgrind log into /tmp. It has worked for me fine up until that commit, > and it still seems to be working in every other test directory. Ok, I was able to reproduce this with that setup. Unsurprisingly, it's a timing issue. It can be reproduced without valgrind by adding this delay: diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 289059435a9..1eb6bad72ca 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -583,6 +583,7 @@ errfinish(const char *filename, int lineno, const char *funcname) * FATAL termination. The postmaster may or may not consider this * worthy of panic, depending on which subprocess returns it. */ + sleep(1); proc_exit(1); } The test opens a connection that is expected to fail with the "remaining connection slots are reserved for roles with the SUPERUSER attribute" error. Right after that, it opens a new connection as superuser, and expects it to succeed. But if the previous backend hasn't exited yet, the new connection fails with "too many clients already". Not sure how to fix this. A small sleep in the test would work, but in principle there's no delay that's guaranteed to be enough. A more robust solution would be to run a "select count(*) from pg_stat_activity" and wait until the number of connections are what's expected. I'll try that and see how complicated that gets.. -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: