Re: conchuela timeouts since 2021-10-09 system upgrade - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: conchuela timeouts since 2021-10-09 system upgrade |
Date | |
Msg-id | CA+hUKG+jTMdxkyZJbUGyG7ggFKABWsBwSxSzLNVm-X87DeLfqg@mail.gmail.com Whole thread Raw |
In response to | Re: conchuela timeouts since 2021-10-09 system upgrade (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: conchuela timeouts since 2021-10-09 system upgrade
|
List | pgsql-bugs |
I managed to reproduce the 017_shm.pl hang using my Vagrant file (see earlier). It happens for "gmake -s check-world", but not "gmake -s -C src/test/recovery check", which might have confused me last time. The symptoms are as described already, though this time I learned with truss that it's in a retry loop waiting 1 second at a time. I don't know how to get debug symbols for libc on this operating system (well, probably needed to select debug option while installing the OS, but I didn't install the OS, I'm using a lazy bones pre-rolled Vagrant image). So in order to be able to inspect the arguments to select, I made my own LD_PRELOAD wrapper for select (see end), and I observed that it was calling select() with an empty fd_set: (gdb) bt #0 0x00000008014c1e6c in select () from /lib/libc.so.8 #1 0x00000008009f0fe6 in select () from /usr/lib/libpthread.so.0 #2 0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0, writefds=0x8017f6d50, exceptfds=0x0, timeout=0x7fffffdfd4a0) at select.c:13 #3 0x00000008007bfaf9 in Perl_pp_sselect () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32 #4 0x000000080076b036 in Perl_runops_standard () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32 #5 0x00000008006da3b7 in perl_run () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32 #6 0x0000000000400de4 in main () (gdb) f 2 #2 0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0, writefds=0x8017f6d50, exceptfds=0x0, timeout=0x7fffffdfd4a0) at select.c:13 13 return real_function(nfds, readfds, writefds, exceptfds, timeout); (gdb) print nfds $1 = 16 (gdb) print *readfds $2 = {fds_bits = {0 <repeats 16 times>}} (gdb) print *writefds $3 = {fds_bits = {0 <repeats 16 times>}} So it looks a lot like something on the perl side has lost track of the pipe it's supposed to be selecting on. If I understand correctly, it's supposed to be waiting for one of the following pipes to appear as readable, whichever is the one that the zombie psql process previously held the other end of: $ fstat -p 73032 USER CMD PID FD PATH INUM MODE SZ|DV R/W vagrant perl 73032 root / 1 drwxr-xr-x offset:0 r vagrant perl 73032 wd /home/vagrant/postgres/src/test/recovery 4434524058 drwxr-xr-x offset:0 r vagrant perl 73032 text /pfs/@@-1:00001/local/bin/perl 4329919842 -rwxr-xr-x offset:0 r vagrant perl 73032 0 /dev/pts/0 1335 crw--w---- pts/0:155171 rw vagrant perl 73032 1 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm 4477427235 -rw-r--r-- offset:9457 w vagrant perl 73032 2 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm 4477427235 -rw-r--r-- offset:9457 w vagrant perl 73032 3 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm 4477427235 -rw-r--r-- offset:9457 w vagrant perl 73032 4* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw vagrant perl 73032 5* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw vagrant perl 73032 6* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw vagrant perl 73032 7* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw vagrant perl 73032 8* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw vagrant perl 73032 9* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw vagrant perl 73032 13* pipe fffff800ab157560 (B<->A) ravail 0 wavail 0 rw === select wrapper === $ cat select.c #include <dlfcn.h> #include <sys/select.h> int select(int nfds,fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout) { static int (*real_function)(int, fd_set *, fd_set *, fd_set *, struct timeval *); if (!real_function) real_function = dlsym(RTLD_NEXT, "select"); return real_function(nfds, readfds, writefds, exceptfds, timeout); } $ cc -Wall -fPIC -shared -g -o myselect.so select.c -ldl $ cd postgres $ LD_PRELOAD=/home/vagrant/myselect.so gmake check-world -s
pgsql-bugs by date: