Re: Strange failure on mamba - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Strange failure on mamba |
Date | |
Msg-id | 20221130054225.3ydn5bxdrmel5ssu@awork3.anarazel.de Whole thread Raw |
In response to | Re: Strange failure on mamba (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Strange failure on mamba |
List | pgsql-hackers |
Hi,

On 2022-11-29 20:44:34 -0500, Tom Lane wrote:
> Thanks to commit 51b5834cd I've now been able to capture some info
> from mamba's last couple of failures [1][2]. Sure enough, what is
> happening is that postmaster children are getting stuck in recursive
> rtld symbol resolution. A couple of the stack traces I collected are
>
> #0 0xfdeede4c in ___lwp_park60 () from /usr/libexec/ld.elf_so
> #1 0xfdee3e08 in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
> #2 0xfdee59e4 in dlopen () from /usr/libexec/ld.elf_so
> #3 0x01e54ed0 in internal_load_library (
> libname=libname@entry=0xfd74cc88 "/home/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/home/buildfarm/bf-data/HEAD/inst/lib/postgresql/libpqwalreceiver.so") at dfmgr.c:239
> #4 0x01e55c78 in load_file (filename=<optimized out>, restricted=<optimized out>) at dfmgr.c:156
> #5 0x01c5ba24 in WalReceiverMain () at walreceiver.c:292
> #6 0x01c090f8 in AuxiliaryProcessMain (auxtype=auxtype@entry=WalReceiverProcess) at auxprocess.c:161
> #7 0x01c10970 in StartChildProcess (type=WalReceiverProcess) at postmaster.c:5310
> #8 0x01c123ac in MaybeStartWalReceiver () at postmaster.c:5475
> #9 MaybeStartWalReceiver () at postmaster.c:5468
> #10 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5131
> #11 <signal handler called>
> #12 0xfdee6b44 in _rtld_symlook_obj () from /usr/libexec/ld.elf_so
> #13 0xfdee6fc0 in _rtld_symlook_list () from /usr/libexec/ld.elf_so
> #14 0xfdee7644 in _rtld_symlook_default () from /usr/libexec/ld.elf_so
> #15 0xfdee795c in _rtld_find_symdef () from /usr/libexec/ld.elf_so
> #16 0xfdee7ad0 in _rtld_find_plt_symdef () from /usr/libexec/ld.elf_so
> #17 0xfdee1918 in _rtld_bind () from /usr/libexec/ld.elf_so
> #18 0xfdee1dc0 in _rtld_bind_secureplt_start () from /usr/libexec/ld.elf_so
> Backtrace stopped: frame did not save the PC

Do you have any idea why the stack can't be unwound further here? Is it
possibly indicative of a corrupted stack? I guess we'd need to dig into the
NetBSD libc code :(

> which is pretty much just the same thing we were seeing before
> commit 8acd8f869 :->

What libraries is postgres linked against? I don't know whether -z now only
affects the "top-level" dependencies of postgres, or also the dependencies of
shared libraries that haven't been built with -z now. The only dependencies
that I could see being relevant are libintl and openssl.

You could check whether anything changes if you set LD_BIND_NOW; that should
cause "recursive" dependencies to be bound eagerly as well.

Greetings,

Andres Freund
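
For anyone wanting to try the LD_BIND_NOW experiment without touching the buildfarm configuration, here is a minimal sketch of a wrapper that sets the variable before exec'ing a command. The helper names request_eager_binding() and exec_with_bind_now() are hypothetical and exist only for illustration; nothing like them is in the postgres tree. It assumes the runtime linker honors a non-empty LD_BIND_NOW.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the runtime linker to bind all symbols at load
 * time in whatever program is exec'd next. Returns 0 on success.
 */
int
request_eager_binding(void)
{
    /* A non-empty value is assumed to be enough; "1" is an arbitrary choice. */
    return setenv("LD_BIND_NOW", "1", 1);
}

/*
 * Hypothetical wrapper: exec argv[0] with LD_BIND_NOW set in the
 * environment. Only returns (with -1) if something failed.
 */
int
exec_with_bind_now(char *const argv[])
{
    if (argv == NULL || argv[0] == NULL)
        return -1;
    if (request_eager_binding() != 0)
        return -1;
    execvp(argv[0], argv);
    perror("execvp");           /* reached only if execvp() failed */
    return -1;
}
```

A main() that passes argv + 1 to exec_with_bind_now() would turn this into a small launcher for the postmaster. If the children stop getting stuck with this in place, that would suggest the lazy binding is happening in a dependency that -z now did not cover.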