Thread: strange valgrind reports about wrapper_handler on 64-bit arm
Hi, while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a couple reports like this: ==64550== Use of uninitialised value of size 8 ==64550== at 0xA62FE0: wrapper_handler (pqsignal.c:107) ==64550== by 0x580BB9E7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux) ==64550== Uninitialised value was created by a stack allocation ==64550== at 0x4F94660: strcoll_l (strcoll_l.c:258) ==64550== { <insert_a_suppression_name_here> Memcheck:Value8 fun:wrapper_handler obj:/usr/libexec/valgrind/memcheck-arm64-linux } **64550** Valgrind detected 1 error(s) during execution of "ANALYZE mcv_lists;" The exact command varies, I don't think it's necessarily about analyze or extended stats. The line the report refers to is this: (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg); so I guess it can't be about postgres_signal_arg (as that's an int). But that leaves just pqsignal_handlers, and why would that be uninitialized? The closest thing I found in archives is [1] from about a year ago, but we haven't found any clear explanation there either :-( [1] https://www.postgresql.org/message-id/f1a022e5-9bec-42c5-badd-cfc00b60515c@enterprisedb.com regards -- Tomas Vondra
On Fri, Mar 07, 2025 at 12:03:47AM +0100, Tomas Vondra wrote: > while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a > couple reports like this: > > ==64550== Use of uninitialised value of size 8 > ==64550== at 0xA62FE0: wrapper_handler (pqsignal.c:107) > ==64550== by 0x580BB9E7: ??? (in > /usr/libexec/valgrind/memcheck-arm64-linux) > ==64550== Uninitialised value was created by a stack allocation > ==64550== at 0x4F94660: strcoll_l (strcoll_l.c:258) > ==64550== > { > <insert_a_suppression_name_here> > Memcheck:Value8 > fun:wrapper_handler > obj:/usr/libexec/valgrind/memcheck-arm64-linux > } > **64550** Valgrind detected 1 error(s) during execution of "ANALYZE > mcv_lists;" > > The exact command varies, I don't think it's necessarily about analyze > or extended stats. > > The line the report refers to is this: > > (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg); > > so I guess it can't be about postgres_signal_arg (as that's an int). But > that leaves just pqsignal_handlers, and why would that be uninitialized? > > The closest thing I found in archives is [1] from about a year ago, but > we haven't found any clear explanation there either :-( Hm. The pointer to strcoll_l makes me wonder if there might be an issue in one of the handler functions that wrapper_handler calls, and wrapper_handler is getting the blame. I don't see how pqsignal_handlers could be uninitialized. It's static, and we are careful to set it appropriately before we call sigaction(). -- nathan
Hi, On 2025-03-07 00:03:47 +0100, Tomas Vondra wrote: > while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a > couple reports like this: > > ==64550== Use of uninitialised value of size 8 > ==64550== at 0xA62FE0: wrapper_handler (pqsignal.c:107) > ==64550== by 0x580BB9E7: ??? (in > /usr/libexec/valgrind/memcheck-arm64-linux) > ==64550== Uninitialised value was created by a stack allocation > ==64550== at 0x4F94660: strcoll_l (strcoll_l.c:258) > ==64550== > { > <insert_a_suppression_name_here> > Memcheck:Value8 > fun:wrapper_handler > obj:/usr/libexec/valgrind/memcheck-arm64-linux > } > **64550** Valgrind detected 1 error(s) during execution of "ANALYZE > mcv_lists;" > The exact command varies, I don't think it's necessarily about analyze > or extended stats. Do you have a few other examples from where it was triggered? Is the source of the uninitialized value always strcoll_l? Can you reliably reproduce it in certain scenarios or is it probabilistic in some form? Do you know what signal was delivered (I think that could be detected using valgrinds --vgdb)? > The line the report refers to is this: > > (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg); > > so I guess it can't be about postgres_signal_arg (as that's an int). But > that leaves just pqsignal_handlers, and why would that be uninitialized? Is it possible that the signal number we're getting called for is above PG_NSIG? That'd explain why the source value is something fairly random? ISTM that we should add an Assert() to wrapper_handler() that ensures that the signal arg is below PG_NSIG. Might also be worth trying to run without valgrind but with address and undefined behaviour sanitizers enabled. I don't currently have access to an armv8 machine that's not busy doing other stuff... Greetings, Andres Freund
On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote: > Is it possible that the signal number we're getting called for is above > PG_NSIG? That'd explain why the source value is something fairly random? > > ISTM that we should add an Assert() to wrapper_handler() that ensures that the > signal arg is below PG_NSIG. We have such an assertion in pqsignal() before we install wrapper_handler for anything. Is there another way it could be getting called with a different signo? -- nathan
Hi, On 2025-03-07 10:36:35 -0600, Nathan Bossart wrote: > On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote: > > Is it possible that the signal number we're getting called for is above > > PG_NSIG? That'd explain why the source value is something fairly random? > > > > ISTM that we should add an Assert() to wrapper_handler() that ensures that the > > signal arg is below PG_NSIG. > > We have such an assertion in pqsignal() before we install wrapper_handler > for anything. Is there another way it could be getting called with a > different signo? Who the hell knows :). One potential way would be that we got SIGNAL_ARGS wrong for the platform and are interpreting some random thing as the signal number. Or something went wrong in the windows signal emulation code. Or ... It seems cheap insurance to add it both places. Greetings, Andres Freund
On Fri, Mar 07, 2025 at 11:41:38AM -0500, Andres Freund wrote: > On 2025-03-07 10:36:35 -0600, Nathan Bossart wrote: >> On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote: >> > Is it possible that the signal number we're getting called for is above >> > PG_NSIG? That'd explain why the source value is something fairly random? >> > >> > ISTM that we should add an Assert() to wrapper_handler() that ensures that the >> > signal arg is below PG_NSIG. >> >> We have such an assertion in pqsignal() before we install wrapper_handler >> for anything. Is there another way it could be getting called with a >> different signo? > > Who the hell knows :). > > One potential way would be that we got SIGNAL_ARGS wrong for the platform and > are interpreting some random thing as the signal number. Or something went > wrong in the windows signal emulation code. Or ... > > It seems cheap insurance to add it both places. Good enough for me. I'll commit/back-patch to v17 the attached soon. -- nathan
Attachment
On Fri, Mar 07, 2025 at 10:52:10AM -0600, Nathan Bossart wrote: > Good enough for me. I'll commit/back-patch to v17 the attached soon. On second thought, since the signal number is a signed integer, I think we also ought to check that it's > 0. I'm running the attached patch through the CI tests to make sure that's correct for the common platforms. If that looks good, I'm planning to commit it. -- nathan
Attachment
On Fri, Mar 07, 2025 at 02:38:47PM -0600, Nathan Bossart wrote: > If that looks good, I'm planning to commit it. And committed. -- nathan