Thread: strange valgrind reports about wrapper_handler on 64-bit arm

strange valgrind reports about wrapper_handler on 64-bit arm

From
Tomas Vondra
Date:
Hi,

while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a
couple reports like this:

==64550== Use of uninitialised value of size 8
==64550==    at 0xA62FE0: wrapper_handler (pqsignal.c:107)
==64550==    by 0x580BB9E7: ??? (in
/usr/libexec/valgrind/memcheck-arm64-linux)
==64550==  Uninitialised value was created by a stack allocation
==64550==    at 0x4F94660: strcoll_l (strcoll_l.c:258)
==64550==
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:wrapper_handler
   obj:/usr/libexec/valgrind/memcheck-arm64-linux
}
**64550** Valgrind detected 1 error(s) during execution of "ANALYZE
mcv_lists;"

The exact command varies, I don't think it's necessarily about analyze
or extended stats.

The line the report refers to is this:

    (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);

so I guess it can't be about postgres_signal_arg (as that's an int). But
that leaves just pqsignal_handlers, and why would that be uninitialized?

The closest thing I found in archives is [1] from about a year ago, but
we haven't found any clear explanation there either :-(


[1]
https://www.postgresql.org/message-id/f1a022e5-9bec-42c5-badd-cfc00b60515c@enterprisedb.com


regards

-- 
Tomas Vondra




Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Nathan Bossart
Date:
On Fri, Mar 07, 2025 at 12:03:47AM +0100, Tomas Vondra wrote:
> while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a
> couple reports like this:
> 
> ==64550== Use of uninitialised value of size 8
> ==64550==    at 0xA62FE0: wrapper_handler (pqsignal.c:107)
> ==64550==    by 0x580BB9E7: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==64550==  Uninitialised value was created by a stack allocation
> ==64550==    at 0x4F94660: strcoll_l (strcoll_l.c:258)
> ==64550==
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Value8
>    fun:wrapper_handler
>    obj:/usr/libexec/valgrind/memcheck-arm64-linux
> }
> **64550** Valgrind detected 1 error(s) during execution of "ANALYZE
> mcv_lists;"
> 
> The exact command varies, I don't think it's necessarily about analyze
> or extended stats.
> 
> The line the report refers to is this:
> 
>     (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
> 
> so I guess it can't be about postgres_signal_arg (as that's an int). But
> that leaves just pqsignal_handlers, and why would that be uninitialized?
> 
> The closest thing I found in archives is [1] from about a year ago, but
> we haven't found any clear explanation there either :-(

Hm.  The pointer to strcoll_l makes me wonder if there might be an issue in
one of the handler functions that wrapper_handler calls, and
wrapper_handler is getting the blame.  I don't see how pqsignal_handlers
could be uninitialized.  It's static, and we are careful to set it
appropriately before we call sigaction().

-- 
nathan



Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Andres Freund
Date:
Hi,

On 2025-03-07 00:03:47 +0100, Tomas Vondra wrote:
> while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a
> couple reports like this:
> 
> ==64550== Use of uninitialised value of size 8
> ==64550==    at 0xA62FE0: wrapper_handler (pqsignal.c:107)
> ==64550==    by 0x580BB9E7: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==64550==  Uninitialised value was created by a stack allocation
> ==64550==    at 0x4F94660: strcoll_l (strcoll_l.c:258)
> ==64550==
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Value8
>    fun:wrapper_handler
>    obj:/usr/libexec/valgrind/memcheck-arm64-linux
> }
> **64550** Valgrind detected 1 error(s) during execution of "ANALYZE
> mcv_lists;"

> The exact command varies, I don't think it's necessarily about analyze
> or extended stats.

Do you have a few other examples from where it was triggered?

Is the source of the uninitialized value always strcoll_l?

Can you reliably reproduce it in certain scenarios or is it probabilistic in
some form?

Do you know what signal was delivered (I think that could be detected using
valgrinds --vgdb)?


> The line the report refers to is this:
> 
>     (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
> 
> so I guess it can't be about postgres_signal_arg (as that's an int). But
> that leaves just pqsignal_handlers, and why would that be uninitialized?

Is it possible that the signal number we're getting called for is above
PG_NSIG? That'd explain why the source value is something fairly random?

ISTM that we should add an Assert() to wrapper_handler() that ensures that the
signal arg is below PG_NSIG.


Might also be worth trying to run without valgrind but with address and
undefined behaviour sanitizers enabled.  I don't currently have access to an
armv8 machine that's not busy doing other stuff...

Greetings,

Andres Freund



Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Nathan Bossart
Date:
On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote:
> Is it possible that the signal number we're getting called for is above
> PG_NSIG? That'd explain why the source value is something fairly random?
> 
> ISTM that we should add an Assert() to wrapper_handler() that ensures that the
> signal arg is below PG_NSIG.

We have such an assertion in pqsignal() before we install wrapper_handler
for anything.  Is there another way it could be getting called with a
different signo?

-- 
nathan



Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Andres Freund
Date:
Hi,

On 2025-03-07 10:36:35 -0600, Nathan Bossart wrote:
> On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote:
> > Is it possible that the signal number we're getting called for is above
> > PG_NSIG? That'd explain why the source value is something fairly random?
> > 
> > ISTM that we should add an Assert() to wrapper_handler() that ensures that the
> > signal arg is below PG_NSIG.
> 
> We have such an assertion in pqsignal() before we install wrapper_handler
> for anything.  Is there another way it could be getting called with a
> different signo?

Who the hell knows :).

One potential way would be that we got SIGNAL_ARGS wrong for the platform and
are interpreting some random thing as the signal number.  Or something went
wrong in the windows signal emulation code.  Or ...

It seems cheap insurance to add it both places.

Greetings,

Andres Freund



Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Nathan Bossart
Date:
On Fri, Mar 07, 2025 at 11:41:38AM -0500, Andres Freund wrote:
> On 2025-03-07 10:36:35 -0600, Nathan Bossart wrote:
>> On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote:
>> > Is it possible that the signal number we're getting called for is above
>> > PG_NSIG? That'd explain why the source value is something fairly random?
>> > 
>> > ISTM that we should add an Assert() to wrapper_handler() that ensures that the
>> > signal arg is below PG_NSIG.
>> 
>> We have such an assertion in pqsignal() before we install wrapper_handler
>> for anything.  Is there another way it could be getting called with a
>> different signo?
> 
> Who the hell knows :).
> 
> One potential way would be that we got SIGNAL_ARGS wrong for the platform and
> are interpreting some random thing as the signal number.  Or something went
> wrong in the windows signal emulation code.  Or ...
> 
> It seems cheap insurance to add it both places.

Good enough for me.  I'll commit/back-patch to v17 the attached soon.

-- 
nathan

Attachment

Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Nathan Bossart
Date:
On Fri, Mar 07, 2025 at 10:52:10AM -0600, Nathan Bossart wrote:
> Good enough for me.  I'll commit/back-patch to v17 the attached soon.

On second thought, since the signal number is a signed integer, I think we
also ought to check that it's > 0.  I'm running the attached patch through
the CI tests to make sure that's correct for the common platforms.  If that
looks good, I'm planning to commit it.

-- 
nathan

Attachment

Re: strange valgrind reports about wrapper_handler on 64-bit arm

From
Nathan Bossart
Date:
On Fri, Mar 07, 2025 at 02:38:47PM -0600, Nathan Bossart wrote:
> If that looks good, I'm planning to commit it.

And committed.

-- 
nathan