Hi,
At some point recently, the valgrind suppressions in the older back branches
stopped fully working. E.g.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2022%3A31%3A25
failed on 14, with:
2025-03-04 22:33:13.359 UTC [3534679][postmaster][:0] LOG: database system is ready to accept connections
==3534881== VALGRINDERROR-BEGIN
==3534881== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==3534881== at 0x4D975C7: __internal_syscall_cancel (cancellation.c:64)
==3534881== by 0x4D975EC: __syscall_cancel (cancellation.c:75)
==3534881== by 0x4E1A1F6: send (send.c:28)
==3534881== by 0x4F32DC: pgstat_send (pgstat.c:2970)
==3534881== by 0x4F33C5: pgstat_send_inquiry (pgstat.c:1889)
==3534881== by 0x4F5508: backend_read_statsfile (pgstat.c:4650)
==3534881== by 0x4F7812: pgstat_fetch_stat_dbentry (pgstat.c:2710)
==3534881== by 0x4EB3AB: rebuild_database_list (autovacuum.c:1055)
==3534881== by 0x4EBF2F: AutoVacLauncherMain (autovacuum.c:635)
==3534881== by 0x4EC2B1: StartAutoVacLauncher (autovacuum.c:422)
==3534881== by 0x4FC815: reaper (postmaster.c:3067)
==3534881== by 0x4D47DAF: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==3534881== Address 0x1ffeffedac is on thread 1's stack
==3534881== in frame #4, created by pgstat_send_inquiry (pgstat.c:1882)
==3534881== Uninitialised value was created by a stack allocation
==3534881== at 0x4F3386: pgstat_send_inquiry (pgstat.c:1882)
==3534881==
==3534881== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Param
socketcall.sendto(msg)
fun:__internal_syscall_cancel
fun:__syscall_cancel
fun:send
fun:pgstat_send
fun:pgstat_send_inquiry
fun:backend_read_statsfile
fun:pgstat_fetch_stat_dbentry
fun:rebuild_database_list
fun:AutoVacLauncherMain
fun:StartAutoVacLauncher
fun:reaper
obj:/usr/lib/x86_64-linux-gnu/libc.so.6
}
I was confused by this for a while, because we seem to have a suppression for
it:
{
padding_pgstat_sendto
Memcheck:Param
socketcall.sendto(msg)
fun:*send*
fun:pgstat_send
}
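An embarrassing amount of staring later, I realized this must be due to the
cancellation-related functions in the stack trace: the suppression expects a
frame matching *send* at the top of the stack, but __internal_syscall_cancel
and __syscall_cancel now sit above send.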
I think all we need to do is add a ... line (which matches any number of
frames) to the two relevant suppressions, similar to what we do for other
suppressions. See attached.
I don't think we need it for the other suppressions: they either already use
... or aren't below a syscall.
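For illustration, here is roughly what the first suppression would look like
with the frame wildcard added. This is only a sketch and the attached patch is
authoritative; in particular, placing the ... line before fun:*send* is an
assumption here:
{
padding_pgstat_sendto
Memcheck:Param
socketcall.sendto(msg)
...
fun:*send*
fun:pgstat_send
}
In a valgrind suppression, a ... line matches zero or more frames, so this
should match both the old stack, where send is the topmost frame, and the new
one, where __internal_syscall_cancel and __syscall_cancel sit above it.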
Greetings,
Andres Freund