Re: Why our Valgrind reports suck - Mailing list pgsql-hackers

From Yasir
Subject Re: Why our Valgrind reports suck
Msg-id CAA9OW9eh0+12PekdV8pNtdYFSOMyAJgVU7fop=oWFmf6DQZ-0w@mail.gmail.com
In response to Re: Why our Valgrind reports suck  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Mon, May 12, 2025 at 12:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> And, since there's nothing new under the sun around here,
> we already had a discussion about that back in 2021:
> https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us
> That thread seems to have led to fixing some specific bugs,
> but we never committed any of the discussed valgrind infrastructure
> improvements.  I'll have a go at resurrecting that...

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up
fixes that either fix or paper over various leaks.  Some of it
is committable I think, but other parts are just WIP.  Anyway,
as of the 0010 patch we can run through the core regression tests
and see no more than a couple of kilobytes total reported leakage
in any process, except for two tests that expose leaks in TS
dictionary building.  (That's fixable but I ran out of time,
and I wanted to get this posted before Montreal.)  There is
work left to do before we can remove the suppressions added in
0002, but this is already huge progress compared to where we were.

A couple of these patches are bug fixes that need to be applied and
even back-patched.  In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now.  This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

                        regards, tom lane


Thanks for sharing the patch series. I've applied the patches on my end and rerun the tests. Valgrind now reports only 8 bytes of definite leakage, and the previously noisy output is almost entirely gone.
Here's the Valgrind output:

==00:00:01:50.385 90463== LEAK SUMMARY:
==00:00:01:50.385 90463==    definitely lost: 8 bytes in 1 blocks
==00:00:01:50.385 90463==    indirectly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463==      possibly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463==    still reachable: 1,182,132 bytes in 2,989 blocks
==00:00:01:50.385 90463==         suppressed: 0 bytes in 0 blocks
==00:00:01:50.385 90463== Rerun with --leak-check=full to see details of leaked memory
==00:00:01:50.385 90463==
==00:00:01:50.385 90463== For lists of detected and suppressed errors, rerun with: -s
==00:00:01:50.385 90463== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 34 from 3)

Regards,

Yasir Hussain
Data Bene
