Re: Why our Valgrind reports suck - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why our Valgrind reports suck
Date
Msg-id 454130.1746812692@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2025-05-08 22:04:06 -0400, Tom Lane wrote:
>> A nearby thread [1] reminded me to wonder why we seem to have
>> so many false-positive leaks reported by Valgrind these days.

> Huh. We use the memory pool client requests to inform valgrind about memory
> contexts. I seem to recall that that "hid" many leak warnings from valgrind. I
> wonder if we somehow broke (or weakened) that.

The problem with dynahash has been there since day one, so I think
we've just gotten used to ignoring "leak" reports associated with
hash tables --- I have, anyway.  But the other triggers I listed
have appeared within the last five-ish years, if memory serves,
and we just didn't notice because of the existing dynahash noise.

> We currently don't reset TopMemoryContext at exit, which, obviously, does
> massively increase the number of leaks. But OTOH, without that there's not a
> whole lot of value in the leak check...

Hmm.  Yeah, we could just reset or delete TopMemoryContext, but
surely that would be counterproductive.  It would mask any real
leak of palloc'd blocks.  I'm a little suspicious of your other
idea of shutting down the caches, for the same reason: I wonder
if it wouldn't hide leaks rather than help find them.

One thing I noticed while reading the Valgrind manual is that
they describe a facility for "two level" tracking of custom
allocators such as ours.  Apparently, what you're really supposed
to do is use VALGRIND_MEMPOOL_ALLOC to mark the malloc blocks
that the allocator works in, and VALGRIND_MALLOCLIKE_BLOCK to
mark the sub-blocks handed out by the allocator.  I wonder if this
feature postdates our implementation of Valgrind support, and I
wonder even more if using it would improve our results.

I did experiment with marking context headers as accessible with
VALGRIND_MEMPOOL_ALLOC, and that made the complaints about
MemoryContextCopyAndSetIdentifier strings go away, confirming
that Valgrind is simply not considering the context->ident
pointers.  Unfortunately it also added a bunch of other failures,
so that evidently is Not The Right Thing.  I suspect what is
going on is related to this bit in valgrind.h:

   For Memcheck users: if you use VALGRIND_MALLOCLIKE_BLOCK to carve out
   custom blocks from within a heap block, B, that has been allocated with
   malloc/calloc/new/etc, then block B will be *ignored* during leak-checking
   -- the custom blocks will take precedence.

We're not using VALGRIND_MALLOCLIKE_BLOCK (yet, anyway), but I'm
suspecting that Valgrind probably also ignores heap blocks that
match VALGRIND_CREATE_MEMPOOL requests, except for the portions
thereof that are covered by VALGRIND_MEMPOOL_ALLOC requests.

Anyway, I'm now feeling motivated to go try some experiments.
Watch this space ...

            regards, tom lane



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Next
From: Sami Imseih
Date:
Subject: Re: queryId constant squashing does not support prepared statements