Home > mailing lists

Re: Getting better results from valgrind leak tracking - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Getting better results from valgrind leak tracking
Date	March 18, 2021 00:26:56
Msg-id	20210318002656.7fezwucw4qdlivzk@alap3.anarazel.de Whole thread Raw
In response to	Re: Getting better results from valgrind leak tracking (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Getting better results from valgrind leak tracking
List	pgsql-hackers

Tree view

Hi,

On 2021-03-17 18:07:36 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> I found a way around that late last night. Need to mark the context
> >> itself as an allocation. But I made a mess on the way to that and need
> >> to clean the patch up before sending it (and need to drop my
> >> girlfriend off first).
> 
> > Unfortunately I didn't immediately find a way to do this while keeping
> > the MEMPOOL_CREATE/DESTROY in mcxt.c. The attached patch moves the pool
> > creation into the memory context implementations, "allocates" the
> > context itself as part of that pool, and changes the reset
> > implementation from MEMPOOL_DESTROY + MEMPOOL_CREATE to instead do
> > MEMPOOL_TRIM. That leaves the memory context itself valid (and thus
> > tracking ->ident etc), but marks all the other memory as freed.
> 
> Huh, interesting.  I wonder why that makes the ident problem go away?
> I'd supposed that valgrind would see the context headers as ordinary
> memory belonging to the global "malloc" pool, so that any pointers
> inside them ought to be considered valid.

I'm didn't quite understand either at the time of writing the change. It
just seemed the only explanation for some the behaviour I was seeing, so
I tried it. Just to be initially be rebuffed due to errors when
accessing the recycled sets...

I spent a bit of time looking at valgrind, and it looks to be explicit
behaviour:

memcheck/mc_leakcheck.c

static MC_Chunk**
find_active_chunks(Int* pn_chunks)
{
   // Our goal is to construct a set of chunks that includes every
   // mempool chunk, and every malloc region that *doesn't* contain a
   // mempool chunk.
...
   // Then we loop over the mempool tables. For each chunk in each
   // pool, we set the entry in the Bool array corresponding to the
   // malloc chunk containing the mempool chunk.
   VG_(HT_ResetIter)(MC_(mempool_list));
   while ( (mp = VG_(HT_Next)(MC_(mempool_list))) ) {
      VG_(HT_ResetIter)(mp->chunks);
      while ( (mc = VG_(HT_Next)(mp->chunks)) ) {

         // We'll need to record this chunk.
         n_chunks++;

         // Possibly invalidate the malloc holding the beginning of this chunk.
         m = find_chunk_for(mc->data, mallocs, n_mallocs);
         if (m != -1 && malloc_chunk_holds_a_pool_chunk[m] == False) {
            tl_assert(n_chunks > 0);
            n_chunks--;
            malloc_chunk_holds_a_pool_chunk[m] = True;
         }

I think that means it explicitly ignores the entire malloced allocation
whenever there's *any* mempool chunk in it, instead considering only the
mempool chunks. So once aset allocats something in the init block, the
context itself is ignored for leak checking purposes. But that wouldn't
be the case if we didn't have the init block...

I guess that makes sense, but would definitely be nice to have had
documented...

> Anyway, I don't have a problem with rearranging the responsibility
> like this.  It gives the individual allocators more freedom to do
> odd stuff, at the cost of very minor duplication of valgrind calls.

Yea, similar.

> I agree we need more comments -- would you like me to have a go at
> writing them?

Gladly!

> One thing I was stewing over last night is that a MemoryContextReset
> will mess up any context identifier assigned with
> MemoryContextCopyAndSetIdentifier.

Valgrind should catch such problems. Not having the danger would be
better, of course.

We could also add an assertion at reset time that the identifier isn't
in the current context.

> The very simplest fix would be to allocate non-constant idents with
> malloc; which'd require adding a flag to track whether context->ident
> needs to be free()d.  We have room for another bool near the top of
> struct MemoryContextData (and at some point we could turn those
> bool fields into a flags word).  The only real cost here is one
> more free() while destroying a labeled context, which is probably
> negligible.

Hm. A separate malloc()/free() could conceivably actually show up in
profiles at some point.

What if we instead used that flag to indicate that MemoryContextReset()
needs to save the identifier? Then any cost would only be paid if the
context is actually reset.

> > There are a few leak warnings around guc.c that look like they might be
> > real, not false positives, and thus a bit concerning. Looks like several
> > guc check hooks don't bother to free the old *extra before allocating a
> > new one.
> 
> I'll take a look, but I'm pretty certain that guc.c, not the hooks, is
> responsible for freeing those.

Yea, I had misattributed the leak to the callbacks. One of the things I
saw definitely is a leak: if call_string_check_hook() ereport(ERRORs)
the guc_strdup() of newval->stringval is lost.

There's another set of them that seems to be related to paralellism. But
I've not hunted it down yet.

I think it might be worth adding a VALGRIND_DO_CHANGED_LEAK_CHECK() at
the end of a transaction or such? That way it'd not be quite as hard to
pinpoint the source of a leak to individual statements as it is right
now.

> Might be another case of valgrind not understanding what's happening.

Those allocations seem very straightforward, plain malloc/free that is
referenced by a pointer to the start of the allocation. So that doesn't
seem very likely?

> > I suspect we might get better results from valgrind, not just for leaks
> > but also undefined value tracking, if we changed the way we represent
> > pools to utilize VALGRIND_MEMPOOL_METAPOOL |
> > VALGRIND_MEMPOOL_AUTO_FREE.
> 
> Don't really see why that'd help?  I mean, it could conceivably help
> catch bugs in the allocators themselves, but I don't follow the argument
> that it'd improve anything else.  Defined is defined, as far as I can
> tell from the valgrind manual.

I think it might improve the attribution of some use-after-free errors
to the memory context. Looks like it also might reduce the cost of
running valgrind a bit.

Greetings,

Andres Freund

pgsql-hackers by date:

From: Tom Lane
Date: 17 March 2021, 23:36:46
Subject: Re: replication slot stats memory bug

From: Thomas Munro
Date: 18 March 2021, 00:54:18
Subject: Re: WIP: WAL prefetch (another approach)

Re: Getting better results from valgrind leak tracking - Mailing list pgsql-hackers

Previous

Next