On Sat, Mar 14, 2020 at 02:41:09PM -0400, James Coleman wrote:
>
>It looks like the issue is actually in the `tuplecontext`, which is
>currently a child context of `sortcontext`:
>
>#3 0x0000558cd153b565 in AllocSetCheck
>(context=context@entry=0x558cd28e0b70) at aset.c:1573
>1573 Assert(total_allocated == context->mem_allocated);
>(gdb) p total_allocated
>$1 = 16384
>(gdb) p context->mem_allocated
>$2 = 8192
>(gdb) p context->name
>$3 = 0x558cd16c8ccd "Caller tuples"
>
>I stuck in several more AllocSetCheck calls in aset.c and got the
>attached backtrace.
>
I think the problem is pretty simple: tuplesort_reset resets the
sortcontext, and resetting a memory context also *deletes* all of its
child contexts, including the tuplecontext. So state->tuplecontext
becomes a stale pointer. I haven't looked into the exact details, but
it clearly confuses the accounting.

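To make the failure mode concrete, here is a minimal C sketch under loudly stated assumptions: ToyContext, ToyContext's fields and the toy_* functions are hypothetical stand-ins, not PostgreSQL's memory-context API. The one property borrowed from the real thing is that resetting a context also deletes its children. Deleted contexts stay in a static pool, so a stale child pointer still refers to readable memory, and allocations through it are charged to a context that a walk from the parent can no longer reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONTEXTS 8
#define MAX_CHILDREN 4

/* Hypothetical toy memory-context tree; NOT PostgreSQL's API. */
typedef struct ToyContext {
    const char *name;
    bool in_use;                 /* false once "deleted" */
    size_t mem_allocated;        /* bytes charged to this context */
    struct ToyContext *parent;
    struct ToyContext *children[MAX_CHILDREN];
    int nchildren;
} ToyContext;

/* Deleted contexts stay in this pool, so stale pointers remain readable. */
static ToyContext pool[MAX_CONTEXTS];

static ToyContext *toy_create(const char *name, ToyContext *parent) {
    for (int i = 0; i < MAX_CONTEXTS; i++) {
        if (!pool[i].in_use) {
            ToyContext *c = &pool[i];
            memset(c, 0, sizeof(*c));
            c->name = name;
            c->in_use = true;
            c->parent = parent;
            if (parent != NULL) {
                assert(parent->nchildren < MAX_CHILDREN);
                parent->children[parent->nchildren++] = c;
            }
            return c;
        }
    }
    assert(!"toy pool exhausted");
    return NULL;
}

/* Only called from toy_reset, which also clears the parent's child list. */
static void toy_delete_subtree(ToyContext *c) {
    for (int i = 0; i < c->nchildren; i++)
        toy_delete_subtree(c->children[i]);
    c->nchildren = 0;
    c->mem_allocated = 0;   /* its memory is released */
    c->parent = NULL;       /* it is no longer part of the tree */
    c->in_use = false;      /* the slot may be handed out again later */
}

/* Like a real context reset: drop this context's memory AND its children. */
static void toy_reset(ToyContext *c) {
    for (int i = 0; i < c->nchildren; i++)
        toy_delete_subtree(c->children[i]);
    c->nchildren = 0;
    c->mem_allocated = 0;
}

/* No liveness check, just like allocating through a raw stale pointer. */
static void toy_alloc(ToyContext *c, size_t n) {
    c->mem_allocated += n;
}

/* Memory reachable from c, as a dump-style tree walk would see it. */
static size_t toy_tree_total(const ToyContext *c) {
    size_t total = c->mem_allocated;
    for (int i = 0; i < c->nchildren; i++)
        total += toy_tree_total(c->children[i]);
    return total;
}

/* Reset the parent, then keep using the cached child pointer. */
static void run_stale_pointer_demo(size_t *visible, size_t *hidden) {
    memset(pool, 0, sizeof(pool));
    ToyContext *sortcontext = toy_create("sortcontext", NULL);
    ToyContext *tuplecontext = toy_create("Caller tuples", sortcontext);
    toy_alloc(tuplecontext, 8192);
    toy_reset(sortcontext);          /* also deletes tuplecontext */
    toy_alloc(tuplecontext, 8192);   /* stale pointer: accepted silently */
    *visible = toy_tree_total(sortcontext);
    *hidden = tuplecontext->in_use ? 0 : tuplecontext->mem_allocated;
}
```

In the toy, run_stale_pointer_demo() reports 0 bytes reachable from sortcontext and 8192 bytes charged to the detached "Caller tuples" context.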
The attached patch fixes the issue for me. I'm not claiming it's the
right fix, but it's the simplest thing I could think of. Maybe
tuplesort_reset should work differently; I'm not sure.

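One way to avoid the stale pointer, shown on the same kind of hypothetical toy model (ToyContext, ToySortState and the toy_* functions are illustrative stand-ins, not PostgreSQL's API, and this is not necessarily what the attached patch does): whenever the parent context is reset, re-create the child and overwrite the cached pointer before anything allocates in it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONTEXTS 8
#define MAX_CHILDREN 4

/* Hypothetical toy memory-context tree; NOT PostgreSQL's API. */
typedef struct ToyContext {
    const char *name;
    bool in_use;
    size_t mem_allocated;
    struct ToyContext *parent;
    struct ToyContext *children[MAX_CHILDREN];
    int nchildren;
} ToyContext;

static ToyContext pool[MAX_CONTEXTS];

static ToyContext *toy_create(const char *name, ToyContext *parent) {
    for (int i = 0; i < MAX_CONTEXTS; i++) {
        if (!pool[i].in_use) {
            ToyContext *c = &pool[i];
            memset(c, 0, sizeof(*c));
            c->name = name;
            c->in_use = true;
            c->parent = parent;
            if (parent != NULL) {
                assert(parent->nchildren < MAX_CHILDREN);
                parent->children[parent->nchildren++] = c;
            }
            return c;
        }
    }
    assert(!"toy pool exhausted");
    return NULL;
}

static void toy_delete_subtree(ToyContext *c) {
    for (int i = 0; i < c->nchildren; i++)
        toy_delete_subtree(c->children[i]);
    c->nchildren = 0;
    c->mem_allocated = 0;
    c->parent = NULL;
    c->in_use = false;
}

/* Resetting a context deletes its children as well as its memory. */
static void toy_reset(ToyContext *c) {
    for (int i = 0; i < c->nchildren; i++)
        toy_delete_subtree(c->children[i]);
    c->nchildren = 0;
    c->mem_allocated = 0;
}

static void toy_alloc(ToyContext *c, size_t n) { c->mem_allocated += n; }

static size_t toy_tree_total(const ToyContext *c) {
    size_t total = c->mem_allocated;
    for (int i = 0; i < c->nchildren; i++)
        total += toy_tree_total(c->children[i]);
    return total;
}

/* Hypothetical stand-in for the two context pointers a sort state caches. */
typedef struct ToySortState {
    ToyContext *sortcontext;
    ToyContext *tuplecontext;   /* child of sortcontext */
} ToySortState;

static void toy_sort_begin(ToySortState *st, ToyContext *parent) {
    st->sortcontext = toy_create("sortcontext", parent);
    st->tuplecontext = toy_create("Caller tuples", st->sortcontext);
}

/* The reset deletes tuplecontext, so re-create it and refresh the pointer. */
static void toy_sort_reset(ToySortState *st) {
    toy_reset(st->sortcontext);
    st->tuplecontext = toy_create("Caller tuples", st->sortcontext);
}

/* Allocate, reset, allocate again; return what the tree walk can see. */
static size_t run_reset_with_refresh(ToySortState *st) {
    memset(pool, 0, sizeof(pool));
    ToyContext *root = toy_create("root", NULL);
    toy_sort_begin(st, root);
    toy_alloc(st->tuplecontext, 8192);
    toy_sort_reset(st);
    toy_alloc(st->tuplecontext, 8192);
    return toy_tree_total(root);
}
```

After the reset the refreshed tuplecontext is back in the tree, so the second allocation is visible from the root. Another option would be to not hang tuplecontext under the context being reset at all; which approach suits tuplesort.c better is the open question above.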
It also seems to resolve the memory leak. I suspect we freed the
context (so it was no longer part of the context tree), but the struct
was still valid and we kept allocating memory in it, which was
invisible to MemoryContextDump etc.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services