Thread: Potential memory leak in contrib/intarray's g_intbig_compress

Potential memory leak in contrib/intarray's g_intbig_compress

From
Matthias van de Meent
Date:
Hi,

My collegue Konstantin (cc-ed) noticed that the GiST code of intarray
may leak memory in certain index operations:

> g_intbig_compress(...):
> [...]
>         ArrayType  *in = DatumGetArrayTypeP(entry->key);
> [...]
>         if (in != DatumGetArrayTypeP(entry->key))
>             pfree(in);

DatumGetArrayTypeP will allocate a new, uncompressed copy if the given
Datum is compressed. So in this code, if entry->key is compressed we'd
allocate two decompressed copies, while only we only deallocate the
first of these two. I believe the attached patch fixes the issue.

It looks like this bug has existed since the code was first committed,
so backpatching would go back to 11 if this is an active issue.

Kind regards,

Matthias van de Meent

Attachment

Re: Potential memory leak in contrib/intarray's g_intbig_compress

From
Tom Lane
Date:
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> My collegue Konstantin (cc-ed) noticed that the GiST code of intarray
> may leak memory in certain index operations:

Can you demonstrate an actual problem here, that is a query-lifespan leak?

IMO, the coding rule in the GiST and GIN AMs is that the AM code is
responsible for running all opclass-supplied functions in suitably
short-lived memory contexts, so that leaks within those functions
don't cause problems.  This is different from btree which requires
comparison functions to not leak.  The rationale for having different
conventions is that btree comparison functions are typically simple
enough to be able to deal with such a restriction, whereas GiST
and GIN opclasses are often complex critters for which it'd be too
bug-prone to insist on leakproofness.  So it seems worth the cost
to make the AM itself set up a throwaway memory context.

(I don't recall offhand about which rule the other AMs use.
I'm also not sure where or if this choice is documented.)

In the case at hand, I think the pfree is useless and was installed
by somebody who had mis-extrapolated from btree rules.  So what
I'm thinking would be sufficient is to drop it altogether:

-        if (in != DatumGetArrayTypeP(entry->key))
-             pfree(in);

            regards, tom lane



Re: Potential memory leak in contrib/intarray's g_intbig_compress

From
Tom Lane
Date:
I wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
>> My collegue Konstantin (cc-ed) noticed that the GiST code of intarray
>> may leak memory in certain index operations:

> Can you demonstrate an actual problem here, that is a query-lifespan leak?

> IMO, the coding rule in the GiST and GIN AMs is that the AM code is
> responsible for running all opclass-supplied functions in suitably
> short-lived memory contexts, so that leaks within those functions
> don't cause problems.

I tried adding "MemoryContextStats(CurrentMemoryContext);" at the top
of g_intbig_compress() and running the intarray regression tests
(which do reach the pfree in question).  This confirmed that the
compress function is always called in the "GiST temporary context"
made by createTempGistContext.  Also, the amount of memory reported as
consumed didn't seem to vary when I removed the pfree, which indicates
that we do manage to reset that context often enough that leakage here
doesn't matter.  It's hard to make an exact comparison because of
GiST's habit of randomizing page-split decisions, so that the sequence
of calls to the compress function isn't identical from one run to the
next.  But at least in the cases exercised by the regression tests,
we do not need that pfree --- and if you believe the comment for
createTempGistContext, it would be a GiST bug not an intarray bug
if we did.

I'll go remove the pfree.  Perhaps there is a TODO item here to
improve the documentation about these memory management conventions.

            regards, tom lane



Re: Potential memory leak in contrib/intarray's g_intbig_compress

From
Matthias van de Meent
Date:
On Thu, 13 Jul 2023 at 17:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> > My collegue Konstantin (cc-ed) noticed that the GiST code of intarray
> > may leak memory in certain index operations:
>
> Can you demonstrate an actual problem here, that is a query-lifespan leak?
>
> IMO, the coding rule in the GiST and GIN AMs is that the AM code is
> responsible for running all opclass-supplied functions in suitably
> short-lived memory contexts, so that leaks within those functions
> don't cause problems.  This is different from btree which requires
> comparison functions to not leak.  The rationale for having different
> conventions is that btree comparison functions are typically simple
> enough to be able to deal with such a restriction, whereas GiST
> and GIN opclasses are often complex critters for which it'd be too
> bug-prone to insist on leakproofness.  So it seems worth the cost
> to make the AM itself set up a throwaway memory context.
>
> (I don't recall offhand about which rule the other AMs use.
> I'm also not sure where or if this choice is documented.)
>
> In the case at hand, I think the pfree is useless and was installed
> by somebody who had mis-extrapolated from btree rules.  So what
> I'm thinking would be sufficient is to drop it altogether:
>
> -               if (in != DatumGetArrayTypeP(entry->key))
> -                       pfree(in);

Looks like it's indeed a useless pfree call here - all paths that I
could find that lead to GiST's compress procedure are encapsulated in
a temporary context that is reset fairly quickly after use (at most
this memory context would live for the duration of the recursive
splitting of pages up the tree, but I haven't verified this
hypotheses).

There are similar pfree calls in the _int_gist.c file's g_int_compress
function, which made me think we do need to clean up after use, but
indeed these pfrees are useless (or even harmful if bug #17888 can be
trusted)

Kind regards,

Matthias van de Meent.



Re: Potential memory leak in contrib/intarray's g_intbig_compress

From
Michael Paquier
Date:
On Thu, Jul 13, 2023 at 06:28:39PM +0200, Matthias van de Meent wrote:
> There are similar pfree calls in the _int_gist.c file's g_int_compress
> function, which made me think we do need to clean up after use, but
> indeed these pfrees are useless (or even harmful if bug #17888 can be
> trusted)

Indeed, all these are in a GiST temporary context.  So you'd mean
something like the attached perhaps, for both the decompress and
compress paths?
--
Michael

Attachment

Re: Potential memory leak in contrib/intarray's g_intbig_compress

From
Matthias van de Meent
Date:
On Fri, 14 Jul 2023 at 07:57, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 13, 2023 at 06:28:39PM +0200, Matthias van de Meent wrote:
> > There are similar pfree calls in the _int_gist.c file's g_int_compress
> > function, which made me think we do need to clean up after use, but
> > indeed these pfrees are useless (or even harmful if bug #17888 can be
> > trusted)
>
> Indeed, all these are in a GiST temporary context.  So you'd mean
> something like the attached perhaps, for both the decompress and
> compress paths?

Yes, looks good to me. Thanks!

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)