Re: [HACKERS] aggregation memory leak and fix - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] aggregation memory leak and fix
Msg-id 28066.921948538@sss.pgh.pa.us
In response to Re: [HACKERS] aggregation memory leak and fix  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] aggregation memory leak and fix
List pgsql-hackers
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> My only quick solution would seem to be to add a new "expression" memory
> context, that can be cleared after every tuple is processed, clearing
> out temporary values allocated inside an expression.

Right, this whole problem of growing backend memory use during a large
SELECT (or COPY, or probably a few other things) is one of the things
that we were talking about addressing by revising the memory management
structure.

I think what we want inside the executor is a distinction between
storage that must live to the end of the statement and storage that is
only needed while processing the current tuple.  The second kind of
storage would go into a separate context that gets flushed every so
often.  (It could be every tuple, or every dozen or hundred tuples
depending on what seems the best tradeoff of cycles against memory
usage.)

I'm not sure that just two contexts is enough, either.  For example, in

    SELECT field1, SUM(field2) GROUP BY field1;

the working memory for the SUM aggregate could not be released after
each tuple, but perhaps we don't want it to live for the whole statement
either --- in that case we'd need a per-group context.  (This particular
example isn't very convincing, because the same storage for the SUM
*could* be recycled from group to group.  But I don't know whether it
actually *is* reused or not.  If fresh storage is palloc'd for each
instantiation of SUM then we have a per-group leak in this scenario.
In any case, I'm not sure all aggregate functions have constant memory
requirements that would let them recycle storage across groups.)

What we need to do is work out what the best set of memory context
definitions is, and then decide on a strategy for making sure that
lower-level routines allocate their return values in the right context.
It'd be nice if the lower-level routines could still call palloc() and
not have to worry about this explicitly --- otherwise we'll break not
only a lot of our own code but perhaps a lot of user code.  (User-
specific data types and SPI code all use palloc, no?)

I think it is too late to try to fix this for 6.5, but it ought to be a
top priority for 6.6.
        regards, tom lane

