Thread: Memory leaks on SRF rescan

Memory leaks on SRF rescan

From

Neil Conway

Date:

21 February 2008, 22:16:21

Consider this test case:

    create table baz (a int, b int);
    insert into baz (select 1, generate_series(1, 3000000));

    select baz.*, (select 1 from generate_series(1, 1) g(a)
                   where g.a < baz.b)
           from baz;

The final query consumes ~800 MB of memory on my system before it
finishes executing. The guilty memory context is pretty obvious:

    ExecutorState: 612360192 total in 82 blocks; 1664004 free (117
    chunks); 610696188 used

Two different classes of allocations in the EState's per-query context
are leaked:

(1) In FunctionNext(), we call ExecMakeTableFunctionResult() to compute
the result set of the SRF, which allocates the TupleDesc out-parameter
in the per-query memory context. Since rescan of a function scan (with
chgParam != NULL) results in taking this code path again, we can leak 1
TupleDesc for each rescan of a function scan. I think this is plainly a
bug -- the first attached patch fixes it.

(2) In various SRFs, allocations are made in the "multi-call context"
but are not released before calling SRF_RETURN_DONE. This results in
leaking those allocations for the duration of the query, since the
multi-call context is the EState's per-query context. There appears to
have been some thought that the SRF author would be enlightened enough
to free allocations made in the multi-call context before calling
SRF_RETURN_DONE, but this is rarely done. From a cursory look at the
builtin SRFs, generate_series_step_int4, generate_series_step_int8,
pg_logdir_ls, ts_token_type_byid, ts_token_type_byname, ts_parse_byid,
ts_parse_byname and pg_timezone_abbrevs all get this wrong, at which
point I stopped looking further. I wouldn't think very many user-written
SRFs have gotten this right either.

We could fix this by patching all the guilty SRFs (the second attached
patch does this for the int4 and int8 variants of generate_series()),
but I think it would be more robust to arrange for the FuncCallContext's
multi-call context to have a shorter lifetime: instead of using the
EState's per-query context, we could instead use a context that is
deleted after SRF_RETURN_DONE() is called (or at a minimum, reset each
time the function scan is rescanned).

BTW, it seems to me that shutdown_MultiFuncCall() is wrong in any case:
it frees the SRF's AttInMetadata, but not any of its fields.

Thanks to my colleague Alan Li at Truviso for reporting this issue.

-Neil

Attachment

Re: Memory leaks on SRF rescan

From

Tom Lane

Date:

21 February 2008, 22:42:43

Neil Conway <neilc@samurai.com> writes:
> Two different classes of allocations in the EState's per-query context
> are leaked:

> (1) In FunctionNext(), we call ExecMakeTableFunctionResult() to compute
> the result set of the SRF, which allocates the TupleDesc out-parameter
> in the per-query memory context. Since rescan of a function scan (with
> chgParam != NULL) results in taking this code path again, we can leak 1
> TupleDesc for each rescan of a function scan. I think this is plainly a
> bug -- the first attached patch fixes it.

Given your point (2), is this worth fixing by itself?

> (2) In various SRFs, allocations are made in the "multi-call context"
> but are not released before calling SRF_RETURN_DONE.

Yeah.  I think it's hopeless to expect these functions to all hew to
the straight and narrow path.  It seems to me that the right way is for
the sub-select to somehow run in its own "per-query" context.  Not sure
about the implications of that bit of arm-waving ... it might be a bit
tricky since I think we expect the executor state tree to get set up
only once.
        regards, tom lane

Re: Memory leaks on SRF rescan

From

Neil Conway

Date:

21 February 2008, 22:53:45

On Thu, 2008-02-21 at 21:42 -0500, Tom Lane wrote:
> Given your point (2), is this worth fixing by itself?

Right, probably not.

> Yeah.  I think it's hopeless to expect these functions to all hew to
> the straight and narrow path.  It seems to me that the right way is for
> the sub-select to somehow run in its own "per-query" context.

Hmm, I was thinking of just fixing this by arranging for the
FuncCallContext's multi-call context to be a special context created by
the function scan, and that is reset/deleted at the appropriate time.
Would this not fix the issue as well?

-Neil

Re: Memory leaks on SRF rescan

From

Tom Lane

Date:

21 February 2008, 23:07:39

Neil Conway <neilc@samurai.com> writes:
> On Thu, 2008-02-21 at 21:42 -0500, Tom Lane wrote:
>> Yeah.  I think it's hopeless to expect these functions to all hew to
>> the straight and narrow path.  It seems to me that the right way is for
>> the sub-select to somehow run in its own "per-query" context.

> Hmm, I was thinking of just fixing this by arranging for the
> FuncCallContext's multi-call context to be a special context created by
> the function scan, and that is reset/deleted at the appropriate time.
> Would this not fix the issue as well?

That might work, particularly if we could arrange for all the functions
invoked in a particular subquery to share the same "per query" context.
Want to take a whack at it?
        regards, tom lane