Thread: BUG #15321: XMLTABLE leaks memory like crazy

BUG #15321: XMLTABLE leaks memory like crazy

From

PG Bug reporting form

Date:

11 August 2018, 02:33:55

The following bug has been logged on the website:

Bug reference:      15321
Logged by:          Andrew Gierth
Email address:      andrew@tao11.riddles.org.uk
PostgreSQL version: 10.5
Operating system:   any
Description:

From a report on IRC:

XMLTABLE runs a lot of setup code in the per-query memory context -
resulting in allocations of copies of namespace names, other values, and
_multiple copies of the passed-in XML document_, which are not freed
anywhere.

Accordingly, virtually any lateral call to XMLTABLE on non-toy amounts of
data will blow up the server memory usage:

select count(*)
  from (select ('<rec xmlns="http://foobar">'
                ||
repeat('<obj><col1>foo</col1><col2>bar</col2></obj>',10+(i%10))
                || '</rec>')::xml as content
          from generate_series(1,1000000) i) s,
       xmltable(xmlnamespaces('http://foobar' AS x),
                'x:obj'
                passing t.content
                columns col1 text path 'x:col1'
                columns col2 text path 'x:col2'
                );
-- uses about 6GB of RAM in my tests

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

11 August 2018, 03:28:47

>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> From a report on IRC:

 PG> XMLTABLE runs a lot of setup code in the per-query memory context -
 PG> resulting in allocations of copies of namespace names, other
 PG> values, and _multiple copies of the passed-in XML document_, which
 PG> are not freed anywhere.

Alvaro, I think this comment of yours from when you committed this work
is relevant:

 >> I just pushed XMLTABLE, after some additional changes. Please test
 >> it thoroughly and report any problems.
 [...]
 >> Some changes I made:
 >> * I removed the "buildercxt" memory context. It seemed mostly
 >> pointless, and I was disturbed by the MemoryContextResetOnly().
 >> Per-value memory still uses the per-value memory context, but the
 >> rest of the stuff is in the per-query context, which should be
 >> pretty much the same.

A quick reading suggests that the per-value context should have been
changed to not be a child of "buildercxt" (which would avoid the
MemoryContextResetOnly issue - that's a good sign that you've put a
child context under the wrong parent). But the use of the per-query
context instead is exactly what causes this blowup; compare with what
nodeFunctionscan does with its "argcontext" (and the corresponding
comments in ExecMakeTableFunctionResult).

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

11 August 2018, 06:00:04

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 >>> * I removed the "buildercxt" memory context. It seemed mostly
 >>> pointless, and I was disturbed by the MemoryContextResetOnly().
 >>> Per-value memory still uses the per-value memory context, but the
 >>> rest of the stuff is in the per-query context, which should be
 >>> pretty much the same.

 Andrew> A quick reading suggests that the per-value context should have
 Andrew> been changed to not be a child of "buildercxt" (which would
 Andrew> avoid the MemoryContextResetOnly issue - that's a good sign
 Andrew> that you've put a child context under the wrong parent). But
 Andrew> the use of the per-query context instead is exactly what causes
 Andrew> this blowup; compare with what nodeFunctionscan does with its
 Andrew> "argcontext" (and the corresponding comments in
 Andrew> ExecMakeTableFunctionResult).

And here's a patch (against pg10, but I don't think the code's changed
since) that I think does it the right way.

This works by changing the meaning of perValueCxt - it's now the
per-INPUT-value context (equivalent to FunctionScan's argcontext, but I
didn't change the name for simplicity), not per-output-value; for the
latter, we use the result exprcontext's per-tuple memory like everything
else does (cf. FunctionScan).

I've verified that this fixes the leak; it also passes all other tests I
have thrown at it. Anyone see any issues with this? (CC'ing in Pavel as
original author)

I'll apply it in due course unless anyone says otherwise.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index b03d2ef762..222711bb5a 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -288,6 +288,9 @@ tfuncFetchRows(TableFuncScanState *tstate, ExprContext *econtext)
     oldcxt = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
     tstate->tupstore = tuplestore_begin_heap(false, false, work_mem);

+    MemoryContextReset(tstate->perValueCxt);
+    MemoryContextSwitchTo(tstate->perValueCxt);
+
     PG_TRY();
     {
         routine->InitOpaque(tstate,
@@ -319,8 +322,7 @@ tfuncFetchRows(TableFuncScanState *tstate, ExprContext *econtext)
     }
     PG_END_TRY();

-    /* return to original memory context, and clean up */
-    MemoryContextSwitchTo(oldcxt);
+    /* clean up and return to original memory context */

     if (tstate->opaque != NULL)
     {
@@ -328,6 +330,9 @@ tfuncFetchRows(TableFuncScanState *tstate, ExprContext *econtext)
         tstate->opaque = NULL;
     }

+    MemoryContextSwitchTo(oldcxt);
+    MemoryContextReset(tstate->perValueCxt);
+
     return;
 }

@@ -433,7 +438,7 @@ tfuncLoadRows(TableFuncScanState *tstate, ExprContext *econtext)

     ordinalitycol =
         ((TableFuncScan *) (tstate->ss.ps.plan))->tablefunc->ordinalitycol;
-    oldcxt = MemoryContextSwitchTo(tstate->perValueCxt);
+    oldcxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

     /*
      * Keep requesting rows from the table builder until there aren't any.
@@ -496,7 +501,7 @@ tfuncLoadRows(TableFuncScanState *tstate, ExprContext *econtext)

         tuplestore_putvalues(tstate->tupstore, tupdesc, values, nulls);

-        MemoryContextReset(tstate->perValueCxt);
+        MemoryContextReset(econtext->ecxt_per_tuple_memory);
     }

     MemoryContextSwitchTo(oldcxt);

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

11 August 2018, 10:02:58

2018-08-11 5:00 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

>>> * I removed the "buildercxt" memory context. It seemed mostly
>>> pointless, and I was disturbed by the MemoryContextResetOnly().
>>> Per-value memory still uses the per-value memory context, but the
>>> rest of the stuff is in the per-query context, which should be
>>> pretty much the same.

Andrew> A quick reading suggests that the per-value context should have
Andrew> been changed to not be a child of "buildercxt" (which would
Andrew> avoid the MemoryContextResetOnly issue - that's a good sign
Andrew> that you've put a child context under the wrong parent). But
Andrew> the use of the per-query context instead is exactly what causes
Andrew> this blowup; compare with what nodeFunctionscan does with its
Andrew> "argcontext" (and the corresponding comments in
Andrew> ExecMakeTableFunctionResult).

And here's a patch (against pg10, but I don't think the code's changed
since) that I think does it the right way.

This works by changing the meaning of perValueCxt - it's now the
per-INPUT-value context (equivalent to FunctionScan's argcontext, but I
didn't change the name for simplicity), not per-output-value; for the
latter, we use the result exprcontext's per-tuple memory like everything
else does (cf. FunctionScan).

I've verified that this fixes the leak; it also passes all other tests I
have thrown at it. Anyone see any issues with this? (CC'ing in Pavel as
original author)

I'll apply it in due course unless anyone says otherwise.

I'll look there this evening.

I am not sure if combination

+ MemoryContextReset(tstate->perValueCxt);
+ MemoryContextSwitchTo(tstate->perValueCxt);

is valid. Usually MemoryContext is reset after some action, not before. But now, I have not time to look there

Regards

Pavel

--
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

12 August 2018, 10:21:12

2018-08-11 9:02 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

2018-08-11 5:00 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

>>> * I removed the "buildercxt" memory context. It seemed mostly
>>> pointless, and I was disturbed by the MemoryContextResetOnly().
>>> Per-value memory still uses the per-value memory context, but the
>>> rest of the stuff is in the per-query context, which should be
>>> pretty much the same.

Andrew> A quick reading suggests that the per-value context should have
Andrew> been changed to not be a child of "buildercxt" (which would
Andrew> avoid the MemoryContextResetOnly issue - that's a good sign
Andrew> that you've put a child context under the wrong parent). But
Andrew> the use of the per-query context instead is exactly what causes
Andrew> this blowup; compare with what nodeFunctionscan does with its
Andrew> "argcontext" (and the corresponding comments in
Andrew> ExecMakeTableFunctionResult).

And here's a patch (against pg10, but I don't think the code's changed
since) that I think does it the right way.

This works by changing the meaning of perValueCxt - it's now the
per-INPUT-value context (equivalent to FunctionScan's argcontext, but I
didn't change the name for simplicity), not per-output-value; for the
latter, we use the result exprcontext's per-tuple memory like everything
else does (cf. FunctionScan).

I've verified that this fixes the leak; it also passes all other tests I
have thrown at it. Anyone see any issues with this? (CC'ing in Pavel as
original author)

I'll apply it in due course unless anyone says otherwise.

I'll look there this evening.

I am not sure if combination

+ MemoryContextReset(tstate->perValueCxt);
+ MemoryContextSwitchTo(tstate->perValueCxt);

is valid. Usually MemoryContext is reset after some action, not before. But now, I have not time to look there

Regards

+   MemoryContextReset(tstate->perValueCxt);
+   MemoryContextSwitchTo(tstate->perValueCxt);
+
    PG_TRY();

The reset of memory context is useless, because the reset of perValueCxt is there already on the end of tfuncFetchRows function

I don't understand to using tuple memory context

        ((TableFuncScan *) (tstate->ss.ps.plan))->tablefunc->ordinalitycol;
-   oldcxt = MemoryContextSwitchTo(tstate->perValueCxt);
+   oldcxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

    /*
     * Keep requesting rows from the table builder until there aren't any.
@@ -493,7 +498,7 @@ tfuncLoadRows(TableFuncScanState *tstate, ExprContext *econtext)

        tuplestore_putvalues(tstate->tupstore, tupdesc, values, nulls);

-       MemoryContextReset(tstate->perValueCxt);
+       MemoryContextReset(econtext->ecxt_per_tuple_memory);
    }

When we are running under perValueCxt, then there changing memory context is useless

I modified your patch. Please, check it.

Regards

Pavel

Pavel

--
Andrew (irc:RhodiumToad)

Attachment

xmlmem-2.patch

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

12 August 2018, 10:38:06

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 Pavel>  +   MemoryContextReset(tstate->perValueCxt);
 Pavel> +   MemoryContextSwitchTo(tstate->perValueCxt);
 Pavel> +
 Pavel>     PG_TRY();

 Pavel> The reset of memory context is useless, because the reset of
 Pavel> perValueCxt is there already on the end of tfuncFetchRows
 Pavel> function

It's overkill, yes, but it's also harmless because resetting a context
that's not been touched since the last reset has very little overhead.

 Pavel> I don't understand to using tuple memory context

We still need a per-result-tuple memory context otherwise we're leaking
whatever memory got allocated in each GetValue call into the
per-input-value context. (We can use our projection econtext's per-tuple
memory for this because we haven't done any evaluation of output items
for the current cycle yet at the point this code is reached.)

Again, look at functionscan for how this is supposed to work.

 Pavel> When we are running under perValueCxt, then there changing
 Pavel> memory context is useless

It's not useless at all; it's needed to avoid excess memory usage when a
single XMLTABLE() call returns many rows.

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

12 August 2018, 11:52:05

2018-08-12 9:38 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

Pavel> + MemoryContextReset(tstate->perValueCxt);
Pavel> + MemoryContextSwitchTo(tstate->perValueCxt);
Pavel> +
Pavel> PG_TRY();

Pavel> The reset of memory context is useless, because the reset of
Pavel> perValueCxt is there already on the end of tfuncFetchRows
Pavel> function

It's overkill, yes, but it's also harmless because resetting a context
that's not been touched since the last reset has very little overhead.

Pavel> I don't understand to using tuple memory context

We still need a per-result-tuple memory context otherwise we're leaking
whatever memory got allocated in each GetValue call into the
per-input-value context. (We can use our projection econtext's per-tuple
memory for this because we haven't done any evaluation of output items
for the current cycle yet at the point this code is reached.)

Again, look at functionscan for how this is supposed to work.

it is done by tuplestore_putvalues

Pavel> When we are running under perValueCxt, then there changing
Pavel> memory context is useless

It's not useless at all; it's needed to avoid excess memory usage when a
single XMLTABLE() call returns many rows.

When this context was not necessary before, then it is not need be used now. Tuplestore does all work

--
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

12 August 2018, 11:59:10

2018-08-12 9:38 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

Pavel> + MemoryContextReset(tstate->perValueCxt);
Pavel> + MemoryContextSwitchTo(tstate->perValueCxt);
Pavel> +
Pavel> PG_TRY();

Pavel> The reset of memory context is useless, because the reset of
Pavel> perValueCxt is there already on the end of tfuncFetchRows
Pavel> function

It's overkill, yes, but it's also harmless because resetting a context
that's not been touched since the last reset has very little overhead.

The overhead is +/- zero. But the code will work exactly same without this line. So it is little bit confusing to use there.

Pavel> I don't understand to using tuple memory context

We still need a per-result-tuple memory context otherwise we're leaking
whatever memory got allocated in each GetValue call into the
per-input-value context. (We can use our projection econtext's per-tuple
memory for this because we haven't done any evaluation of output items
for the current cycle yet at the point this code is reached.)

Again, look at functionscan for how this is supposed to work.

Pavel> When we are running under perValueCxt, then there changing
Pavel> memory context is useless

It's not useless at all; it's needed to avoid excess memory usage when a
single XMLTABLE() call returns many rows.

--
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

12 August 2018, 12:44:18

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> We still need a per-result-tuple memory context otherwise we're
 >> leaking whatever memory got allocated in each GetValue call into the
 >> per-input-value context. (We can use our projection econtext's
 >> per-tuple memory for this because we haven't done any evaluation of
 >> output items for the current cycle yet at the point this code is
 >> reached.)
 >> 
 >> Again, look at functionscan for how this is supposed to work.

 Pavel> it is done by tuplestore_putvalues

No, tuplestore_putvalues is only responsible for the memory it allocates
itself, which belongs to the tuplestore, it has nothing to do with the
memory allocated by its caller - and XmlTableGetValue does quite a few
allocations.

It's true that the only way (I think) to get a lot of result rows from
XMLTABLE is to use a large input value, the memory usage for which will
dwarf that of the output rows (a ~128MB xml value can easily mean using
~4.5GB of backend memory, which seems quite excessively profligate), but
my tests show that using that value, the lack of a per-GetValue context
reset costs perhaps another ~500MB on top for my specific testcase:

select count(*)
  from (select ('<rec xmlns="http://x">'
                || repeat('<o><c>foo</c></o>',8000000+(i%10))
                || '</rec>')::xml as content
          from generate_series(1,10) i offset 0) s,
       xmltable(xmlnamespaces('http://x' as x),
                'x:o'
                passing s.content
                columns
                col1 text path 'x:c');

 >> It's not useless at all; it's needed to avoid excess memory usage
 >> when a single XMLTABLE() call returns many rows.

 Pavel> When this context was not necessary before, then it is not need
 Pavel> be used now. Tuplestore does all work

Before, you were using perValueCxt and resetting it once per GetValue
call. Since my patch takes perValueCxt to use for another purpose
instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
is suitable for the job (and consistent with functionscan).

The tuplestore does not do all the work - just look at XmlTableGetValue,
and notice that it has calls to text_to_cstring, pstrdup,
appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
which will allocate memory in the current memory context. All of that
needs to be reset on a per-output-tuple basis.

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

12 August 2018, 13:02:31

2018-08-12 11:44 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

>> We still need a per-result-tuple memory context otherwise we're
>> leaking whatever memory got allocated in each GetValue call into the
>> per-input-value context. (We can use our projection econtext's
>> per-tuple memory for this because we haven't done any evaluation of
>> output items for the current cycle yet at the point this code is
>> reached.)
>>
>> Again, look at functionscan for how this is supposed to work.

Pavel> it is done by tuplestore_putvalues

No, tuplestore_putvalues is only responsible for the memory it allocates
itself, which belongs to the tuplestore, it has nothing to do with the
memory allocated by its caller - and XmlTableGetValue does quite a few
allocations.

Maybe I used wrong words. Sorry, my English lang is not good.

In master tfuncLoadRows switch to tstate->perValueCxt. You change it by switch econtext->ecxt_per_tuple_memory what is wrong I thing.

If we don't change memory context, we will stay inside perValueCxt. And this context will be cleaned outside.

It's true that the only way (I think) to get a lot of result rows from
XMLTABLE is to use a large input value, the memory usage for which will
dwarf that of the output rows (a ~128MB xml value can easily mean using
~4.5GB of backend memory, which seems quite excessively profligate), but
my tests show that using that value, the lack of a per-GetValue context
reset costs perhaps another ~500MB on top for my specific testcase:

select count(*)
from (select ('<rec xmlns="http://x">'
|| repeat('<o><c>foo</c></o>',8000000+(i%10))
|| '</rec>')::xml as content
from generate_series(1,10) i offset 0) s,
xmltable(xmlnamespaces('http://x' as x),
'x:o'
passing s.content
columns
col1 text path 'x:c');

>> It's not useless at all; it's needed to avoid excess memory usage
>> when a single XMLTABLE() call returns many rows.

Pavel> When this context was not necessary before, then it is not need
Pavel> be used now. Tuplestore does all work

Before, you were using perValueCxt and resetting it once per GetValue
call. Since my patch takes perValueCxt to use for another purpose
instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
is suitable for the job (and consistent with functionscan).

I think so using cxt_per_tuple_memory is not necessary - and my patch is working too.

Just I removed MemoryContextReset(tstate->perValueCxt) after tuplestore_putvalues. It is possible, because tstate->perValueCxt is cleaned immediately in caller tfuncFetchRows function.

The tuplestore does not do all the work - just look at XmlTableGetValue,
and notice that it has calls to text_to_cstring, pstrdup,
appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
which will allocate memory in the current memory context. All of that
needs to be reset on a per-output-tuple basis.

But it has own context or it can works under perValueCxt

I think using two memory contexts is not necessary, and with just one context, the code can be simpler.

Regards

Pavel

--
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

12 August 2018, 13:25:39

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> No, tuplestore_putvalues is only responsible for the memory it
 >> allocates itself, which belongs to the tuplestore, it has nothing to
 >> do with the memory allocated by its caller - and XmlTableGetValue
 >> does quite a few allocations.

 Pavel> Maybe I used wrong words. Sorry, my English lang is not good.

 Pavel> In master tfuncLoadRows switch to tstate->perValueCxt. You
 Pavel> change it by switch econtext->ecxt_per_tuple_memory what is
 Pavel> wrong I thing.

 Pavel> If we don't change memory context, we will stay inside
 Pavel> perValueCxt. And this context will be cleaned outside.

Yes, but it won't be cleaned up until _all_ the rows from a single call
have been generated, which means that the allocations from GetValue have
been (uselessly) accumulating during this time, wasting memory.

We do need a context that is reset for every output row, as perValueCxt
used to be. I don't see why this is even in question.

 >> Before, you were using perValueCxt and resetting it once per GetValue
 >> call. Since my patch takes perValueCxt to use for another purpose
 >> instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
 >> is suitable for the job (and consistent with functionscan).

 Pavel> I think so using cxt_per_tuple_memory is not necessary - and my
 Pavel> patch is working too.

Working, just using more memory than it needs to.

 Pavel> Just I removed MemoryContextReset(tstate->perValueCxt) after
 Pavel> tuplestore_putvalues. It is possible, because
 Pavel> tstate->perValueCxt is cleaned immediately in caller
 Pavel> tfuncFetchRows function.

But that's not "immediately" because tfuncLoadRows is looping over the
FetchRow call, and calling GetValue for each column in that row, and in
your version it is _not cleaning up memory in that loop_.

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

12 August 2018, 13:39:44

2018-08-12 12:25 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

>> No, tuplestore_putvalues is only responsible for the memory it
>> allocates itself, which belongs to the tuplestore, it has nothing to
>> do with the memory allocated by its caller - and XmlTableGetValue
>> does quite a few allocations.

Pavel> Maybe I used wrong words. Sorry, my English lang is not good.

Pavel> In master tfuncLoadRows switch to tstate->perValueCxt. You
Pavel> change it by switch econtext->ecxt_per_tuple_memory what is
Pavel> wrong I thing.

Pavel> If we don't change memory context, we will stay inside
Pavel> perValueCxt. And this context will be cleaned outside.

Yes, but it won't be cleaned up until _all_ the rows from a single call
have been generated, which means that the allocations from GetValue have
been (uselessly) accumulating during this time, wasting memory.

We do need a context that is reset for every output row, as perValueCxt
used to be. I don't see why this is even in question.

>> Before, you were using perValueCxt and resetting it once per GetValue
>> call. Since my patch takes perValueCxt to use for another purpose
>> instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
>> is suitable for the job (and consistent with functionscan).

Pavel> I think so using cxt_per_tuple_memory is not necessary - and my
Pavel> patch is working too.

Working, just using more memory than it needs to.

Pavel> Just I removed MemoryContextReset(tstate->perValueCxt) after
Pavel> tuplestore_putvalues. It is possible, because
Pavel> tstate->perValueCxt is cleaned immediately in caller
Pavel> tfuncFetchRows function.

But that's not "immediately" because tfuncLoadRows is looping over the
FetchRow call, and calling GetValue for each column in that row, and in
your version it is _not cleaning up memory in that loop_.

ok, now I am maybe understand to your motivation.

Usually, loading row should be memory cheap operation, but sure some bytes it can take.

Then I don't like too much using ecxt_per_tuple_memory for this. Maybe better to create own short life context for purpose. Or do better comments about using this memory context for very short life task, please.

Regards

Pavel

--
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

12 August 2018, 14:18:07

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> But that's not "immediately" because tfuncLoadRows is looping over
 >> the FetchRow call, and calling GetValue for each column in that row,
 >> and in your version it is _not cleaning up memory in that loop_.

 Pavel> ok, now I am maybe understand to your motivation.

 Pavel> Usually, loading row should be memory cheap operation, but sure
 Pavel> some bytes it can take.

Yes, it'll usually be small, but you shouldn't assume that (and some
type input functions may use more memory than you think, since doing a
lot of retail pfree() calls can really slow things down).

 Pavel> Then I don't like too much using ecxt_per_tuple_memory for this.

It's there, it has the right lifetime, allocating another one just for
this is a waste. Furthermore, this is the same pattern that FunctionScan
uses, so it's more consistent.

 Pavel> Or do better comments about using this memory context for very
 Pavel> short life task, please.

What specifically do you think needs explaining?

Attached patch is the same logic as before but with more comments.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index b03d2ef762..d2380995aa 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -288,6 +288,17 @@ tfuncFetchRows(TableFuncScanState *tstate, ExprContext *econtext)
     oldcxt = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
     tstate->tupstore = tuplestore_begin_heap(false, false, work_mem);
 
+    /*
+     * Each call to fetch a new set of rows - of which there may be very many
+     * if XMLTABLE is being used in a lateral join - will allocate a possibly
+     * substantial amount of memory, so we cannot use the per-query context
+     * here. perValueCxt now serves the same function as "argcontext" does in
+     * FunctionScan - a place to store per-call lifetime data (as opposed to
+     * per-query or per-result-tuple).
+     */
+    MemoryContextReset(tstate->perValueCxt);
+    MemoryContextSwitchTo(tstate->perValueCxt);
+
     PG_TRY();
     {
         routine->InitOpaque(tstate,
@@ -319,8 +330,7 @@ tfuncFetchRows(TableFuncScanState *tstate, ExprContext *econtext)
     }
     PG_END_TRY();
 
-    /* return to original memory context, and clean up */
-    MemoryContextSwitchTo(oldcxt);
+    /* clean up and return to original memory context */
 
     if (tstate->opaque != NULL)
     {
@@ -328,6 +338,9 @@ tfuncFetchRows(TableFuncScanState *tstate, ExprContext *econtext)
         tstate->opaque = NULL;
     }
 
+    MemoryContextSwitchTo(oldcxt);
+    MemoryContextReset(tstate->perValueCxt);
+
     return;
 }
 
@@ -433,7 +446,14 @@ tfuncLoadRows(TableFuncScanState *tstate, ExprContext *econtext)
 
     ordinalitycol =
         ((TableFuncScan *) (tstate->ss.ps.plan))->tablefunc->ordinalitycol;
-    oldcxt = MemoryContextSwitchTo(tstate->perValueCxt);
+
+    /*
+     * We need a short-lived memory context that we can clean up each time
+     * around the loop, to avoid wasting space. Our default per-tuple context
+     * is fine for the job, since we won't have used it for anything yet in
+     * this tuple cycle.
+     */
+    oldcxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
     /*
      * Keep requesting rows from the table builder until there aren't any.
@@ -496,7 +516,7 @@ tfuncLoadRows(TableFuncScanState *tstate, ExprContext *econtext)
 
         tuplestore_putvalues(tstate->tupstore, tupdesc, values, nulls);
 
-        MemoryContextReset(tstate->perValueCxt);
+        MemoryContextReset(econtext->ecxt_per_tuple_memory);
     }
 
     MemoryContextSwitchTo(oldcxt);

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Pavel Stehule

Date:

12 August 2018, 15:27:56

2018-08-12 13:18 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:

>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

>> But that's not "immediately" because tfuncLoadRows is looping over
>> the FetchRow call, and calling GetValue for each column in that row,
>> and in your version it is _not cleaning up memory in that loop_.

Pavel> ok, now I am maybe understand to your motivation.

Pavel> Usually, loading row should be memory cheap operation, but sure
Pavel> some bytes it can take.

Yes, it'll usually be small, but you shouldn't assume that (and some
type input functions may use more memory than you think, since doing a
lot of retail pfree() calls can really slow things down).

Pavel> Then I don't like too much using ecxt_per_tuple_memory for this.

It's there, it has the right lifetime, allocating another one just for
this is a waste. Furthermore, this is the same pattern that FunctionScan
uses, so it's more consistent.

Pavel> Or do better comments about using this memory context for very
Pavel> short life task, please.

What specifically do you think needs explaining?

I don't feel well, when context named GetValue has longer life than ecxt_tuple_memory context. I understand so it is not important in SRF function, but it looks strange for me.

It is my subjective option, and I have not any strong arguments for it. If commiter think so it is ok, then I can live with it :)

Attached patch is the same logic as before but with more comments.

Regards

Pavel

--
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Tom Lane

Date:

12 August 2018, 18:11:32

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Attached patch is the same logic as before but with more comments.

This looks generally reasonable to me, but what's the point of doing
two MemoryContextReset calls in tfuncFetchRows?  Doing one at the
end should be sufficient to guarantee that the context is empty already
at the next call.

Also, I'd be inclined to rename "perValueCxt" to something else,
it's not really correctly named for this usage pattern.

            regards, tom lane

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

13 August 2018, 00:30:33

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Also, I'd be inclined to rename "perValueCxt" to something else,
 Tom> it's not really correctly named for this usage pattern.

perCallCxt? or do you have a better name?

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Tom Lane

Date:

13 August 2018, 01:12:11

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> Also, I'd be inclined to rename "perValueCxt" to something else,
>  Tom> it's not really correctly named for this usage pattern.

> perCallCxt? or do you have a better name?

I was hoping to avoid being pinned down on that ;-).  But maybe
fetchRowsCxt or tableBuildCxt or something along that line?
It's not very clear what "perCall" is per call of, so I'm not
in love with that idea.

            regards, tom lane

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

13 August 2018, 01:43:58

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> perCallCxt? or do you have a better name?

 Tom> I was hoping to avoid being pinned down on that ;-).

Well, they do say there are 2 hard problems in computing: finding names
for things, cache invalidation, and off-by-one errors :-)

 Tom> But maybe fetchRowsCxt or tableBuildCxt or something along that
 Tom> line? It's not very clear what "perCall" is per call of, so I'm
 Tom> not in love with that idea.

Meh. perTableCxt?

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Andrew Gierth

Date:

13 August 2018, 04:22:46

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Tom> But maybe fetchRowsCxt or tableBuildCxt or something along that
 Tom> line? It's not very clear what "perCall" is per call of, so I'm
 Tom> not in love with that idea.

 Andrew> Meh. perTableCxt?

Pushed it with perTableCxt as the name.

-- 
Andrew (irc:RhodiumToad)

Re: BUG #15321: XMLTABLE leaks memory like crazy

From

Alvaro Herrera

Date:

13 August 2018, 16:38:46

On 2018-Aug-13, Andrew Gierth wrote:

> >>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> 
>  Tom> But maybe fetchRowsCxt or tableBuildCxt or something along that
>  Tom> line? It's not very clear what "perCall" is per call of, so I'm
>  Tom> not in love with that idea.
> 
>  Andrew> Meh. perTableCxt?
> 
> Pushed it with perTableCxt as the name.

Thanks for taking care of this!  Much appreciated.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services