Re: BUG #15321: XMLTABLE leaks memory like crazy - Mailing list pgsql-bugs

From Pavel Stehule
Subject Re: BUG #15321: XMLTABLE leaks memory like crazy
Date
Msg-id CAFj8pRDWORKuuLEvFN-tMp06nuD5g_+yPAWWVW7umanoriEwwg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #15321: XMLTABLE leaks memory like crazy  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: BUG #15321: XMLTABLE leaks memory like crazy  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-bugs


2018-08-12 11:44 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> We still need a per-result-tuple memory context otherwise we're
 >> leaking whatever memory got allocated in each GetValue call into the
 >> per-input-value context. (We can use our projection econtext's
 >> per-tuple memory for this because we haven't done any evaluation of
 >> output items for the current cycle yet at the point this code is
 >> reached.)
 >>
 >> Again, look at functionscan for how this is supposed to work.

 Pavel> it is done by tuplestore_putvalues

No, tuplestore_putvalues is only responsible for the memory it allocates
itself, which belongs to the tuplestore, it has nothing to do with the
memory allocated by its caller - and XmlTableGetValue does quite a few
allocations.

Maybe I used wrong words. Sorry, my English lang is not good.

In master tfuncLoadRows switch to tstate->perValueCxt. You change it by switch econtext->ecxt_per_tuple_memory what is wrong I thing.

If we don't change memory context, we will stay inside  perValueCxt. And this context will be cleaned outside.



It's true that the only way (I think) to get a lot of result rows from
XMLTABLE is to use a large input value, the memory usage for which will
dwarf that of the output rows (a ~128MB xml value can easily mean using
~4.5GB of backend memory, which seems quite excessively profligate), but
my tests show that using that value, the lack of a per-GetValue context
reset costs perhaps another ~500MB on top for my specific testcase:

select count(*)
  from (select ('<rec xmlns="http://x">'
                || repeat('<o><c>foo</c></o>',8000000+(i%10))
                || '</rec>')::xml as content
          from generate_series(1,10) i offset 0) s,
       xmltable(xmlnamespaces('http://x' as x),
                'x:o'
                passing s.content
                columns
                col1 text path 'x:c');

 >> It's not useless at all; it's needed to avoid excess memory usage
 >> when a single XMLTABLE() call returns many rows.

 Pavel> When this context was not necessary before, then it is not need
 Pavel> be used now. Tuplestore does all work

Before, you were using perValueCxt and resetting it once per GetValue
call. Since my patch takes perValueCxt to use for another purpose
instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
is suitable for the job (and consistent with functionscan).


I think so using cxt_per_tuple_memory is not necessary - and my patch is working too.

Just I removed MemoryContextReset(tstate->perValueCxt) after tuplestore_putvalues. It is possible, because tstate->perValueCxt is cleaned immediately in caller tfuncFetchRows function.

 
The tuplestore does not do all the work - just look at XmlTableGetValue,
and notice that it has calls to text_to_cstring, pstrdup,
appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
which will allocate memory in the current memory context. All of that
needs to be reset on a per-output-tuple basis.

But it has own  context or it can works under perValueCxt

I think using two memory contexts is not necessary, and with just one context, the code can be simpler.

Regards

Pavel



--
Andrew (irc:RhodiumToad)

pgsql-bugs by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: BUG #15321: XMLTABLE leaks memory like crazy
Next
From: Andrew Gierth
Date:
Subject: Re: BUG #15321: XMLTABLE leaks memory like crazy