Re: BUG #15321: XMLTABLE leaks memory like crazy - Mailing list pgsql-bugs
From | Pavel Stehule |
---|---|
Subject | Re: BUG #15321: XMLTABLE leaks memory like crazy |
Date | |
Msg-id | CAFj8pRDWORKuuLEvFN-tMp06nuD5g_+yPAWWVW7umanoriEwwg@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15321: XMLTABLE leaks memory like crazy (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: BUG #15321: XMLTABLE leaks memory like crazy
|
List | pgsql-bugs |
2018-08-12 11:44 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:
>> We still need a per-result-tuple memory context otherwise we're
>> leaking whatever memory got allocated in each GetValue call into the
>> per-input-value context. (We can use our projection econtext's
>> per-tuple memory for this because we haven't done any evaluation of
>> output items for the current cycle yet at the point this code is
>> reached.)
>>
>> Again, look at functionscan for how this is supposed to work.
Pavel> it is done by tuplestore_putvalues
No, tuplestore_putvalues is only responsible for the memory it allocates
itself, which belongs to the tuplestore, it has nothing to do with the
memory allocated by its caller - and XmlTableGetValue does quite a few
allocations.
Maybe I used wrong words. Sorry, my English lang is not good.
In master tfuncLoadRows switch to tstate->perValueCxt. You change it by switch econtext->ecxt_per_tuple_ memory what is wrong I thing.
If we don't change memory context, we will stay inside perValueCxt. And this context will be cleaned outside.
It's true that the only way (I think) to get a lot of result rows from
XMLTABLE is to use a large input value, the memory usage for which will
dwarf that of the output rows (a ~128MB xml value can easily mean using
~4.5GB of backend memory, which seems quite excessively profligate), but
my tests show that using that value, the lack of a per-GetValue context
reset costs perhaps another ~500MB on top for my specific testcase:
select count(*)
from (select ('<rec xmlns="http://x">'
|| repeat('<o><c>foo</c></o>',8000000+(i%10))
|| '</rec>')::xml as content
from generate_series(1,10) i offset 0) s,
xmltable(xmlnamespaces('http://x' as x),
'x:o'
passing s.content
columns
col1 text path 'x:c');
>> It's not useless at all; it's needed to avoid excess memory usage
>> when a single XMLTABLE() call returns many rows.
Pavel> When this context was not necessary before, then it is not need
Pavel> be used now. Tuplestore does all work
Before, you were using perValueCxt and resetting it once per GetValue
call. Since my patch takes perValueCxt to use for another purpose
instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
is suitable for the job (and consistent with functionscan).
I think so using cxt_per_tuple_memor y is not necessary - and my patch is working too.
Just I removed MemoryContextReset(tstate->perValueCxt) after tuplestore_putvalues. It is possible, because tstate->perValueCxt is cleaned immediately in caller tfuncFetchRows function.
The tuplestore does not do all the work - just look at XmlTableGetValue,
and notice that it has calls to text_to_cstring, pstrdup,
appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
which will allocate memory in the current memory context. All of that
needs to be reset on a per-output-tuple basis.
But it has own context or it can works under perValueCxt
I think using two memory contexts is not necessary, and with just one context, the code can be simpler.
Regards
Pavel
--
Andrew (irc:RhodiumToad)
pgsql-bugs by date: