Re: Inheritance planner CPU and memory usage change since 9.3.2 - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Inheritance planner CPU and memory usage change since 9.3.2
Date
Msg-id CAEZATCVePyt1Xd5tZ5sW7VGMqquuRgufvyiqO0JkpHgBrvP8jQ@mail.gmail.com
Whole thread Raw
In response to Re: Inheritance planner CPU and memory usage change since 9.3.2  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Inheritance planner CPU and memory usage change since 9.3.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 18 June 2015 at 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jun 17, 2015 at 9:32 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> We saw a rather extreme performance problem in a cluster upgraded from
>>> 9.1 to 9.3.  It uses a largish number of child tables (partitions) and
>>> many columns.  Planning a simple UPDATE via the base table started
>>> using a very large amount of memory and CPU time.
>>>
>>> My colleague Rushabh Lathia tracked the performance change down to
>>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c03ad5602f529787968fa3201b35c119bbc6d782
>>> .
>>>
>>> The call to copyObject in the loop introduced here seems to be
>>> problematic (copying append_rel_list for every element in
>>> append_rel_list unconditionally), though we're still trying to figure
>>> it out.  Attached is a simple repro script, with variables to tweak.
>>>
>>> Quite a few others have posted about this sort of thing and been
>>> politely reminded of the 100 table caveat [1][2] which is fair enough,
>>> but the situations seems to have got dramatically worse for some users
>>> after an upgrade.
>>
>> Yes.  The copyObject() call introduced by this commit seems to have
>> complexity O(T^2*C) where T is the number of child tables and C is the
>> number of columns per child.  That's because the List that is being
>> copied is a list of AppendRelInfo nodes, each of which has a
>> translated_vars member that is a list of every Var in one table, and
>> we copy that list once per child.
>>
>> It appears that in a lot of cases this work is unnecessary.  The
>> second (long) for loop in inheritance_planner copies root->rowMarks
>> and root->append_rel_list so as to be able to apply ChangeVarNodes()
>> to the result, but we only actually do that if the rte is of type
>> RTE_SUBQUERY or if it has security quals.  In the common case where we
>> reach inheritance_planner not because of UNION ALL but just because
>> the table has a bunch of inheritance children (none of which have RLS
>> policies applied), we copy everything and then modify none of it,
>> using up startling large amounts of memory in ways that pre-9.2
>> versions did not.
>
> The attached patch helps.  It does two things:
>
> 1. It arranges for inheritance_planner to throw away the memory
> consumed by the subroot's rowMarks and append_rel_list after the call
> to grouping_planner for that subroot returns.  This prevents the
> explosive growth of memory usage in all cases I've tested so far, but
> planning is still really slow.
>
> 2. It arranges not to deep-copy append_rel_list when the root's
> append_rel_list doesn't need to be modified for the subroot.  This
> makes planning much much faster in simple cases, like a simple update
> on a table with many partitions.  But if you do attach a FROM clause
> containing a subquery to such an update, then this optimization
> doesn't kick in any more and things are still very slow (though still
> memory-bounded, due to part 1).
>
> I feel I might be missing a trick here.  It seems unlikely to me that
> we actually need the entire append_rel_list for every subquery; and we
> almost certainly don't need to modify every element of the
> append_rel_list for every subquery.  Even the ones that no
> ChangeVarNodes() call mutates still get deep-copied.
>

Yeah, you could probably pre-compute the indexes of the RTEs that need
to copied, outside of the big loop, and store them in a bitmapset.
Then, instead of copying the entire list of rowmarks/append_rel_infos
each time, you could just copy the ones that referred to those RTE
indexes (and only if the bitmapset was non-empty, which is the
equivalent of your second optimisation). However, for AppendRelInfos,
ChangeVarNodes() descends into the Vars in the translated_vars list,
so short-cutting the copying of the AppendRelInfo isn't obviously
safe. But, looking more closely, does ChangeVarNodes actually need to
examine translated_vars (the fall-through case) when child_relid isn't
the old rt_index? If not, that could be a big saving in cases like
this.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: [PATCH] Function to get size of asynchronous notification queue
Next
From: Tom Lane
Date:
Subject: Re: Inheritance planner CPU and memory usage change since 9.3.2