Re: Inheritance planner CPU and memory usage change since 9.3.2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Inheritance planner CPU and memory usage change since 9.3.2
Date
Msg-id CA+TgmoZEQs2wHbhz-5AXF7A2Pk+SUajy=7LSmVBwBPS8JBTVPQ@mail.gmail.com
Whole thread Raw
In response to Re: Inheritance planner CPU and memory usage change since 9.3.2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Inheritance planner CPU and memory usage change since 9.3.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Jun 18, 2015 at 3:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 18 June 2015 at 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I feel I might be missing a trick here.  It seems unlikely to me that
>>> we actually need the entire append_rel_list for every subquery; and we
>>> almost certainly don't need to modify every element of the
>>> append_rel_list for every subquery.  Even the ones that no
>>> ChangeVarNodes() call mutates still get deep-copied.
>
>> Yeah, you could probably pre-compute the indexes of the RTEs that need
>> to copied, outside of the big loop, and store them in a bitmapset.
>> Then, instead of copying the entire list of rowmarks/append_rel_infos
>> each time, you could just copy the ones that referred to those RTE
>> indexes (and only if the bitmapset was non-empty, which is the
>> equivalent of your second optimisation). However, for AppendRelInfos,
>> ChangeVarNodes() descends into the Vars in the translated_vars list,
>> so short-cutting the copying of the AppendRelInfo isn't obviously
>> safe. But, looking more closely, does ChangeVarNodes actually need to
>> examine translated_vars (the fall-through case) when child_relid isn't
>> the old rt_index? If not, that could be a big saving in cases like
>> this.
>
> I'm a bit surprised that duplicating the append_rel_list is a noticeable
> performance problem.  It ought to be far smaller than the Query tree that
> we've always duplicated in this loop --- in particular, it's really a
> subset of what we have in the RTE list, no?

Well, append_rel_list has an AppendRelInfo for every RTE and that
contains a List (translated_vars) which in turn contains a Var node
for every column.  I'm not sure how that compares to the RTE itself.
I think it's the cost of copying the translated_vars list that is
really the problem here - you can have 200 or 500 columns in a table,
so that's a Var and a ListCell for each one.  In the problem cases,
the number of AppendRelInfo elements is a small percentage of the
number of Var nodes under them.

That having been said, I don't know how the size of all that compares
to the size of the Query.  But I think the Query must be smaller,
because arranging to discard the AppendRelInfo and its translated_vars
list, and the per-subroot rowMarks list, after every call to
grouping_planner stops the memory blowup.

The whole translated_vars representation seems needless inefficient.
For a subquery, you probably need something like that.  But for an
inheritance child, you just need to swap the varno and maybe remap
some varattnos.  Indexing into a C array to grab the varattno seems
like it would be a heck of a lot more efficient than calling
list_nth().  It might be worth making AppendRelInfo support a choice
of representations so that we can use something more optimized for the
simple case while not losing the full generality when we need it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Inheritance planner CPU and memory usage change since 9.3.2
Next
From: Tom Lane
Date:
Subject: Re: Inheritance planner CPU and memory usage change since 9.3.2