RE: speeding up planning with partitions - Mailing list pgsql-hackers

From Imai, Yoshikazu
Subject RE: speeding up planning with partitions
Date
Msg-id 0F97FA9ABBDBE54F91744A9B37151A51212454@g01jpexmbkw24
Whole thread Raw
In response to Re: speeding up planning with partitions  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
Hi Amit,

On Thu, Nov 8, 2018 at 8:26 PM, Amit Langote wrote:
> On 2018/11/07 10:00, Imai, Yoshikazu wrote:
> > About inheritance_make_rel_from_joinlist(), I considered how it processes
> > joins for sub-partitioned-table.
> > 
> > sub-partitioned-table image:
> > part
> >   sub1
> >     leaf1
> >     leaf2
> > 
> > inheritance_make_rel_from_joinlist() translates join_list and join_info_list
> > for each leafs(leaf1, leaf2 in above image). To translate those two lists for
> > leaf1, inheritance_make_rel_from_joinlist() translates lists from part to sub1
> > and nextly from sub1 to leaf1. For leaf2, inheritance_make_rel_from_joinlist() 
> > translates lists from part to sub1 and from sub1 to leaf2. There are same
> > translation from part to sub1, and I think it is better if we can eliminate it.
> > I attached 0002-delta.patch.
> > 
> > In the patch, inheritance_make_rel_from_joinlist() translates lists not only for
> > leafs but for mid-nodes, in a depth-first manner, so it can use lists of
> > mid-nodes for translating lists from mid-node to leafs, which eliminates same
> > translation.
> 
> I don't think the translation happens twice for the same leaf partitions.
> 
> Applying adjust_appendrel_attrs_*multilevel* for only leaf nodes, as
> happens with the current code, is same as first translating using
> adjust_appendrel_attrs from part to sub1 and then from sub1 to leaf1 and
> leaf2 during recursion with sub1 as the parent.

Thanks for replying.
I interpreted your thoughts about translation as below.

adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
adjust_appendrel_attrs_multilevel for leaf2: sub1(produced at above) -> leaf2

But I wonder translation of leaf2 actually reuses the results of sub1 which is
produced at leaf1 translation. ISTM translation for leaf1, leaf2 are executed
as below.

adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
adjust_appendrel_attrs_multilevel for leaf2: root -> sub1 -> leaf2


> > I think it might be better if we can apply same logic to inheritance_planner(),
> > which once implemented the same logic as below. 
> > 
> > 
> >     foreach(lc, root->append_rel_list)
> >     {
> >         AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
> >         ...
> >         /*
> >          * expand_inherited_rtentry() always processes a parent before any of
> >          * that parent's children, so the parent_root for this relation should
> >          * already be available.
> >          */
> >         parent_root = parent_roots[appinfo->parent_relid];
> >         Assert(parent_root != NULL);
> >         parent_parse = parent_root->parse;
> >         ...
> >         subroot->parse = (Query *)
> >             adjust_appendrel_attrs(parent_root,
> >                                    (Node *) parent_parse,
> >                                    1, &appinfo);
> 
> Actually, inheritance_planner is also using
> adjust_appendrel_attrs_multilevel.  I'm not sure if you're looking at the
> latest (10/26) patch.

Sorry for my poor explanation. I described the above code as old code which is
not patch applied. 


Since it is difficult to explain my thoughts with words, I will show the
performance degration case.

Partition tables are below two sets.

Set1:
[create 800 partitions directly under root]
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
\o /dev/null
SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF rt FOR VALUES FROM ('
|| (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x;
\gexec
\o

Set2:
[create 800 partitions under a partitioned table which is under root]
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE sub1 PARTITION OF rt FOR VALUES FROM (0) TO (100) PARTITION BY
RANGE (b);
\o /dev/null
SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF sub1 FOR VALUES FROM ('
|| (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x;
\gexec
\o


Create a generic plan of updation or deletion.

[create a delete generic plan]
set plan_cache_mode = 'force_generic_plan';
prepare delete_stmt(int) as delete from rt where b = $1;
execute delete_stmt(1);


In creating generic plan, paths/plans for all partitions are created because
we don't know which plan is used before "EXECUTE" command happens.
In creating paths in inheritance_planner(),
adjust_appendrel_attrs()/adjust_appendrel_attrs_multilevel() is executed many
times and it allocates a lot of memory in total if there are many partitions.

How amount of memory is used with above tests is...

without v5 patches, Set1: 242MB
without v5 patches, Set2: 247MB
with v5 patches, Set1: 420MB
with v5 patches, Set2: 820MB
# Thanks for supplying v5 patches :)

Without sub-partition(Set1), memory allocated by adjust_appendrel_attrs() with
v5 patches is 1.7x larger than without v5 patches. With sub-partition(Set2),
memory allocated by adjust_appendrel_attrs() with v5 patches is 3.3x larger
than without v5 patches.
I think why memory usage in "with v5 patches, Set2" is almost 2 times than
"with v5 patches, Set1" is that adjust_appendrel_attrs() from root to sub1 is
occurred at each translation for each partitions in "with v5 patches, Set2".

Currently, a query to a partition table is processed faster by a custom plan
than by a generic plan, so we would not use a generic plan that I don't know
whether we should solve this large memory allocation problem.

--
Yoshikazu Imai 


pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: csv format for psql
Next
From: Edmund Horner
Date:
Subject: Re: Tid scan improvements