RE: speeding up planning with partitions - Mailing list pgsql-hackers
From | Imai, Yoshikazu |
---|---|
Subject | RE: speeding up planning with partitions |
Date | |
Msg-id | 0F97FA9ABBDBE54F91744A9B37151A51212454@g01jpexmbkw24 Whole thread Raw |
In response to | Re: speeding up planning with partitions (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
List | pgsql-hackers |
Hi Amit, On Thu, Nov 8, 2018 at 8:26 PM, Amit Langote wrote: > On 2018/11/07 10:00, Imai, Yoshikazu wrote: > > About inheritance_make_rel_from_joinlist(), I considered how it processes > > joins for sub-partitioned-table. > > > > sub-partitioned-table image: > > part > > sub1 > > leaf1 > > leaf2 > > > > inheritance_make_rel_from_joinlist() translates join_list and join_info_list > > for each leafs(leaf1, leaf2 in above image). To translate those two lists for > > leaf1, inheritance_make_rel_from_joinlist() translates lists from part to sub1 > > and nextly from sub1 to leaf1. For leaf2, inheritance_make_rel_from_joinlist() > > translates lists from part to sub1 and from sub1 to leaf2. There are same > > translation from part to sub1, and I think it is better if we can eliminate it. > > I attached 0002-delta.patch. > > > > In the patch, inheritance_make_rel_from_joinlist() translates lists not only for > > leafs but for mid-nodes, in a depth-first manner, so it can use lists of > > mid-nodes for translating lists from mid-node to leafs, which eliminates same > > translation. > > I don't think the translation happens twice for the same leaf partitions. > > Applying adjust_appendrel_attrs_*multilevel* for only leaf nodes, as > happens with the current code, is same as first translating using > adjust_appendrel_attrs from part to sub1 and then from sub1 to leaf1 and > leaf2 during recursion with sub1 as the parent. Thanks for replying. I interpreted your thoughts about translation as below. adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1 adjust_appendrel_attrs_multilevel for leaf2: sub1(produced at above) -> leaf2 But I wonder translation of leaf2 actually reuses the results of sub1 which is produced at leaf1 translation. ISTM translation for leaf1, leaf2 are executed as below. adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1 adjust_appendrel_attrs_multilevel for leaf2: root -> sub1 -> leaf2 > > I think it might be better if we can apply same logic to inheritance_planner(), > > which once implemented the same logic as below. > > > > > > foreach(lc, root->append_rel_list) > > { > > AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc); > > ... > > /* > > * expand_inherited_rtentry() always processes a parent before any of > > * that parent's children, so the parent_root for this relation should > > * already be available. > > */ > > parent_root = parent_roots[appinfo->parent_relid]; > > Assert(parent_root != NULL); > > parent_parse = parent_root->parse; > > ... > > subroot->parse = (Query *) > > adjust_appendrel_attrs(parent_root, > > (Node *) parent_parse, > > 1, &appinfo); > > Actually, inheritance_planner is also using > adjust_appendrel_attrs_multilevel. I'm not sure if you're looking at the > latest (10/26) patch. Sorry for my poor explanation. I described the above code as old code which is not patch applied. Since it is difficult to explain my thoughts with words, I will show the performance degration case. Partition tables are below two sets. Set1: [create 800 partitions directly under root] CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a); \o /dev/null SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF rt FOR VALUES FROM (' || (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x; \gexec \o Set2: [create 800 partitions under a partitioned table which is under root] CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a); CREATE TABLE sub1 PARTITION OF rt FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (b); \o /dev/null SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF sub1 FOR VALUES FROM (' || (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x; \gexec \o Create a generic plan of updation or deletion. [create a delete generic plan] set plan_cache_mode = 'force_generic_plan'; prepare delete_stmt(int) as delete from rt where b = $1; execute delete_stmt(1); In creating generic plan, paths/plans for all partitions are created because we don't know which plan is used before "EXECUTE" command happens. In creating paths in inheritance_planner(), adjust_appendrel_attrs()/adjust_appendrel_attrs_multilevel() is executed many times and it allocates a lot of memory in total if there are many partitions. How amount of memory is used with above tests is... without v5 patches, Set1: 242MB without v5 patches, Set2: 247MB with v5 patches, Set1: 420MB with v5 patches, Set2: 820MB # Thanks for supplying v5 patches :) Without sub-partition(Set1), memory allocated by adjust_appendrel_attrs() with v5 patches is 1.7x larger than without v5 patches. With sub-partition(Set2), memory allocated by adjust_appendrel_attrs() with v5 patches is 3.3x larger than without v5 patches. I think why memory usage in "with v5 patches, Set2" is almost 2 times than "with v5 patches, Set1" is that adjust_appendrel_attrs() from root to sub1 is occurred at each translation for each partitions in "with v5 patches, Set2". Currently, a query to a partition table is processed faster by a custom plan than by a generic plan, so we would not use a generic plan that I don't know whether we should solve this large memory allocation problem. -- Yoshikazu Imai
pgsql-hackers by date: