Re: [Proposal] Table partition + join pushdown - Mailing list pgsql-hackers
From | Taiki Kondo |
---|---|
Subject | Re: [Proposal] Table partition + join pushdown |
Date | |
Msg-id | 12A9442FBAE80D4E8953883E0B84E0885E6AA9@BPXM01GP.gisp.nec.co.jp Whole thread Raw |
In response to | Re: [Proposal] Table partition + join pushdown (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: [Proposal] Table partition + join pushdown
|
List | pgsql-hackers |
Hello, KaiGai-san. Thank you for your comment, and sorry for late response. The attached patch is completely rewritten from previous patch[1], at your suggestion[2]. Please find attached. This patch contains following implementation, but I can't determine this is correct or wrong. 1. Cost estimation In this patch, additional row filter is implemented for Hash, Merge join and Nested Loop. I implemented cost estimation feature for this filter by watching other parts of filters, but I am not sure this implementation is correct. 2. Workaround for failing assertion at allpaths.c In standard_join_search(), we expect to have a single rel at the final level. But this expectation is disappointed by join pushdown feature, because this will search for the combinations not searched by original standard_join_serch() at the final level. Therefore, once trying join pushdown is succeeded, failing assertion occurs in allpaths.c. So I implemented workaround by temporary set NULL to root->join_rel_level while trying join pushdown, but I think this implementation may be wrong. 3. Searching pathkeys for Merge Join When join pushdown feature chooses merge join for pushed-down join operation, planner fails to create merge join node because it is unable to find pathkeys for this merge join. I found this is caused by skipping child table in finding pathkeys. I expect that it is for making planner faster, and I implemented that planner doesn't skip child table in finding pathkeys for merge join. But I am not sure this expectation is correct. Any comments/suggestions are welcome. Remarks : [1] http://www.postgresql.org/message-id/12A9442FBAE80D4E8953883E0B84E0885C01FD@BPXM01GP.gisp.nec.co.jp [2] http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8011345B6@BPXM15GP.gisp.nec.co.jp Best regards, -- Taiki Kondo NEC Solution Innovators, Ltd. -----Original Message----- From: Kaigai Kouhei(海外 浩平) [mailto:kaigai@ak.jp.nec.com] Sent: Tuesday, August 18, 2015 5:47 PM To: Kondo Taiki(近藤 太樹); pgsql-hackers@postgresql.org Cc: Iwaasa Akio(岩浅 晃郎) Subject: RE: [Proposal] Table partition + join pushdown Hello Kondo-san, I briefly checked your patch. Let me put some comments about its design and implementation, even though I have no argumentstowards its concept. :-) * Construction of RelOptInfo In your patch, try_hashjoin_pushdown() called by try_hashjoin_path() constructs RelOptInfo of the join-rel between inner-reland a subpath of Append node. It is entirely wrong implementation. I can understand we (may) have no RelOptInfo for the joinrel between tbl_child_0 and other_table, when planner investigates a joinpath to process join Append path with the other_table. > HashJoin > -> Append > -> SeqScan on tbl_child_0 > -> SeqScan on tbl_child_1 > -> SeqScan on tbl_child_2 > -> SeqScan on tbl_child_3 > -> Hash > -> SeqScan on other_table > How about these alternatives? - calls make_join_rel() to the pair of tbl_child_X and other_table by try_hashjoin_pushdown() or somewhere. make_join_rel() internally construct a RelOptInfo for the supplied pair of relations, so relevant RelOptInfo shall be properly constructed. - make_join_rel() also calls add_paths_to_joinrel() towards all the join logic, so it makes easier to support to push down other join logic including nested-loop or custom-join. - It may be an idea to add an extra argument to make_join_rel() to inform expressions to be applied for tuple filtering on construction of inner hash table. * Why only SeqScan is supported I think it is the role of Hash-node to filter out inner tuples obviously unrelated to the join (if CHECK constraint of outerrelation gives information), because this join-pushdown may be able to support multi-stacked pushdown. For example, if planner considers a path to join this Append-path with another relation, and join clause contains a referenceto X? > Append > -> HashJoin > -> SeqScan on tbl_child_0 > -> Hash ... Filter: hash_func(X) % 4 = 0 > -> SeqScan on other_table > -> HashJoin > -> SeqScan on tbl_child_1 > -> Hash ... Filter: hash_func(X) % 4 = 1 > -> SeqScan on other_table > -> HashJoin > -> SeqScan on tbl_child_2 > -> Hash ... Filter: hash_func(X) % 4 = 2 > -> SeqScan on other_table > -> HashJoin > -> SeqScan on tbl_child_3 > -> Hash ... Filter: hash_func(X) % 4 = 3 > -> SeqScan on other_table > It may be a good challenge to consider additional join pushdown, even if subpaths of Append are HashJoin, not SeqScan, like: Append -> HashJoin -> HashJoin -> SeqScan on tbl_child_0 -> Hash ... Filter: hash_func(X) % 4 = 0 -> SeqScan on other_table -> Hash ... Filter: hash_func(X) % 4 = 0 -> SeqScan on another_table -> HashJoin -> HashJoin -> SeqScan on tbl_child_1 -> Hash ... Filter: hash_func(X) % 4 = 1 -> SeqScan on other_table -> Hash ... Filter: hash_func(X) % 4 = 1 -> SeqScan on another_table -> HashJoin -> HashJoin -> SeqScan on tbl_child_2 -> Hash ... Filter: hash_func(X) % 4 = 2 -> SeqScan on other_table -> Hash ... Filter: hash_func(X) % 4 = 2 -> SeqScan on another_table -> HashJoin -> HashJoin -> SeqScan on tbl_child_3 -> Hash ... Filter: hash_func(X) % 4 = 3 -> SeqScan on other_table -> Hash ... Filter: hash_func(X) % 4 = 3 -> SeqScan on another_table In this case, underlying nodes are not always SeqScan. So, only Hash-node can have filter clauses. * Way to handle expression nodes All this patch supported is CHECK() constraint with equal operation on INT4 data type. You can learn various useful infrastructureof PostgreSQL. For example, ... - expression_tree_mutator() is useful to make a copy of expression node with small modification - pull_varnos() is useful to check which relations are referenced by the expression node. - RestrictInfo->can_join is useful to check whether the clause is binary operator, or not. Anyway, reuse of existing infrastructure is the best way to make a reliable infrastructure and to keep the implementationsimple. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Taiki Kondo > Sent: Thursday, August 13, 2015 6:30 PM > To: pgsql-hackers@postgresql.org > Cc: Kaigai Kouhei(海外 浩平); Iwaasa Akio(岩浅 晃郎) > Subject: [HACKERS] [Proposal] Table partition + join pushdown > > Hi all, > > I saw the email about the idea from KaiGai-san[1], and I worked to > implement this idea. > > Now, I have implemented a part of this idea, so I want to propose this > feature. > > Patch attached just shows my concept of this feature. > It works fine for EXPLAIN, but it returns wrong result for other operations, sadly. > > > > Table partition + join pushdown > =============================== > > Motivation > ---------- > To make join logic working more effectively, it is important to make > the size of relations smaller. > > Especially in Hash-join, it is meaningful to make the inner relation > smaller, because smaller inner relation can be stored within smaller hash table. > This means that memory usage can be reduced when joining with big tables. > > > Design > ------ > It was mentioned by the email from KaiGai-san. > So I quote below here... > > ---- begin quotation --- > Let's assume a table which is partitioned to four portions, and > individual child relations have constraint by hash-value of its ID > field. > > tbl_parent > + tbl_child_0 ... CHECK(hash_func(id) % 4 = 0) > + tbl_child_1 ... CHECK(hash_func(id) % 4 = 1) > + tbl_child_2 ... CHECK(hash_func(id) % 4 = 2) > + tbl_child_3 ... CHECK(hash_func(id) % 4 = 3) > > If someone tried to join another relation with tbl_parent using > equivalence condition, like X = tbl_parent.ID, we know inner tuples > that does not satisfies the condition > hash_func(X) % 4 = 0 > shall be never joined to the tuples in tbl_child_0. > So, we can omit to load these tuples to inner hash table preliminary, > then it potentially allows to split the inner hash-table. > > Current typical plan structure is below: > > HashJoin > -> Append > -> SeqScan on tbl_child_0 > -> SeqScan on tbl_child_1 > -> SeqScan on tbl_child_2 > -> SeqScan on tbl_child_3 > -> Hash > -> SeqScan on other_table > > It may be rewritable to: > > Append > -> HashJoin > -> SeqScan on tbl_child_0 > -> Hash ... Filter: hash_func(X) % 4 = 0 > -> SeqScan on other_table > -> HashJoin > -> SeqScan on tbl_child_1 > -> Hash ... Filter: hash_func(X) % 4 = 1 > -> SeqScan on other_table > -> HashJoin > -> SeqScan on tbl_child_2 > -> Hash ... Filter: hash_func(X) % 4 = 2 > -> SeqScan on other_table > -> HashJoin > -> SeqScan on tbl_child_3 > -> Hash ... Filter: hash_func(X) % 4 = 3 > -> SeqScan on other_table > ---- end quotation --- > > In the quotation above, it was written that filter is set at Hash node. > But I implemented that filter is set at SeqScan node under Hash node. > In my opinion, filtering tuples is work of Scanner. > > Append > -> HashJoin > -> SeqScan on tbl_child_0 > -> Hash > -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 0 > -> HashJoin > -> SeqScan on tbl_child_1 > -> Hash > -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 1 > -> HashJoin > -> SeqScan on tbl_child_2 > -> Hash > -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 2 > -> HashJoin > -> SeqScan on tbl_child_3 > -> Hash > -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 3 > > > API > --- > There are 3 new internal (static) functions to implement this feature. > try_hashjoin_pushdown(), which is main function of this feature, is > called from try_hashjoin_path(), and tries to push down HashPath under > the AppendPath. > > To do so, this function does following operations. > > 1. Check if this Hash-join can be pushed down under AppendPath > 2. To avoid an influence on other Path making operation, > copy inner path's RelOptInfo and make new SeqScan path from it. > At here, get CHECK() constraints from OUTER path, and convert its > Var node according to join condition. And also convert Var nodes > in join condition itself. > 3. Create new HashPath nodes between each sub-paths of AppendPath and > inner path made above. > 4. When operations from 1 to 3 are done for each sub-paths, > create new AppendPath which sub-paths are HashPath nodes made above. > > get_replaced_clause_constr() is called from try_hashjoin_pushdown(), > and get_var_nodes_recurse() is called from get_replaced_cluase_constr(). > These 2 functions help above operations. > (I may revise this part to use expression_tree_walker() and > expression_tree_mutator().) > > In patch attached, it has the following limitations. > o It only works for hash-join operation. > (I want to support not only hash-join but also other logic.) > o Join conditions must be "=" operator with int4 variables. > o Inner path must be SeqScan. > (I want to support other path-node.) > o And now, planner may not choose this plan, > because estimated costs are usually larger than original (non-pushdown) plan. > > And also 1 internal (static) function, get_relation_constraints() > defined in plancat.c, is changed to global. This function will be > called from > get_replaced_clause_constr() to get CHECK() constraints. > > > Usage > ----- > To use this feature, create partition tables and small table to join, > and run select operation with joining these tables. > > For your convenience, I attach DDL and DML script. > And I also attach the result of EXPLAIN. > > > Any comments are welcome. But, at first, I need your advices to > correct this patch's behavior. > > At least, I think it has to expand array of RangeTblEntry and other > arrays defined in PlannerInfo to register new RelOptInfos for new Path nodes mentioned above. > Or, is it better choice to modify query parser to implement this > feature more further? > > > Remarks : > [1] > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80 > 10F672 > B@BPXM15GP.gisp.nec.co.jp > > > Best regards, > -- > Taiki Kondo > > NEC Solution Innovators, Ltd.
Attachment
pgsql-hackers by date: