Home > mailing lists
Re: [Proposal] Table partition + join pushdown - Mailing list pgsql-hackers

From	Kouhei Kaigai
Subject	Re: [Proposal] Table partition + join pushdown
Date	August 18, 2015 08:47:34
Msg-id	9A28C8860F777E439AA12E8AEA7694F8011345B6@BPXM15GP.gisp.nec.co.jp Whole thread
In response to	[Proposal] Table partition + join pushdown (Taiki Kondo <tai-kondo@yk.jp.nec.com>)
Responses	Re: [Proposal] Table partition + join pushdown
List	pgsql-hackers
Tree view
Hello Kondo-san,

I briefly checked your patch. Let me put some comments about
its design and implementation, even though I have no arguments
towards its concept. :-)

* Construction of RelOptInfo

In your patch, try_hashjoin_pushdown() called by try_hashjoin_path()
constructs RelOptInfo of the join-rel between inner-rel and a subpath
of Append node. It is entirely wrong implementation.

I can understand we (may) have no RelOptInfo for the joinrel between
tbl_child_0 and other_table, when planner investigates a joinpath to
process join Append path with the other_table.

>   HashJoin
>     -> Append
>       -> SeqScan on tbl_child_0
>       -> SeqScan on tbl_child_1
>       -> SeqScan on tbl_child_2
>       -> SeqScan on tbl_child_3
>     -> Hash
>       -> SeqScan on other_table
>
How about these alternatives?

- calls make_join_rel() to the pair of tbl_child_X and other_table by try_hashjoin_pushdown() or somewhere.
make_join_rel()internally construct a RelOptInfo for the supplied pair of relations, so relevant RelOptInfo shall be
properlyconstructed. 
- make_join_rel() also calls add_paths_to_joinrel() towards all the join logic, so it makes easier to support to push
downother join logic including nested-loop or custom-join. 
- It may be an idea to add an extra argument to make_join_rel() to inform expressions to be applied for tuple filtering
onconstruction of inner hash table. 

* Why only SeqScan is supported

I think it is the role of Hash-node to filter out inner tuples
obviously unrelated to the join (if CHECK constraint of outer relation
gives information), because this join-pushdown may be able to support
multi-stacked pushdown.

For example, if planner considers a path to join this Append-path
with another relation, and join clause contains a reference to X?

>   Append
>    -> HashJoin
>      -> SeqScan on tbl_child_0
>      -> Hash ... Filter: hash_func(X) % 4 = 0
>        -> SeqScan on other_table
>    -> HashJoin
>      -> SeqScan on tbl_child_1
>      -> Hash ... Filter: hash_func(X) % 4 = 1
>        -> SeqScan on other_table
>    -> HashJoin
>      -> SeqScan on tbl_child_2
>      -> Hash ... Filter: hash_func(X) % 4 = 2
>        -> SeqScan on other_table
>    -> HashJoin
>      -> SeqScan on tbl_child_3
>      -> Hash ... Filter: hash_func(X) % 4 = 3
>        -> SeqScan on other_table
>
It may be a good challenge to consider additional join pushdown,
even if subpaths of Append are HashJoin, not SeqScan, like:
  Append   -> HashJoin     -> HashJoin       -> SeqScan on tbl_child_0       -> Hash ... Filter: hash_func(X) % 4 = 0
     -> SeqScan on other_table     -> Hash ... Filter: hash_func(X) % 4 = 0        -> SeqScan on another_table   ->
HashJoin    -> HashJoin       -> SeqScan on tbl_child_1       -> Hash ... Filter: hash_func(X) % 4 = 1         ->
SeqScanon other_table     -> Hash ... Filter: hash_func(X) % 4 = 1        -> SeqScan on another_table   -> HashJoin
->HashJoin       -> SeqScan on tbl_child_2       -> Hash ... Filter: hash_func(X) % 4 = 2         -> SeqScan on
other_table    -> Hash ... Filter: hash_func(X) % 4 = 2        -> SeqScan on another_table   -> HashJoin     ->
HashJoin      -> SeqScan on tbl_child_3       -> Hash ... Filter: hash_func(X) % 4 = 3         -> SeqScan on
other_table    -> Hash ... Filter: hash_func(X) % 4 = 3        -> SeqScan on another_table 

In this case, underlying nodes are not always SeqScan. So, only
Hash-node can have filter clauses.


* Way to handle expression nodes

All this patch supported is CHECK() constraint with equal operation
on INT4 data type. You can learn various useful infrastructure of
PostgreSQL. For example, ...
- expression_tree_mutator() is useful to make a copy of expression node with small modification
- pull_varnos() is useful to check which relations are referenced by the expression node.
- RestrictInfo->can_join is useful to check whether the clause is binary operator, or not.


Anyway, reuse of existing infrastructure is the best way to make
a reliable infrastructure and to keep the implementation simple.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Taiki Kondo
> Sent: Thursday, August 13, 2015 6:30 PM
> To: pgsql-hackers@postgresql.org
> Cc: Kaigai Kouhei(海外 浩平); Iwaasa Akio(岩浅 晃郎)
> Subject: [HACKERS] [Proposal] Table partition + join pushdown
>
> Hi all,
>
> I saw the email about the idea from KaiGai-san[1],
> and I worked to implement this idea.
>
> Now, I have implemented a part of this idea,
> so I want to propose this feature.
>
> Patch attached just shows my concept of this feature.
> It works fine for EXPLAIN, but it returns wrong result for other operations, sadly.
>
>
>
> Table partition + join pushdown
> ===============================
>
> Motivation
> ----------
> To make join logic working more effectively,
> it is important to make the size of relations smaller.
>
> Especially in Hash-join, it is meaningful to make the inner relation smaller,
> because smaller inner relation can be stored within smaller hash table.
> This means that memory usage can be reduced when joining with big tables.
>
>
> Design
> ------
> It was mentioned by the email from KaiGai-san.
> So I quote below here...
>
> ---- begin quotation ---
> Let's assume a table which is partitioned to four portions,
> and individual child relations have constraint by hash-value
> of its ID field.
>
>   tbl_parent
>    + tbl_child_0 ... CHECK(hash_func(id) % 4 = 0)
>    + tbl_child_1 ... CHECK(hash_func(id) % 4 = 1)
>    + tbl_child_2 ... CHECK(hash_func(id) % 4 = 2)
>    + tbl_child_3 ... CHECK(hash_func(id) % 4 = 3)
>
> If someone tried to join another relation with tbl_parent
> using equivalence condition, like X = tbl_parent.ID, we
> know inner tuples that does not satisfies the condition
>   hash_func(X) % 4 = 0
> shall be never joined to the tuples in tbl_child_0.
> So, we can omit to load these tuples to inner hash table
> preliminary, then it potentially allows to split the
> inner hash-table.
>
> Current typical plan structure is below:
>
>   HashJoin
>     -> Append
>       -> SeqScan on tbl_child_0
>       -> SeqScan on tbl_child_1
>       -> SeqScan on tbl_child_2
>       -> SeqScan on tbl_child_3
>     -> Hash
>       -> SeqScan on other_table
>
> It may be rewritable to:
>
>   Append
>    -> HashJoin
>      -> SeqScan on tbl_child_0
>      -> Hash ... Filter: hash_func(X) % 4 = 0
>        -> SeqScan on other_table
>    -> HashJoin
>      -> SeqScan on tbl_child_1
>      -> Hash ... Filter: hash_func(X) % 4 = 1
>        -> SeqScan on other_table
>    -> HashJoin
>      -> SeqScan on tbl_child_2
>      -> Hash ... Filter: hash_func(X) % 4 = 2
>        -> SeqScan on other_table
>    -> HashJoin
>      -> SeqScan on tbl_child_3
>      -> Hash ... Filter: hash_func(X) % 4 = 3
>        -> SeqScan on other_table
> ---- end quotation ---
>
> In the quotation above, it was written that filter is set at Hash node.
> But I implemented that filter is set at SeqScan node under Hash node.
> In my opinion, filtering tuples is work of Scanner.
>
>   Append
>    -> HashJoin
>      -> SeqScan on tbl_child_0
>      -> Hash
>        -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 0
>    -> HashJoin
>      -> SeqScan on tbl_child_1
>      -> Hash
>        -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 1
>    -> HashJoin
>      -> SeqScan on tbl_child_2
>      -> Hash
>        -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 2
>    -> HashJoin
>      -> SeqScan on tbl_child_3
>      -> Hash
>        -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 3
>
>
> API
> ---
> There are 3 new internal (static) functions to implement this feature.
> try_hashjoin_pushdown(), which is main function of this feature,
> is called from try_hashjoin_path(), and tries to push down HashPath
> under the AppendPath.
>
> To do so, this function does following operations.
>
>   1. Check if this Hash-join can be pushed down under AppendPath
>   2. To avoid an influence on other Path making operation,
>      copy inner path's RelOptInfo and make new SeqScan path from it.
>      At here, get CHECK() constraints from OUTER path, and convert its
>      Var node according to join condition. And also convert Var nodes
>      in join condition itself.
>   3. Create new HashPath nodes between each sub-paths of AppendPath and
>      inner path made above.
>   4. When operations from 1 to 3 are done for each sub-paths,
>      create new AppendPath which sub-paths are HashPath nodes made above.
>
> get_replaced_clause_constr() is called from try_hashjoin_pushdown(),
> and get_var_nodes_recurse() is called from get_replaced_cluase_constr().
> These 2 functions help above operations.
> (I may revise this part to use expression_tree_walker() and
>  expression_tree_mutator().)
>
> In patch attached, it has the following limitations.
>   o It only works for hash-join operation.
>     (I want to support not only hash-join but also other logic.)
>   o Join conditions must be "=" operator with int4 variables.
>   o Inner path must be SeqScan.
>     (I want to support other path-node.)
>   o And now, planner may not choose this plan,
>     because estimated costs are usually larger than original (non-pushdown) plan.
>
> And also 1 internal (static) function, get_relation_constraints() defined in
> plancat.c, is changed to global. This function will be called from
> get_replaced_clause_constr() to get CHECK() constraints.
>
>
> Usage
> -----
> To use this feature, create partition tables and small table to join,
> and run select operation with joining these tables.
>
> For your convenience, I attach DDL and DML script.
> And I also attach the result of EXPLAIN.
>
>
> Any comments are welcome. But, at first, I need your advices
> to correct this patch's behavior.
>
> At least, I think it has to expand array of RangeTblEntry and other arrays defined
> in PlannerInfo to register new RelOptInfos for new Path nodes mentioned above.
> Or, is it better choice to modify query parser to implement this feature more
> further?
>
>
> Remarks :
> [1]
> http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F672
> B@BPXM15GP.gisp.nec.co.jp
>
>
> Best regards,
> --
> Taiki Kondo
>
> NEC Solution Innovators, Ltd.
pgsql-hackers by date:
From: Albe Laurenz
Date: 18 August 2015, 08:32:56
Subject: Re: Proposal: Implement failover on libpq connect level.
From: PostgreSQL - Hans-Jürgen Schönig
Date: 18 August 2015, 08:49:59
Subject: Re: Proposal: Implement failover on libpq connect level.
Re: [Proposal] Table partition + join pushdown - Mailing list pgsql-hackers

Previous

Next