Re: WIP: Join push-down for foreign tables - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: WIP: Join push-down for foreign tables
Date
Msg-id CADyhKSVX=bivbzF0VT9KSwqe=i68tirh+hUZ921XTN770SvUzg@mail.gmail.com
Whole thread Raw
In response to WIP: Join push-down for foreign tables  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: WIP: Join push-down for foreign tables
List pgsql-hackers
Hanada-san,

I applied your patch and run a few test cases. while this test, I
noticed a few points.

At first, I tried to use file_fdw, however, it was crashed of course.
It seems to me this logic should be modified to confirm whether the target FDW
support join push down, or not.

+       if (enable_foreignjoin &&
+           joinrel->serverid != InvalidOid &&
+           (IsA(outerpath, ForeignPath) || IsA(outerpath, ForeignJoinPath)) &&
+           (IsA(inner_cheapest_total, ForeignPath) ||
+            IsA(inner_cheapest_total, ForeignJoinPath)))
+
+       {
+           ForeignJoinPath    *path;
+           path = create_foreignjoin_path(root,
+                                          joinrel,
+                                          jointype,
+                                          sjinfo,
+                                          outerpath,
+                                          inner_cheapest_total,
+                                          restrictlist,
+                                          merge_pathkeys);
+           if (path != NULL)
+               add_path(joinrel, (Path *) path);
+       }
+

In my opinion, FdwRoutine should have an additional API to inform the core its
supported features; such as inner-join, outer-join, order-by,
group-by, aggregate
functions, insert, update, delete, etc... in the future version.

Obviously, it is not hard to implement inner/outer-join feature for
pgsql_fdw module,
but it may be a tough work for memcached_fdw module.

> *) SELECT * FROM A JOIN B (...) doesn't work.  Specifying columns in
> SELECT clause explicitly like "SELECT A.col1, A.col2, ..." seems to work.
> *) ORDER BY causes error if no column is specified in SELECT clause from
> sort key's table.
>
I doubt these issues are in pgsql_fdw side, not the proposed patch itself.

In the case when the table and column names/types are compatible between
local-side and remote-side, the problem was not reproduced in my environment.
I'd like to suggest you to add a functionality to map remote column names to
the local ones in pgsql_fdw.

See below:

* I set up three foreign tables on the local side.
CREATE FOREIGN TABLE ft_1 (a int, b text) SERVER local_db;
CREATE FOREIGN TABLE ft_2 (x int, y text) SERVER local_db;
CREATE FOREIGN TABLE ft_3 (s int, t text) SERVER local_db;

* I also set up related tables on the remote side.
CREATE TABLE ft_1 (a int, b text);
CREATE TABLE ft_2 (x int, y text);
CREATE TABLE ft_3 (ss int, tt text);
Please note that column name of ft_3 is not compatible

* JOIN ft_1 and ft_2 works collectly.
postgres=# SELECT * FROM ft_1 JOIN ft_2 ON a = x;a |  b  | x |  y
---+-----+---+-----2 | bbb | 2 | bbb3 | ccc | 3 | ccc4 | ddd | 4 | ddd
(3 rows)

postgres=# EXPLAIN SELECT * FROM ft_1 JOIN ft_2 ON a = x;                                                    QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------Foreign
Scanon multiple foreign tables  (cost=0.00..0.00 rows=5000 width=72)  Remote SQL: SELECT ft_1.a, ft_1.b, ft_2.x, ft_2.y
FROMpublic.ft_1
 
ft_1, public.ft_2 ft_2 WHERE (ft_1.a = ft_2.x)
(2 rows)

* JOIN ft_1 and ft_3 does not works. Error message says ft_3.s does
not exist. Probably, it means ft_3.s does not exist "on the remote
host".

postgres=# SELECT * FROM ft_1 JOIN ft_3 ON a = s;
ERROR:  could not execute foreign query
DETAIL:  ERROR:  column ft_3.s does not exist
LINE 1: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1 ft_1,...                              ^

HINT:  SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1 ft_1,
public.ft_3 ft_3 WHERE (ft_1.a = ft_3.s)
postgres=# EXPLAIN SELECT * FROM ft_1 JOIN ft_3 ON a = s;                                                    QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------Foreign
Scanon multiple foreign tables  (cost=0.00..0.00 rows=5000 width=72)  Remote SQL: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t
FROMpublic.ft_1
 
ft_1, public.ft_3 ft_3 WHERE (ft_1.a = ft_3.s)
(2 rows)

In fact, EXPLAIN shows us the remote SQL tries to reference ft_3.s,
instead of ft_3.ss.

Thanks,

2011年9月14日10:24 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Hi all,
>
> I'd like to propose $SUBJECT for further foreign query optimization.
> I've not finished development, but I'd appreciate it if I got someone's
> review on my WIP code and its design.
>
> Changes I made
> ==============
>
> (1) Add foreign server OID to RelOptInfo
> I think it would be nice to know whether a join comes from one foreign
> server or not without digging into child nodes during considering paths
> for a query.  So I added serverid field to RelOptInfo, which defaults to
> InvalidOid ,and is set to OID of the server if the node and all of its
> children are from same foreign server.  This also avoids looking catalog
> up for foreign table entry to determine FDW routine.
>
> (2) Add new planner node, ForeignJoinPath
> ForeignJoinPath derives JoinPath, like other Join nodes, and holds
> FdwPlan like ForeignPath node.
>
> This node is used to represent a pushed-down join between foreign tables
> and/or another foreign join in early planning phase, for all of
> combination such as table-table, table-join, join-table and join-join
> will be considered.  In addition, though they might generate same
> internal (FDW-specific) plan, reversed combination is considered;
> planner generates two ForeignJoinPath for both (A & B) and (B & A).
>
> During creation of this node, planner calls new FDW handler function
> PlanForeignJoin to get a FdwPlan which includes costs and internal plan
> of a foreign join.  If a FDW can't (or doesn't want to) handle this
> join, just return NULL is OK, and then planner gives such optimization
> up and considers other usual join methods such as nested loop and hash join.
>
> A subtree which has a ForeignJoin on its top is translated into a
> ForeignScan node during constructing a plan tree.  This behavior is
> different from other join path nodes such as NestPath and MergePath,
> because they have child plan nodes correspond to path nodes.
>
> (3) Add EXPALIN support for foreign join (currently just for debug)
> ForeignScan might not be a simple foreign table scan, so
> ExplainScanTarget() can't be used for it.  An idea I have is adding
> ExplainForeignScanTarget() to handle ForeignScan separately from other
> scan nodes.
>
> (4) Add new GUC parameter, enable_foreignjoin
> If this was off, planner never generates ForeignJoinPath.  In such case,
> foreign tables will be joined with one of NestLoop, MergeJoin and HashJoin.
>
> Known issue
> ===========
>
> I'm sorry but attached patch, join_pushdown_v1.patch, is WIP, so
> currently some kind of query fails.  Known failure patterns are:
>
> *) SELECT * FROM A JOIN B (...) doesn't work.  Specifying columns in
> SELECT clause explicitly like "SELECT A.col1, A.col2, ..." seems to work.
> *) ORDER BY causes error if no column is specified in SELECT clause from
> sort key's table.
>
> Probably more problems still are there...
>
> PG-wrapper as sample implementation
> ===================================
>
> pgsql_fdw-0.1.0.tar.gz is an WIP implementation of PG-wrapper, which can
> (hopefully) handle both simple foreign table scan and multiple foreign
> joins.  You can build it with placing in contrib/, or using pgxs.  Note
> that it has some issues such as memory leak of PGresult.  I'm planning
> to propose this wrapper as a contrib module, but it would be after
> clearing such issues.
>
> Regards,
> --
> Shigeru Hanada
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: bug of recovery?
Next
From: Dimitri Fontaine
Date:
Subject: Re: Should we get rid of custom_variable_classes altogether?