Re: WIP: Join push-down for foreign tables - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: WIP: Join push-down for foreign tables |
Date | |
Msg-id | CADyhKSVX=bivbzF0VT9KSwqe=i68tirh+hUZ921XTN770SvUzg@mail.gmail.com Whole thread Raw |
In response to | WIP: Join push-down for foreign tables (Shigeru Hanada <shigeru.hanada@gmail.com>) |
Responses |
Re: WIP: Join push-down for foreign tables
|
List | pgsql-hackers |
Hanada-san, I applied your patch and run a few test cases. while this test, I noticed a few points. At first, I tried to use file_fdw, however, it was crashed of course. It seems to me this logic should be modified to confirm whether the target FDW support join push down, or not. + if (enable_foreignjoin && + joinrel->serverid != InvalidOid && + (IsA(outerpath, ForeignPath) || IsA(outerpath, ForeignJoinPath)) && + (IsA(inner_cheapest_total, ForeignPath) || + IsA(inner_cheapest_total, ForeignJoinPath))) + + { + ForeignJoinPath *path; + path = create_foreignjoin_path(root, + joinrel, + jointype, + sjinfo, + outerpath, + inner_cheapest_total, + restrictlist, + merge_pathkeys); + if (path != NULL) + add_path(joinrel, (Path *) path); + } + In my opinion, FdwRoutine should have an additional API to inform the core its supported features; such as inner-join, outer-join, order-by, group-by, aggregate functions, insert, update, delete, etc... in the future version. Obviously, it is not hard to implement inner/outer-join feature for pgsql_fdw module, but it may be a tough work for memcached_fdw module. > *) SELECT * FROM A JOIN B (...) doesn't work. Specifying columns in > SELECT clause explicitly like "SELECT A.col1, A.col2, ..." seems to work. > *) ORDER BY causes error if no column is specified in SELECT clause from > sort key's table. > I doubt these issues are in pgsql_fdw side, not the proposed patch itself. In the case when the table and column names/types are compatible between local-side and remote-side, the problem was not reproduced in my environment. I'd like to suggest you to add a functionality to map remote column names to the local ones in pgsql_fdw. See below: * I set up three foreign tables on the local side. CREATE FOREIGN TABLE ft_1 (a int, b text) SERVER local_db; CREATE FOREIGN TABLE ft_2 (x int, y text) SERVER local_db; CREATE FOREIGN TABLE ft_3 (s int, t text) SERVER local_db; * I also set up related tables on the remote side. CREATE TABLE ft_1 (a int, b text); CREATE TABLE ft_2 (x int, y text); CREATE TABLE ft_3 (ss int, tt text); Please note that column name of ft_3 is not compatible * JOIN ft_1 and ft_2 works collectly. postgres=# SELECT * FROM ft_1 JOIN ft_2 ON a = x;a | b | x | y ---+-----+---+-----2 | bbb | 2 | bbb3 | ccc | 3 | ccc4 | ddd | 4 | ddd (3 rows) postgres=# EXPLAIN SELECT * FROM ft_1 JOIN ft_2 ON a = x; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------Foreign Scanon multiple foreign tables (cost=0.00..0.00 rows=5000 width=72) Remote SQL: SELECT ft_1.a, ft_1.b, ft_2.x, ft_2.y FROMpublic.ft_1 ft_1, public.ft_2 ft_2 WHERE (ft_1.a = ft_2.x) (2 rows) * JOIN ft_1 and ft_3 does not works. Error message says ft_3.s does not exist. Probably, it means ft_3.s does not exist "on the remote host". postgres=# SELECT * FROM ft_1 JOIN ft_3 ON a = s; ERROR: could not execute foreign query DETAIL: ERROR: column ft_3.s does not exist LINE 1: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1 ft_1,... ^ HINT: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1 ft_1, public.ft_3 ft_3 WHERE (ft_1.a = ft_3.s) postgres=# EXPLAIN SELECT * FROM ft_1 JOIN ft_3 ON a = s; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------Foreign Scanon multiple foreign tables (cost=0.00..0.00 rows=5000 width=72) Remote SQL: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROMpublic.ft_1 ft_1, public.ft_3 ft_3 WHERE (ft_1.a = ft_3.s) (2 rows) In fact, EXPLAIN shows us the remote SQL tries to reference ft_3.s, instead of ft_3.ss. Thanks, 2011年9月14日10:24 Shigeru Hanada <shigeru.hanada@gmail.com>: > Hi all, > > I'd like to propose $SUBJECT for further foreign query optimization. > I've not finished development, but I'd appreciate it if I got someone's > review on my WIP code and its design. > > Changes I made > ============== > > (1) Add foreign server OID to RelOptInfo > I think it would be nice to know whether a join comes from one foreign > server or not without digging into child nodes during considering paths > for a query. So I added serverid field to RelOptInfo, which defaults to > InvalidOid ,and is set to OID of the server if the node and all of its > children are from same foreign server. This also avoids looking catalog > up for foreign table entry to determine FDW routine. > > (2) Add new planner node, ForeignJoinPath > ForeignJoinPath derives JoinPath, like other Join nodes, and holds > FdwPlan like ForeignPath node. > > This node is used to represent a pushed-down join between foreign tables > and/or another foreign join in early planning phase, for all of > combination such as table-table, table-join, join-table and join-join > will be considered. In addition, though they might generate same > internal (FDW-specific) plan, reversed combination is considered; > planner generates two ForeignJoinPath for both (A & B) and (B & A). > > During creation of this node, planner calls new FDW handler function > PlanForeignJoin to get a FdwPlan which includes costs and internal plan > of a foreign join. If a FDW can't (or doesn't want to) handle this > join, just return NULL is OK, and then planner gives such optimization > up and considers other usual join methods such as nested loop and hash join. > > A subtree which has a ForeignJoin on its top is translated into a > ForeignScan node during constructing a plan tree. This behavior is > different from other join path nodes such as NestPath and MergePath, > because they have child plan nodes correspond to path nodes. > > (3) Add EXPALIN support for foreign join (currently just for debug) > ForeignScan might not be a simple foreign table scan, so > ExplainScanTarget() can't be used for it. An idea I have is adding > ExplainForeignScanTarget() to handle ForeignScan separately from other > scan nodes. > > (4) Add new GUC parameter, enable_foreignjoin > If this was off, planner never generates ForeignJoinPath. In such case, > foreign tables will be joined with one of NestLoop, MergeJoin and HashJoin. > > Known issue > =========== > > I'm sorry but attached patch, join_pushdown_v1.patch, is WIP, so > currently some kind of query fails. Known failure patterns are: > > *) SELECT * FROM A JOIN B (...) doesn't work. Specifying columns in > SELECT clause explicitly like "SELECT A.col1, A.col2, ..." seems to work. > *) ORDER BY causes error if no column is specified in SELECT clause from > sort key's table. > > Probably more problems still are there... > > PG-wrapper as sample implementation > =================================== > > pgsql_fdw-0.1.0.tar.gz is an WIP implementation of PG-wrapper, which can > (hopefully) handle both simple foreign table scan and multiple foreign > joins. You can build it with placing in contrib/, or using pgxs. Note > that it has some issues such as memory leak of PGresult. I'm planning > to propose this wrapper as a contrib module, but it would be after > clearing such issues. > > Regards, > -- > Shigeru Hanada > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: