Re: WIP: Join push-down for foreign tables - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: WIP: Join push-down for foreign tables |
Date | |
Msg-id | CADyhKSXS3+WQEzx7XyR=SnziOBzRpSU-dBBNOj6BEJO9wHzatw@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: Join push-down for foreign tables (Kohei KaiGai <kaigai@kaigai.gr.jp>) |
Responses |
Re: WIP: Join push-down for foreign tables
|
List | pgsql-hackers |
Hanada-san, The proposed patch put an invocation of PlanForeignJoin on the create_foreignjoin_path() being also called by match_unsorted_outer(). Is it a suitable position to make a decision whether a join can be pushed-down? I think; it needs an additional functionality to provide higher priority on the foreign-join plan that other plans, when fdw determind a particular join can be pushed-down. (Sorry, I have no idea right now.) Let's see the following result. postgres=# EXPLAIN SELECT * FROM ft1 , ft2, lt3 WHERE ft1.a = ft2.x AND ft1.a = lt3.s; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------Merge Join (cost=205.08..758.83 rows=30750 width=108) Merge Cond: (ft1.a = lt3.s) -> Merge Join (cost=119.66..199.66 rows=5000width=72) Merge Cond: (ft1.a = ft2.x) -> Sort (cost=59.83..62.33 rows=1000 width=36) Sort Key: ft1.a -> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36) RemoteSQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1 -> Sort (cost=59.83..62.33 rows=1000 width=36) SortKey: ft2.x -> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36) Remote SQL:DECLARE pgsql_fdw_cursor_1 SCROLL CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2 -> Sort (cost=85.43..88.50 rows=1230 width=36) Sort Key: lt3.s -> Seq Scan on lt3 (cost=0.00..22.30 rows=1230 width=36) (15 rows) Then, I turned off the enable_mergejoin. postgres=# EXPLAIN SELECT * FROM ft1 , ft2, lt3 WHERE ft1.a = ft2.x AND ft1.a = lt3.s; QUERY PLAN -------------------------------------------------------------------------------------------------------------Hash Join (cost=37.67..1126.42rows=30750 width=108) Hash Cond: (ft1.a = lt3.s) -> Foreign Scan on multiple foreign tables (cost=0.00..0.00 rows=5000 width=72) Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1 ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x) -> Hash (cost=22.30..22.30 rows=1230 width=36) -> Seq Scan on lt3 (cost=0.00..22.30rows=1230 width=36) (6 rows) Probably, the basic design is correct. However, the planner gives higher priority on the join plan between local and foreign than pushing-down foreign relations. Does it make sense not to consider any other possible plans when FDW decided a particular join can be pushed down? Thanks, 2011年10月7日18:06 Kohei KaiGai <kaigai@kaigai.gr.jp>: > 2011年10月4日12:08 Shigeru Hanada <shigeru.hanada@gmail.com>: >>> In my opinion, FdwRoutine should have an additional API to inform the core its >>> supported features; such as inner-join, outer-join, order-by, >>> group-by, aggregate >>> functions, insert, update, delete, etc... in the future version. >> >> Sure, so in my design PlanForeignJoin is optional. >> >> The lack of capability is informed from FDW with setting function >> pointer in FdwRoutine to NULL. If PlanForeignJoin was NULL, core >> (planner) will give up to consider join push-down, and use one of local >> join methods such as NestLoop and MergeJoin for those foreign tables. >> As you say, other push-down-able features would also have optional >> handler function for each. >> > Sorry, I overlooked it was already implemented at create_foreignjoin_path(). > > I additionally tried several cases using pgsql_fdw. > In some cases, it seems to me the planner don't push down the join tree > as you probably expected. > Please see the following example: > > I defined three foreign tables: ft1(a int, b text), ft2(x int, y > text), ft3(s int, t text), > and lt1, lt2, lt3 are regular local tables. > > postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join ft3 on a = s); > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------- > Foreign Scan on multiple foreign tables (cost=0.00..0.00 rows=25000 width=108) > Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y, ft3.s, ft3.t FROM > public.ft1 ft1, public.ft2 ft2, public.ft3 ft3 WHERE (ft1.a = ft3.s) > AND (ft1.a = ft2.x) > (2 rows) > > It works good. > (P.S. I noticed that pgsql_fdw has incorrect Assert(). Please fix > pgsql_fdw.c:730) > > However, an existence of local relation makes planner confused. > It seems to me you expect "ft1 join ft2 on a = x" > > postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join lt3 on a = s); > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=205.08..758.83 rows=30750 width=108) > Merge Cond: (ft1.a = lt3.s) > -> Merge Join (cost=119.66..199.66 rows=5000 width=72) > Merge Cond: (ft1.a = ft2.x) > -> Sort (cost=59.83..62.33 rows=1000 width=36) > Sort Key: ft1.a > -> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36) > Remote SQL: DECLARE pgsql_fdw_cursor_2 SCROLL > CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1 > -> Sort (cost=59.83..62.33 rows=1000 width=36) > Sort Key: ft2.x > -> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36) > Remote SQL: DECLARE pgsql_fdw_cursor_3 SCROLL > CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2 > -> Sort (cost=85.43..88.50 rows=1230 width=36) > Sort Key: lt3.s > -> Seq Scan on lt3 (cost=0.00..22.30 rows=1230 width=36) > (15 rows) > > What is the reason why the foreign join is not pushed down? > Maybe, injected Sort plan prevent the planner to consider both side of > relations being foreign scan owned by same server? I'm still > investigating the reason. > > I hope comments from committers. :-( > > A collateral evidence is below. > If we try to sort the result by a key being not used to join, the both > of foreign scan gets pushed down. > > postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by y; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------- > Sort (cost=307.19..319.69 rows=5000 width=72) > Sort Key: ft2.y > -> Foreign Scan on multiple foreign tables (cost=0.00..0.00 > rows=5000 width=72) > Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1 > ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x) > (4 rows) > > However, when I tried to sort by a key being used to join, the both of > foreign scan was not pushed down. > > postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by a; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------- > Merge Join (cost=119.66..199.66 rows=5000 width=72) > Merge Cond: (ft1.a = ft2.x) > -> Sort (cost=59.83..62.33 rows=1000 width=36) > Sort Key: ft1.a > -> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36) > Remote SQL: DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR > FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1 > -> Sort (cost=59.83..62.33 rows=1000 width=36) > Sort Key: ft2.x > -> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36) > Remote SQL: DECLARE pgsql_fdw_cursor_7 SCROLL CURSOR > FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2 > (10 rows) > > Thanks, > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: