Re: WIP: Join push-down for foreign tables - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: WIP: Join push-down for foreign tables
Date
Msg-id CADyhKSUaL31GENeYgzfKshAi7NXh=yHBR8N2XKiu+B2fZmC-UA@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Join push-down for foreign tables  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: WIP: Join push-down for foreign tables
Re: WIP: Join push-down for foreign tables
List pgsql-hackers
2011年10月4日12:08 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> In my opinion, FdwRoutine should have an additional API to inform the core its
>> supported features; such as inner-join, outer-join, order-by,
>> group-by, aggregate
>> functions, insert, update, delete, etc... in the future version.
>
> Sure, so in my design PlanForeignJoin is optional.
>
> The lack of capability is informed from FDW with setting function
> pointer in FdwRoutine to NULL.  If PlanForeignJoin was NULL, core
> (planner) will give up to consider join push-down, and use one of local
> join methods such as NestLoop and MergeJoin for those foreign tables.
> As you say, other push-down-able features would also have optional
> handler function for each.
>
Sorry, I overlooked it was already implemented at create_foreignjoin_path().

I additionally tried several cases using pgsql_fdw.
In some cases, it seems to me the planner don't push down the join tree
as you probably expected.
Please see the following example:

I defined three foreign tables: ft1(a int, b text), ft2(x int, y
text), ft3(s int, t text),
and lt1, lt2, lt3 are regular local tables.

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join ft3 on a = s);
 QUERY PLAN
---------------------------------------------------------------------------------------------------------------Foreign
Scanon multiple foreign tables  (cost=0.00..0.00 rows=25000 width=108)  Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y,
ft3.s,ft3.t FROM
 
public.ft1 ft1, public.ft2 ft2, public.ft3 ft3 WHERE (ft1.a = ft3.s)
AND (ft1.a = ft2.x)
(2 rows)

It works good.
(P.S. I noticed that pgsql_fdw has incorrect Assert(). Please fix
pgsql_fdw.c:730)

However, an existence of local relation makes planner confused.
It seems to me you expect "ft1 join ft2 on a = x"

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join lt3 on a = s);
       QUERY PLAN
 

----------------------------------------------------------------------------------------------------------------------Merge
Join (cost=205.08..758.83 rows=30750 width=108)  Merge Cond: (ft1.a = lt3.s)  ->  Merge Join  (cost=119.66..199.66
rows=5000width=72)        Merge Cond: (ft1.a = ft2.x)        ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
  Sort Key: ft1.a              ->  Foreign Scan on ft1  (cost=10.00..10.00 rows=1000 width=36)
RemoteSQL: DECLARE pgsql_fdw_cursor_2 SCROLL
 
CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1        ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
SortKey: ft2.x              ->  Foreign Scan on ft2  (cost=10.00..10.00 rows=1000 width=36)                    Remote
SQL:DECLARE pgsql_fdw_cursor_3 SCROLL
 
CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2  ->  Sort  (cost=85.43..88.50 rows=1230 width=36)        Sort Key:
lt3.s       ->  Seq Scan on lt3  (cost=0.00..22.30 rows=1230 width=36)
 
(15 rows)

What is the reason why the foreign join is not pushed down?
Maybe, injected Sort plan prevent the planner to consider both side of
relations being foreign scan owned by same server? I'm still
investigating the reason.

I hope comments from committers. :-(

A collateral evidence is below.
If we try to sort the result by a key being not used to join, the both
of foreign scan gets pushed down.

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by y;
QUERYPLAN
 
-----------------------------------------------------------------------------------------------------------------Sort
(cost=307.19..319.69rows=5000 width=72)  Sort Key: ft2.y  ->  Foreign Scan on multiple foreign tables
(cost=0.00..0.00
rows=5000 width=72)        Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1
ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x)
(4 rows)

However, when I tried to sort by a key being used to join, the both of
foreign scan was not pushed down.

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by a;
QUERYPLAN
 
----------------------------------------------------------------------------------------------------------------Merge
Join (cost=119.66..199.66 rows=5000 width=72)  Merge Cond: (ft1.a = ft2.x)  ->  Sort  (cost=59.83..62.33 rows=1000
width=36)       Sort Key: ft1.a        ->  Foreign Scan on ft1  (cost=10.00..10.00 rows=1000 width=36)
RemoteSQL: DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR
 
FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1  ->  Sort  (cost=59.83..62.33 rows=1000 width=36)        Sort Key: ft2.x
   ->  Foreign Scan on ft2  (cost=10.00..10.00 rows=1000 width=36)              Remote SQL: DECLARE pgsql_fdw_cursor_7
SCROLLCURSOR
 
FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
(10 rows)

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: alter table only ... drop constraint broken in HEAD
Next
From: Usama Dar
Date:
Subject: PQsendQuery/ PQgetResult Problem