Re: WIP: Join push-down for foreign tables - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: WIP: Join push-down for foreign tables
Date
Msg-id CADyhKSXS3+WQEzx7XyR=SnziOBzRpSU-dBBNOj6BEJO9wHzatw@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Join push-down for foreign tables  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses Re: WIP: Join push-down for foreign tables
List pgsql-hackers
Hanada-san,

The proposed patch put an invocation of PlanForeignJoin on the
create_foreignjoin_path() being also called by match_unsorted_outer().
Is it a suitable position to make a decision whether a join can be
pushed-down?

I think; it needs an additional functionality to provide higher priority
on the foreign-join plan that other plans, when fdw determind a particular
join can be pushed-down.
(Sorry, I have no idea right now.)

Let's see the following result.

postgres=# EXPLAIN SELECT * FROM ft1 , ft2, lt3 WHERE ft1.a = ft2.x
AND ft1.a = lt3.s;                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------Merge
Join (cost=205.08..758.83 rows=30750 width=108)  Merge Cond: (ft1.a = lt3.s)  ->  Merge Join  (cost=119.66..199.66
rows=5000width=72)        Merge Cond: (ft1.a = ft2.x)        ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
  Sort Key: ft1.a              ->  Foreign Scan on ft1  (cost=10.00..10.00 rows=1000 width=36)
RemoteSQL: DECLARE pgsql_fdw_cursor_0 SCROLL
 
CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1        ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
SortKey: ft2.x              ->  Foreign Scan on ft2  (cost=10.00..10.00 rows=1000 width=36)                    Remote
SQL:DECLARE pgsql_fdw_cursor_1 SCROLL
 
CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2  ->  Sort  (cost=85.43..88.50 rows=1230 width=36)        Sort Key:
lt3.s       ->  Seq Scan on lt3  (cost=0.00..22.30 rows=1230 width=36)
 
(15 rows)

Then, I turned off the enable_mergejoin.

postgres=# EXPLAIN SELECT * FROM ft1 , ft2, lt3 WHERE ft1.a = ft2.x
AND ft1.a = lt3.s;                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------Hash Join
(cost=37.67..1126.42rows=30750 width=108)  Hash Cond: (ft1.a = lt3.s)  ->  Foreign Scan on multiple foreign tables
(cost=0.00..0.00
rows=5000 width=72)        Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1
ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x)  ->  Hash  (cost=22.30..22.30 rows=1230 width=36)        ->  Seq Scan on lt3
(cost=0.00..22.30rows=1230 width=36)
 
(6 rows)

Probably, the basic design is correct. However, the planner gives
higher priority on the join plan between
local and foreign than pushing-down foreign relations.

Does it make sense not to consider any other possible plans when FDW
decided a particular join can be
pushed down?

Thanks,

2011年10月7日18:06 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2011年10月4日12:08 Shigeru Hanada <shigeru.hanada@gmail.com>:
>>> In my opinion, FdwRoutine should have an additional API to inform the core its
>>> supported features; such as inner-join, outer-join, order-by,
>>> group-by, aggregate
>>> functions, insert, update, delete, etc... in the future version.
>>
>> Sure, so in my design PlanForeignJoin is optional.
>>
>> The lack of capability is informed from FDW with setting function
>> pointer in FdwRoutine to NULL.  If PlanForeignJoin was NULL, core
>> (planner) will give up to consider join push-down, and use one of local
>> join methods such as NestLoop and MergeJoin for those foreign tables.
>> As you say, other push-down-able features would also have optional
>> handler function for each.
>>
> Sorry, I overlooked it was already implemented at create_foreignjoin_path().
>
> I additionally tried several cases using pgsql_fdw.
> In some cases, it seems to me the planner don't push down the join tree
> as you probably expected.
> Please see the following example:
>
> I defined three foreign tables: ft1(a int, b text), ft2(x int, y
> text), ft3(s int, t text),
> and lt1, lt2, lt3 are regular local tables.
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join ft3 on a = s);
>
>  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Foreign Scan on multiple foreign tables  (cost=0.00..0.00 rows=25000 width=108)
>   Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y, ft3.s, ft3.t FROM
> public.ft1 ft1, public.ft2 ft2, public.ft3 ft3 WHERE (ft1.a = ft3.s)
> AND (ft1.a = ft2.x)
> (2 rows)
>
> It works good.
> (P.S. I noticed that pgsql_fdw has incorrect Assert(). Please fix
> pgsql_fdw.c:730)
>
> However, an existence of local relation makes planner confused.
> It seems to me you expect "ft1 join ft2 on a = x"
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join lt3 on a = s);
>                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=205.08..758.83 rows=30750 width=108)
>   Merge Cond: (ft1.a = lt3.s)
>   ->  Merge Join  (cost=119.66..199.66 rows=5000 width=72)
>         Merge Cond: (ft1.a = ft2.x)
>         ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
>               Sort Key: ft1.a
>               ->  Foreign Scan on ft1  (cost=10.00..10.00 rows=1000 width=36)
>                     Remote SQL: DECLARE pgsql_fdw_cursor_2 SCROLL
> CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
>         ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
>               Sort Key: ft2.x
>               ->  Foreign Scan on ft2  (cost=10.00..10.00 rows=1000 width=36)
>                     Remote SQL: DECLARE pgsql_fdw_cursor_3 SCROLL
> CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
>   ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
>         Sort Key: lt3.s
>         ->  Seq Scan on lt3  (cost=0.00..22.30 rows=1230 width=36)
> (15 rows)
>
> What is the reason why the foreign join is not pushed down?
> Maybe, injected Sort plan prevent the planner to consider both side of
> relations being foreign scan owned by same server? I'm still
> investigating the reason.
>
> I hope comments from committers. :-(
>
> A collateral evidence is below.
> If we try to sort the result by a key being not used to join, the both
> of foreign scan gets pushed down.
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by y;
>                                                   QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>  Sort  (cost=307.19..319.69 rows=5000 width=72)
>   Sort Key: ft2.y
>   ->  Foreign Scan on multiple foreign tables  (cost=0.00..0.00
> rows=5000 width=72)
>         Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1
> ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x)
> (4 rows)
>
> However, when I tried to sort by a key being used to join, the both of
> foreign scan was not pushed down.
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by a;
>                                                   QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=119.66..199.66 rows=5000 width=72)
>   Merge Cond: (ft1.a = ft2.x)
>   ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
>         Sort Key: ft1.a
>         ->  Foreign Scan on ft1  (cost=10.00..10.00 rows=1000 width=36)
>               Remote SQL: DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR
> FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
>   ->  Sort  (cost=59.83..62.33 rows=1000 width=36)
>         Sort Key: ft2.x
>         ->  Foreign Scan on ft2  (cost=10.00..10.00 rows=1000 width=36)
>               Remote SQL: DECLARE pgsql_fdw_cursor_7 SCROLL CURSOR
> FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
> (10 rows)
>
> Thanks,
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>
>



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


pgsql-hackers by date:

Previous
From: Florian Pflug
Date:
Subject: Re: libpq, PQdescribePrepared -> PQftype, PQfmod, no PQnullable
Next
From: "Kevin Grittner"
Date:
Subject: Re: libpq, PQdescribePrepared -> PQftype, PQfmod, no PQnullable