Re: Oddity in EXPLAIN for foreign/custom join pushdown plans - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Date
Msg-id edd1ee88-d7c9-583e-6c0a-bc7d50e40522@lab.ntt.co.jp
Whole thread Raw
In response to Re: Oddity in EXPLAIN for foreign/custom join pushdown plans  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
List pgsql-hackers
On 2016/07/29 13:05, Etsuro Fujita wrote:
> In a foreign-join case,
> however, we can't see such relations from the EXPLAIN printed *by core*.
>  postgres_fdw avoids this issue by adding such relations to the EXPLAIN
> using ExplainForeignScan as shown in the below example, but since such
> relations are essential, I think that information should be shown by
> core itself.
>
> postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
> ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
> left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
>    Hash Cond: (ft1.a = ft3.a)
>    ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
>          Relations: (public.ft1) LEFT JOIN (public.ft2)
>    ->  Hash  (cost=102.05..102.05 rows=1 width=4)
>          ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
>                Relations: (public.ft3) LEFT JOIN (public.ft4)
> (7 rows)
>
> From the Relations line shown by postgres_fdw, we can see which foreign
> join joins which foreign tables, but if no such lines, we couldn't.

I thought about the Relations line a bit more and noticed that there are  
cases where the table reference names for joining relations in the  
Relations line are printed incorrectly.  Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,  
ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2  
where t1.a = t2.a) as t(t1a, t2a);                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Unique (cost=204.12..204.13 rows=2 width=8)   Output: t1.a, t2.a   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
 Output: t1.a, t2.a         Sort Key: t1.a, t2.a         ->  Append  (cost=100.00..204.11 rows=2 width=8)
-> Foreign Scan  (cost=100.00..102.04 rows=1 width=8)                     Output: t1.a, t2.a
Relations:(public.ft1 t1) INNER JOIN (public.ft2 t2)                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1
r1 
 
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
              Output: t1_1.a, t2_1.a                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
          Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1  
 
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
(14 rows)

The table reference names for ft1 and ft2 in the Relations line for the  
second Foreign Scan should be t1_1 and t2_1 respectively.

Another concern about the Relations line is, that represents just an  
internal representation of a pushed-down join, so that would not  
necessarily match a deparsed query shown in the Remote SQL line.  Here  
is an example, which I found when working on supporting pushing down  
full outer join a lot more, by improving the deparsing logic so that  
postgres_fdw can build a remote query that involves subqueries [1],  
which I'll work on for 10.0:

+ -- full outer join with restrictions on the joining relations
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND  
60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON  
(t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+                                                                  QUERY  
PLAN  

+  

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Foreign Scan
+    Output: ft4.c1, ft5.c1
+    Relations: (public.ft4) FULL JOIN (public.ft5)
+    Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"  
WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM  
"S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =  
ss2.c1)))) ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST
+ (4 rows)

"(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not  
exactly match the deparsed query in the Remote SQL line, which I think  
would be rather confusing for users.  (We may be able to print more  
exact information in the Relations line so as to match the depaserd  
query, but I think that that would make the Relations line redundant.)

Would we really need the Relations line?  If joining relations are  
printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"  
as proposed upthread, we can see those relations from that, not the  
Relations line.  Also we can see the join tree structure from the  
deparsed query in the Remote SQL line.  The Relations line seems to be  
not that useful anymore, then.  What do you think about that?

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5710D7E2.7010302%40lab.ntt.co.jp





pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
Next
From: Dean Rasheed
Date:
Subject: Re: Combining hash values