Re: Oddity in EXPLAIN for foreign/custom join pushdown plans - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Date
Msg-id 741173c5-af93-00b4-e367-df28fee5910a@lab.ntt.co.jp
Whole thread Raw
In response to Re: Oddity in EXPLAIN for foreign/custom join pushdown plans  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
List pgsql-hackers
On 2016/07/28 22:11, Kouhei Kaigai wrote:

I wrote:
>> That may be so, but my point is that the target relations involved in
>> the foreign join (ie, ft1 and ft2) should be printed somewhere in the
>> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.

> Why? According to your rule, Hash Join should take "on t0,t1,t2".
>
> postgres=# explain select id from t0 natural join t1 natural join t2;
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Hash Join  (cost=6370.00..4560826.24 rows=98784048 width=4)
>    Hash Cond: (t0.aid = t1.aid)
>    ->  Hash Join  (cost=3185.00..3199360.58 rows=98784048 width=8)
>          Hash Cond: (t0.bid = t2.bid)
>          ->  Seq Scan on t0  (cost=0.00..1833334.80 rows=100000080 width=12)
>          ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
>                ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
>    ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
>          ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
> (9 rows)

I don't think it needs "on t0,t1,t2", because we can see joining  
relations from inner/outer plans in that case.  In a foreign-join case,  
however, we can't see such relations from the EXPLAIN printed *by core*.   postgres_fdw avoids this issue by adding
suchrelations to the EXPLAIN  
 
using ExplainForeignScan as shown in the below example, but since such  
relations are essential, I think that information should be shown by  
core itself.

postgres=# explain select * from (select ft1.a from ft1 left join ft2 on  
ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3  
left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;                           QUERY PLAN
---------------------------------------------------------------- Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
HashCond: (ft1.a = ft3.a)   ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)         Relations: (public.ft1) LEFT
JOIN(public.ft2)   ->  Hash  (cost=102.05..102.05 rows=1 width=4)         ->  Foreign Scan  (cost=100.00..102.05 rows=1
width=4)              Relations: (public.ft3) LEFT JOIN (public.ft4)
 
(7 rows)
From the Relations line shown by postgres_fdw, we can see which foreign  
join joins which foreign tables, but if no such lines, we couldn't.

I wrote:
>> Probably something like this:
>>
>>    Foreign Processing
>>      Remote Operations: ...
>>
>> In the Remote Operations line, the FDW/extension could print any info
>> about remote operations, eg, "Scan/Join + Aggregate".

> "Foreign" implies this node is processed by FDW, but "Procesing" gives us
> no extra information; seems to me redundant.

I intentionally chose that word and thought we could leave detailed  
descriptions about remote operations to the FDW/extension; a broader  
word like "Processing" seems to work well because we allow various kinds  
of operations to the remote side, in addition to scans/joins, to be  
performed in that one Foreign Scan node indicated by "Foreign  
Processing", such as aggregation, window functions, distinct, order by,  
row locking, table modification, or combinations of them.

> Prior to the new invention, please explain why you don't want to by my
> suggestion first? Annoying is a feel of you, but not a logic to persuade
> others.

I'm not saying that the idea I proposed is better than your suggestion.   Just brain storming.  I want to know what
optionswe have and the pros  
 
and cons of each approach.

>>> postgres=# explain select id from t0 natural join t1 natural join t2;
>>>                                 QUERY PLAN
>>> ---------------------------------------------------------------------------
>>>  Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
>>>    GPU Projection: t0.id
>>>    Depth 1: GpuHashJoin, HashKeys: (t0.bid)
>>>             JoinQuals: (t0.bid = t2.bid)
>>>             Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>>    Depth 2: GpuHashJoin, HashKeys: (t0.aid)
>>>             JoinQuals: (t0.aid = t1.aid)
>>>             Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>>    ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)
>>>    ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
>>>    ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
>>> (11 rows)

> My largest concern for you proposition is, ForeignScan/CustomScan node is
> enforced to print name of underlying relations, regardless of its actual
> behavior. The above GpuJoin never scans tables at least, thus, it mislead
> users if we have no choice to print underlying relation names.

OK, I understand we would need special handling for such custom joins.

Best regards,
Etsuro Fujita





pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Wrong defeinition of pq_putmessage_noblock since 9.5
Next
From: Ashutosh Bapat
Date:
Subject: Re: Oddity in EXPLAIN for foreign/custom join pushdown plans