Re: Oddity in EXPLAIN for foreign/custom join pushdown plans - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Oddity in EXPLAIN for foreign/custom join pushdown plans |
Date | |
Msg-id | 741173c5-af93-00b4-e367-df28fee5910a@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Oddity in EXPLAIN for foreign/custom join pushdown plans (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Re: Oddity in EXPLAIN for foreign/custom join pushdown plans Re: Oddity in EXPLAIN for foreign/custom join pushdown plans |
List | pgsql-hackers |
On 2016/07/28 22:11, Kouhei Kaigai wrote: I wrote: >> That may be so, but my point is that the target relations involved in >> the foreign join (ie, ft1 and ft2) should be printed somewhere in the >> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan. > Why? According to your rule, Hash Join should take "on t0,t1,t2". > > postgres=# explain select id from t0 natural join t1 natural join t2; > QUERY PLAN > ----------------------------------------------------------------------------- > Hash Join (cost=6370.00..4560826.24 rows=98784048 width=4) > Hash Cond: (t0.aid = t1.aid) > -> Hash Join (cost=3185.00..3199360.58 rows=98784048 width=8) > Hash Cond: (t0.bid = t2.bid) > -> Seq Scan on t0 (cost=0.00..1833334.80 rows=100000080 width=12) > -> Hash (cost=1935.00..1935.00 rows=100000 width=4) > -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4) > -> Hash (cost=1935.00..1935.00 rows=100000 width=4) > -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4) > (9 rows) I don't think it needs "on t0,t1,t2", because we can see joining relations from inner/outer plans in that case. In a foreign-join case, however, we can't see such relations from the EXPLAIN printed *by core*. postgres_fdw avoids this issue by adding suchrelations to the EXPLAIN using ExplainForeignScan as shown in the below example, but since such relations are essential, I think that information should be shown by core itself. postgres=# explain select * from (select ft1.a from ft1 left join ft2 on ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3 left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a; QUERY PLAN ---------------------------------------------------------------- Hash Full Join (cost=202.06..204.12 rows=1 width=8) HashCond: (ft1.a = ft3.a) -> Foreign Scan (cost=100.00..102.05 rows=1 width=4) Relations: (public.ft1) LEFT JOIN(public.ft2) -> Hash (cost=102.05..102.05 rows=1 width=4) -> Foreign Scan (cost=100.00..102.05 rows=1 width=4) Relations: (public.ft3) LEFT JOIN (public.ft4) (7 rows) From the Relations line shown by postgres_fdw, we can see which foreign join joins which foreign tables, but if no such lines, we couldn't. I wrote: >> Probably something like this: >> >> Foreign Processing >> Remote Operations: ... >> >> In the Remote Operations line, the FDW/extension could print any info >> about remote operations, eg, "Scan/Join + Aggregate". > "Foreign" implies this node is processed by FDW, but "Procesing" gives us > no extra information; seems to me redundant. I intentionally chose that word and thought we could leave detailed descriptions about remote operations to the FDW/extension; a broader word like "Processing" seems to work well because we allow various kinds of operations to the remote side, in addition to scans/joins, to be performed in that one Foreign Scan node indicated by "Foreign Processing", such as aggregation, window functions, distinct, order by, row locking, table modification, or combinations of them. > Prior to the new invention, please explain why you don't want to by my > suggestion first? Annoying is a feel of you, but not a logic to persuade > others. I'm not saying that the idea I proposed is better than your suggestion. Just brain storming. I want to know what optionswe have and the pros and cons of each approach. >>> postgres=# explain select id from t0 natural join t1 natural join t2; >>> QUERY PLAN >>> --------------------------------------------------------------------------- >>> Custom Scan (GpuJoin) (cost=12385.67..291245.35 rows=9815465 width=4) >>> GPU Projection: t0.id >>> Depth 1: GpuHashJoin, HashKeys: (t0.bid) >>> JoinQuals: (t0.bid = t2.bid) >>> Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1) >>> Depth 2: GpuHashJoin, HashKeys: (t0.aid) >>> JoinQuals: (t0.aid = t1.aid) >>> Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1) >>> -> Seq Scan on t0 (cost=0.00..183333.96 rows=9999996 width=12) >>> -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4) >>> -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4) >>> (11 rows) > My largest concern for you proposition is, ForeignScan/CustomScan node is > enforced to print name of underlying relations, regardless of its actual > behavior. The above GpuJoin never scans tables at least, thus, it mislead > users if we have no choice to print underlying relation names. OK, I understand we would need special handling for such custom joins. Best regards, Etsuro Fujita
pgsql-hackers by date: