Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 56289A04.9080007@lab.ntt.co.jp
Whole thread Raw
In response to Re: Foreign join pushdown vs EvalPlanQual  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
On 2015/10/20 9:36, Kouhei Kaigai wrote:
> Even if we fetch whole-row of both side, join pushdown is exactly working
> because we can receive less number of rows than local join + 2 of foreign-
> scan. (If planner works well, we can expect join-path that increases number
> of rows shall be dropped.)
>
> One downside of my proposition is growth of width for individual rows.
> It is a trade-off situation. The above approach takes no changes for
> existing EPQ infrastructure, thus, its implementation design is clear.
> On the other hands, your approach will reduce traffic over the network,
> however, it is still unclear how we integrate scanrelid==0 with EPQ
> infrastructure.

I agree with KaiGai-san that his proposition (or my proposition based on 
secondary plans) is still a performance improvement over the current 
implementation on local joining plus early row locking, since that that 
wouldn't have to transfer useless data that didn't satisfy join 
conditions at all!

> On the other hands, in case of custom-scan that takes underlying local
> scan-nodes, thus, any kind of ROW_MARK_* except for ROW_MARK_COPY will
> happen. I think width of the joined tuples are relatively minor issue
> than FDW cases. However, we cannot expect the fetched rows are protected
> by early row-locking mechanism, so probability of re-fetching rows and
> reconstruction of joined-tuple has relatively higher priority.

I see.

>> There is also some possible loss of efficiency with this approach.
>> Suppose that we have two tables ft1 and ft2 which are being joined,
>> and we push down the join.  They are being joined on an integer
>> column, and the join needs to select several other columns as well.
>> However, ft1 and ft2 are very wide tables that also contain some text
>> columns.   The query is like this:
>>
>> SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
>> ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND f2.huge2 ~ 'nonsense') ON
>> localtab.q = ft1.q;
>>
>> If we refetch each row individually, we will need a wholerow image of
>> ft1 and ft2 that includes all columns, or at least f1.huge and
>> f2.huge2.  If we just fetch a wholerow image of the join output, we
>> can exclude those.  The only thing we need to recheck is that it's
>> still the case that localtab.q = ft1.q (because the value of
>> localtab.q might have changed).

As KaiGai-san mentioned above, what we need to discuss more about with 
Robert's proposition is how to integrate that into the existing EPQ 
machinery.  For example, when, where, and how should we refetch the 
whole-row image of the join output in the case of late row locking?  IMV 
I think that that would need to add a new FDW API different from 
RefetchForeignRow, say RefetchForeignJoinRow.

IMO I think that another benefit from the proposition from KaiGai-san 
(or me) would be that that could provide the whole functionality for row 
locking in remote joins, without an additional development burden on an 
FDW author; the author only has to write GetForeignRowMarkType and 
RefetchForeignRow, which I think is relatively easy.  I think that in 
the proposition, the use of rowmark types such as ROW_MARK_SHARE or 
ROW_MARK_EXCLUSIVE for foreign tables in remote joins would be quite 
inefficient, but I think that the use of ROW_MARK_REFERENCE instead of 
ROW_MARK_COPY would be an option for the workload where EPQ rechecks are 
rarely invoked, because we just need to transfer ctids, not whole-row 
images.

Best regards,
Etsuro Fujita




pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH v3] GSSAPI encryption support
Next
From: Andres Freund
Date:
Subject: Re: [PATCH v3] GSSAPI encryption support