Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 5639D4DF.5020709@lab.ntt.co.jp
Whole thread Raw
In response to Re: Foreign join pushdown vs EvalPlanQual  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: Foreign join pushdown vs EvalPlanQual  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
On 2015/11/04 17:10, Kouhei Kaigai wrote:
>> On 2015/10/28 6:04, Robert Haas wrote:
>>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> Sorry, my explanation was not correct.  (Needed to take in caffeine.) What
>>>> I'm concerned about is the following:
>>>>
>>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON
>>>> localtab.id = ft1.id FOR UPDATE OF ft1
>>>>
>>>> LockRows
>>>> -> Nested Loop
>>>>        Join Filter: (localtab.id = ft1.id)
>>>>        -> Seq Scan on localtab
>>>>        -> Foreign Scan on <ft1, ft2>
>>>>             Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x
>>>> FOR UPDATE OF ft1
>>>>
>>>> Assume that ft1 performs late row locking.

>>> If the SQL includes "FOR UPDATE of ft1", then it clearly performs
>>> early row locking.  I assume you meant to omit that.

>>>> If an EPQ recheck was invoked
>>>> due to a concurrent transaction on the remote server that changed only the
>>>> value x of the ft1 tuple previously retrieved, then we would have to
>>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2
>>>> tuple previously retrieved was not a null tuple.) However, I'm not sure how
>>>> we can do that in ForeignRecheck; we can't know for example, which one is
>>>> outer and which one is inner, without an alternative local join execution
>>>> plan.  Maybe I'm missing something, though.

>>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT
>>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1.

>> We assume here that ft1 uses late row locking, so I thought the above
>> SQL should include "FOR UPDATE of ft1".  But I still don't think that
>> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the
>> fake ft1/ft2-join tuple with nulls for ft2, as expected.  The reason for
>> that is that the updated version of the ft1 tuple wouldn't satisfy the
>> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the
>> updated version of the ft1 tuple has changed.  (IIUC, I think that if we
>> use a TID scan for ft1, the SQL would generate the expected result,
>> because I think that the TID condition would be ignored in the EPQ
>> recheck, but I don't think it's guaranteed to use a TID scan for ft1.)
>> Maybe I'm missing something, though.

> It looks to me, we should not use ctid system column to identify remote
> row when postgres_fdw tries to support late row locking.
>
> The documentation says:
>    http://www.postgresql.org/docs/devel/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
>
>    UPDATE and DELETE operations are performed against rows previously
>    fetched by the table-scanning functions. The FDW may need extra information,
>    such as a row ID or the values of primary-key columns, to ensure that it can
>    identify the exact row to update or delete
>
> The "rowid" should not be changed once it is fetched from the remote side
> until it is actually updated, deleted or locked, for correct identification.
> If ctid is used for this purpose, it is safe only when remote row is locked
> when it is fetched - it is exactly early row locking behavior, isn't it?

Yeah, we should use early row locking for a target foreign table in 
UPDATE/DELETE.

In case of SELECT FOR UPDATE, I think we are allowed to use ctid to 
identify target rows for late row locking, but I think the above SQL 
should be changed to something like this:

SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT 
JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x

>>> This should be significantly more efficient than fetching the base
>>> rows from each of two tables with two separate queries.

>> Maybe I think we could fix the SQL, so I have to admit that, but I'm
>> just wondering (1) what would happen for the case when ft1 uses late row
>> rocking and ft2 uses early row rocking and (2) that would be still more
>> efficient than re-fetching only the base row from ft1.

> It should be decision by FDW driver. It is not easy to estimate a certain
> FDW driver mixes up early and late locking policy within a same remote join
> query. Do you really want to support such a mysterious implementation?

Yeah, the reason for that is because GetForeignRowMarkType allows that.

> Or, do you expect all the FDW driver is enforced to return a joined tuple
> if remote join case?

No.  That wouldn't make sense if at least one component table involved 
in a foreign join uses the rowmark type other than ROW_MARK_COPY.

> It is different from my idea; it shall be an extra
> optimization option if FDW can fetch a joined tuple at once, but not always.
> So, if FDW driver does not support this optimal behavior, your driver can
> fetch two base tables then run local alternative join (or something other).

OK, so if we all agree that the joined-tuple optimization is just an 
option for the case where all the component tables use ROW_MARK_COPY, 
I'd propose to leave that for 9.6.

Best regards,
Etsuro Fujita




pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: PL/Pythonu - function ereport
Next
From: Etsuro Fujita
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual