Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs) - Mailing list pgsql-hackers

From Shigeru Hanada
Subject Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date
Msg-id 36D0D5FC-5C68-427A-A26A-1B12A004BE8E@gmail.com
Whole thread Raw
In response to Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
List pgsql-hackers
Thank for your comments.

2015-05-21 23:11 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Sat, May 16, 2015 at 9:04 AM, Shigeru Hanada
> <shigeru.hanada@gmail.com> wrote:
>> d) All relations must have the same effective user id
>> This check is done with userid stored in PgFdwRelationInfo, which is
>> valid only when underlying relations have the same effective user id.
>> Here "effective user id" means id of the user executing the query, or
>> the owner of the view when the foreign table is accessed via view.
>> Tom suggested that it is necessary to check that user mapping matches
>> for security reason, and now I think it's better than checking
>> effective user as current postgres_fdw does.
>
> So, should this be a separate patch?

Yes, it should be an additional patch for Custom/Foreign join API which is already committed.
The patch would contain these changes:
* add new field usermappingid to RelOptInfo, it is InvalidOid for non-foreign tables
* obtain oid of user mapping for a foreign table, and store it in the RelOptInfo (we already have serverid in
RelOptInfo,so userid is enough to identify user mapping though) 
* propagate usermappingid up to the join relation when outer and inner relations have same valid value
* check matching of user mapping before calling GetForeignJoinPaths, rather than serverid

> One of my concerns about this patch is that it's got a lot of stuff in
> it that isn't obviously related to the patch.  Anything that is a
> separate change should be separated out into its own patch.  Perhaps
> you can post a set of patches that apply one on top of the next, with
> the changes for each one clearly separated.

IIUC, each patch should not break compilation, and should contain only one complete logical change which can't be
separatedinto pieces.  I think whole of the patch is necessary to implement  

>
>> e) Each source relation must not have any local filter
>> Evaluating conditions of join source talbe potentially produces
>> different result in OUTER join cases.  This can be relaxed for the
>> cases when the join is INNER and local filters don't contain any
>> volatile function/operator, but it is left as future enhancement.
>
> I think this restriction is a sign that you're not really doing this
> right. Consider:
>
> (1) SELECT * FROM a LEFT JOIN b ON a.x = b.x AND b.x = 3;
> (2) SELECT * FROM a LEFT JOIN b ON a.x = b.x WHERE b.x = 3;
>
> If you push down the scan of b, you can include the b.x = 3 qual in
> case (1) but not in case (2).  If you push down the join, you can
> include the qual in either case, but you must attach it in the same
> place where it was before.
>
>> One big change about deparsing base relation is aliasing.  This patch
>> adds column alias to SELECT clause even original query is a simple
>> single table SELECT.
>>
>> fdw=# EXPLAIN (VERBOSE, COSTS false) SELECT * FROM pgbench_branches b;
>>                                     QUERY PLAN
>> ------------------------------------------------------------------------------------
>> Foreign Scan on public.pgbench_branches b
>>   Output: bid, bbalance, filler
>>   Remote SQL: SELECT bid a9, bbalance a10, filler a11 FROM
>> public.pgbench_branches
>> (3 rows)
>>
>> As you see, every column has alias in format "a%d" with index value
>> derived from pg_attribute.attnum.  Index value is attnum + 8, and the
>> magic number "8" comes from FirstLowInvalidHeapAttributeNumber for the
>> adjustment that makes attribute number of system attributes positive.
>
> Yeah.  I'm not sure this is a good idea.  The column labels are
> pointless at the outermost level.
>
> I'm not sure it isn't a good idea, either, but I have some doubts.

I fixed the patch to not add column alias to remote queries for single table.  This change also reduces amount of
differencesfrom master branch slightly. 


>
>> One thing tricky is "peusdo projection" which is done by
>> deparseProjectionSql for whole-row reference.  This is done by put the
>> query string in FROM subquery and add whole-row reference in outer
>> SELECT clause.  This is done by ExecProjection in 9.4 and older, but
>> we need to do this in postgres_fdw because ExecProjection is not
>> called any more.
>
> What commit changed this?

No commit changed this behavior, as Kaigai-san says.  If you still have comments, please refer my response to
Kaigai-san.

>
> Thanks for your work on this.  Although I know progress has been slow,
> I think this work is really important to the project.

I agree.  I’ll take more time for this work.

--
Shigeru HANADA



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Add a hint when postgresql.conf hot_standby = 'off' and recovery.conf standby = 'on'
Next
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan