Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs) - Mailing list pgsql-hackers

From Shigeru Hanada
Subject Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date
Msg-id CAEZqfEczvBEsnvrwi8dPD1AaP6-3Oa3KoHii5e5MNQ=Njc84OA@mail.gmail.com
Whole thread Raw
In response to Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)  (Shigeru HANADA <shigeru.hanada@gmail.com>)
Responses Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
List pgsql-hackers
Attached is the v15 patch of foreign join support for postgres_fdw.

This patch is based on current master, and having being removed some
hunks which are not essential.

And I wrote description of changes done by the patch.  It is little
bit long but I hope it would help understanding what the patch does.



The total LOC of the patch is 3.7k, 1.8k for code and 2.0k for
regression tests.  This is not a small patch, as Robert says, so I'd
like to summarize changed done by this patch and explain why they are
necessary.

Outline of join push-down support for postgres_fdw
==================================================

This patch provides new capability to join between foriegn tables
managed by same foreign server on remote side, by constructing a
remote query containing join clause, and executing it as source of a
pseudo foreign scan.  This patch is based on Custom/Foreign join patch
written by Kohei KaiGai.

PostgreSQL's planning for a query containing join is done with these steps:

1. generate possible scan paths for each base relations
2. generate join paths with bottom-up approach
3. generate plan nodes required for the cheapest path
4. execute the plan nodes to obtain result tuples

Generating path node
--------------------
As of now, postgres_fdw generates a ForeignPath which represents a
result of a join for each RelOptInfo, and planner can determine which
path is cheapest from its cost values.

GetForeignJoinPaths is called once for each join combination, i.e. A
JOIN B and B JOIN A are considered separately.  So GetForeignJoinPath
should return immediately to skip its job when the call is the
reversed combination of already considered one.  For this purpose, I
added update_safe flag to PgFdwRelationInfo.  This flag is always set
for simple foriegn scans, but for join relation it is set only when
the join can be pushed down.  The reason of adding this flag is that
checking RelOptInfo#fdw_private is MULL can't prevent useless
processing for a join combination which is reversed one of already
considered join which can't be pushed down.

postgres_fdw's GetForeignJoinPaths() does various checks, to ensure
that the result has same semantics as local joins.  Now postgres_fdw
have these criteria:

a) join type must be one of INNER/LEFT OUTER/RIGHT OUTER/FULL OUTER join
This check is done with given jointype argument.  IOW, CROSS joins and
SEMI/ANTI joins are not pushed down.  This is because 1) CROSS joins
would produe more result than separeted join sources, and 2) ANTI/SEMI
joins need to be deparsed as sub-query and it seems to take some more
time to implement.
b) Both outer and inner must have RelOptInfo#fdw_private
Having fdw_private means that the RelOptInfo is safe to push down, so
having no fdw_private means that portion is not safe to push down and
thus the whole join is not safe to push down.
c) All relations in the join must belong to the same server
This check is done with serverid stored in RelOptInfo#fdw_private as
PgFdwRelationInfo.  Joining relations belong to different servers is
not leagal.  Even they finally have completely same connection
information, they should accessed via different libpq sessions.
Responsibility of checking server matching is under discussion in the
Custom/Foreign join thread, and I'd vote for checking it in core.  If
it is decided, I remove this criterion soon.
d) All relations must have the same effective user id
This check is done with userid stored in PgFdwRelationInfo, which is
valid only when underlying relations have the same effective user id.
Here "effective user id" means id of the user executing the query, or
the owner of the view when the foreign table is accessed via view.
Tom suggested that it is necessary to check that user mapping matches
for security reason, and now I think it's better than checking
effective user as current postgres_fdw does.
e) Each source relation must not have any local filter
Evaluating conditions of join source talbe potentially produces
different result in OUTER join cases.  This can be relaxed for the
cases when the join is INNER and local filters don't contain any
volatile function/operator, but it is left as future enhancement.
f) All join conditions of non-inner join must be safe to push down
This is similar to e).

A join which passes all criteria above is safe to push-down, so
postgres_fdw create a ForeignPath for the join and add it to
RelOptInfo.  Currently postgres_fdw doesn't set pathkeys (ordering
information) nor require_outer (information for parameterized path).

PgFdwRelationInfo is used to store various planning information
specific to postgres_fdw.  To support join push-down, I added some
fields which are necessary to deparse join query recursively in
deparseSelectSql.

- outer RelOptInfo, to generate source relatoin as subquery
- inner RelOptInfo, to generate source relation as subquery
- jointype, to deparse JOIN keyword string (e.g. INNER JOIN)
- joinclauses, to deprase ON part of JOIN clause
- otherclauses, to deparse WHERE clause of join query

I also moved it from postgres_fdw.c to postgres_fdw.h, because
deparse.c needs to refer the definition during deparsing query.

Generating plan node
--------------------
Once planner find the cheapest path, it generates plan tree from the
path tree.  During the steps, planner calls GetForeignPlan for the
ForeignPath in the top of a join tree.  IOW, GetForeignPlan is not
called for underlying joins and scans, so postgres_fdw needs a way to
do its task (mainly generating query string) recursively.

GetForeignPlan generates remote query and store it in ForeignScan node
to be returned.  The construction of remote query is done by calling
deparseSelectSql for given RelOptInfo.  This function was modified to
accept both base relations and join relations to support join
push-down.  Main part of generating join query is implemented in
deparseJoinSql, which is a newly added function, and deparseSelectSql
calls it if given relation was a join relation.

One big change about deparsing base relation is aliasing.  This patch
adds column alias to SELECT clause even original query is a simple
single table SELECT.

fdw=# EXPLAIN (VERBOSE, COSTS false) SELECT * FROM pgbench_branches b;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Foreign Scan on public.pgbench_branches b
   Output: bid, bbalance, filler
   Remote SQL: SELECT bid a9, bbalance a10, filler a11 FROM
public.pgbench_branches
(3 rows)

As you see, every column has alias in format "a%d" with index value
derived from pg_attribute.attnum.  Index value is attnum + 8, and the
magic number "8" comes from FirstLowInvalidHeapAttributeNumber for the
adjustment that makes attribute number of system attributes positive.
To make the code more readable, I introduced local macro
GET_RELATIVE_ATTNO(), but I'm not sure that this name is very nice.
This column alias system allows to refer a particular column pointed
by Var node in upper join level without consideration about column
position in SELECT clause.  To achieve this, deparseVar is expanded to
handle variables used in join query, and deparseJoinVar is added to
deparse column reference with table alias "l" or "r" for outer or
inner respectively.

As mentioned at the beginning of this section, GetForeignPlan is
called only for the top node of the join tree, so we need to do
something recursively.  So postgres_fdw has information about join
tree structure in PgFdwRelationInfo and pass it via
RelOptInfo#fdw_private.  This link works down to the base relation at
the leaf of the join tree.

When deparseSelectSql is called for a join relation, it calls
deparseSelectSql for each outer and inner RelOptInfo to generate query
strings, and pass them to deparseJoinSql as parts for FROM clause
subquery.  For example of two way join:

[table definition]
CREATE FOREIGN TABLE table1 (
id int NOT NULL,
name text,
...
) SERVER server;
CREATE FOREIGN TABLE table2 (
id int NOT NULL,
name text,
...
) SERVER server;

[original query]
SELECT t1.name, t2.name FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id;

[remote query]
SELECT l.a2, r.a2
 FROM (SELECT id, name FROM table1) l (a1, a2)
  INNER JOIN
  (SELECT id, name FROM table2) r (a1, a2)
  ON (l.a1 = r.a1)
;

During deparsing join query, deparseJoinSql maintains fdw_scan_tlist
list to hold TargetEntry for columns used in SELECT clause of every
query.

One thing tricky is "peusdo projection" which is done by
deparseProjectionSql for whole-row reference.  This is done by put the
query string in FROM subquery and add whole-row reference in outer
SELECT clause.  This is done by ExecProjection in 9.4 and older, but
we need to do this in postgres_fdw because ExecProjection is not
called any more.

For various conditions, such as join conditions in JOIN clauses and
filter conditions in WHERE clauses, appendCondition is used to deparse
condition strings.  This was expanded version of appendWhereClause.
Note that appendConditions accepts list of RestrictInfo or list of
Expr as source, and downcasts them properly.  As of 9.4
appendWhereClause accepted only RestrictInfo, but join conditions are
Expr, so I made it little flexible.

Finally deparseJoinSql construct whole query by putting parts into the
right places.  Note that column aliases are not written in SELECT
clause but in FROM clause, after table alias.  This simpifies SELECT
clause construction simpler.

debug stuffs
------------
bms_to_str is a function which prints contents of a bitmapset in
human-readable format.  I added this for debug purpose, but IMO it's
ok to have such function as public bitmapset API.

2015-05-03 10:51 GMT+09:00 Shigeru HANADA <shigeru.hanada@gmail.com>:
> Thanks for the comments.
>
> 2015/05/01 22:35、Robert Haas <robertmhaas@gmail.com> のメール:
>> On Mon, Apr 27, 2015 at 5:07 AM, Shigeru Hanada
>> <shigeru.hanada@gmail.com> wrote:
>>> 2015-04-27 11:00 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>>>> Hanada-san, could you adjust your postgres_fdw patch according to
>>>> the above new (previous?) definition.
>>>
>>> The attached v14 patch is the revised version for your v13 patch.  It also contains changed for Ashutosh’s
comments.
>>
>> We should probably move this discussion to a new thread now that the
>> other patch is committed.  Changing subject line accordingly.
>>
>> Generally, there's an awful lot of changes in this patch - it is over
>> 2000 insertions and more than 450 deletions - and it's not awfully
>> obvious why all of those changes are there.  I think this patch needs
>> a detailed README to accompany it explaining what the various changes
>> in the patch are and why those things got changed; or maybe there is a
>> way to break it up into multiple patches so that we can take a more
>> incremental approach.  I am really suspicious of the amount of
>> wholesale reorganization of code that this patch is doing.  It's
>> really hard to validate that a reorganization like that is necessary,
>> or that it's correct, and it's gonna make back-patching noticeably
>> harder in the future.  If we really need this much code churn it needs
>> careful justification; if we don't, we shouldn't do it.
>>
>
> I agree.  I’ll write detailed description for the patch and repost the new one with rebasing onto current HEAD.  I’m
sorrybut it will take a day or so... 
>
>> +SET enable_mergejoin = off; -- planner choose MergeJoin even it has
>> higher costs, so disable it for testing.
>>
>> This seems awfully strange.  Why would the planner choose a plan if it
>> had a higher cost?
>
> I thought so but I couldn’t reproduce such situation now.  I’ll investigate it more.  If the issue has gone, I’ll
removethat SET statement for straightforward tests. 
>
>
>>
>> -        * If the table or the server is configured to use remote estimates,
>> -        * identify which user to do remote access as during planning.  This
>> +        * Identify which user to do remote access as during planning.  This
>>         * should match what ExecCheckRTEPerms() does.  If we fail due
>> to lack of
>>         * permissions, the query would have failed at runtime anyway.
>>         */
>> -       if (fpinfo->use_remote_estimate)
>> -       {
>> -               RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
>> -               Oid                     userid = rte->checkAsUser ?
>> rte->checkAsUser : GetUserId();
>> -
>> -               fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
>> -       }
>> -       else
>> -               fpinfo->user = NULL;
>> +       rte = planner_rt_fetch(baserel->relid, root);
>> +       fpinfo->userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
>>
>> So, wait a minute, remote estimates aren't optional any more?
>
> No, it seems to be removed accidentally.  I’ll check the reason again though, but I’ll revert the change unless I
findany problem. 
>
> --
> Shigeru HANADA
> shigeru.hanada@gmail.com
>
>
>
>



--
Shigeru HANADA

Attachment

pgsql-hackers by date:

Previous
From: Sawada Masahiko
Date:
Subject: Re: Support for N synchronous standby servers - take 2
Next
From: Michael Paquier
Date:
Subject: Re: Support for N synchronous standby servers - take 2