Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
Date
Msg-id 9051.1441216055@sss.pgh.pa.us
Whole thread Raw
In response to Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 2, 2015 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But if you have in mind that typical FDWs would actually create join paths
>> at that point, consider that
>> 
>> 1. The FDW would have to find all the combinations of its supplied
>> relations (unless you are only intending to generate one path for the
>> union of all such rels, which seems pretty narrow-minded from here).

> Well, if the remote end is another database server, presumably we can
> leave it to optimize the query, so why would we need more than one
> path?

If you have say 5 relations in the query, 3 of which are foreign, it might
make sense to join all 3 at the remote end, or maybe you should only join
2 of them remotely because it's better to then join to one of the local
rels before joining the last remote rel.  Even if you claim that that
would never make sense from a cost standpoint (a claim easily seen to be
silly), there might not be any legal way to join all 3 directly because of
join order constraints.

The larger point is that we can't expect the remote server to be fully
responsible for optimizing, because it will know nothing of what's being
done on our end.

> I can see that we need more than one path because of sort-order
> considerations, which would affect the query we ship to the remote
> side.  But I don't see the point of considering multiple join orders
> unless the remote end is dumber than our optimizer, which might be
> true in some cases, but not if the remote end is PostgreSQL.

(1) not all remote ends are Postgres, (2) the remote end doesn't have any
access to info about our end.

> So, the problem is that I don't think this entirely skirts the
> join_is_legal issues, which are a principal point of concern for me.
> Say this is a joinrel between (A B) and (C D E).  We need to generate
> an SQL query for (A B C D E).  We know that the outermost syntactic
> join can be (A B) to (C D E).  But how do we know which join orders
> are legal as among (C D E)?  Maybe there's a simple way to handle this
> that I'm not seeing.

Well, if the joins get built up in the way I think should happen, we'd
have already considered (C D E), and we could have recorded the legal join
orders within that at the time.  (I imagine that we should allow FDWs to
store some data within RelOptInfo structs that represent foreign joins
belonging entirely to them, so that there'd be a handy place to keep that
data till later.)  Or we could trawl through the paths associated with the
child joinrel, which will presumably include instances of every reasonable
sub-join combination.  Or the FDW could look at the SpecialJoinInfo data
and determine things for itself (or more likely, ask join_is_legal about
that).

In the case of postgres_fdw, I think the actual requirement will be to be
able to reconstruct a SQL query that correctly expresses the join; that
is, we need to send over something like "from c left join d on (...) full
join e on (...)", not just "from c, d, e", or we'll get totally bogus
estimates as well as bogus execution results.  Offhand I think that the
most likely way to build that text will be to examine the query's jointree
to see where c,d,e appear in it.  But in any case, that's a separate issue
and I fail to see how plopping the join search problem into the FDW's lap
would make it any easier.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Improving test coverage of extensions with pg_dump
Next
From: Josh Berkus
Date:
Subject: Re: Horizontal scalability/sharding