Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual) |
Date | |
Msg-id | 9051.1441216055@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual) (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual) |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Sep 2, 2015 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But if you have in mind that typical FDWs would actually create join paths >> at that point, consider that >> >> 1. The FDW would have to find all the combinations of its supplied >> relations (unless you are only intending to generate one path for the >> union of all such rels, which seems pretty narrow-minded from here). > Well, if the remote end is another database server, presumably we can > leave it to optimize the query, so why would we need more than one > path? If you have say 5 relations in the query, 3 of which are foreign, it might make sense to join all 3 at the remote end, or maybe you should only join 2 of them remotely because it's better to then join to one of the local rels before joining the last remote rel. Even if you claim that that would never make sense from a cost standpoint (a claim easily seen to be silly), there might not be any legal way to join all 3 directly because of join order constraints. The larger point is that we can't expect the remote server to be fully responsible for optimizing, because it will know nothing of what's being done on our end. > I can see that we need more than one path because of sort-order > considerations, which would affect the query we ship to the remote > side. But I don't see the point of considering multiple join orders > unless the remote end is dumber than our optimizer, which might be > true in some cases, but not if the remote end is PostgreSQL. (1) not all remote ends are Postgres, (2) the remote end doesn't have any access to info about our end. > So, the problem is that I don't think this entirely skirts the > join_is_legal issues, which are a principal point of concern for me. > Say this is a joinrel between (A B) and (C D E). We need to generate > an SQL query for (A B C D E). We know that the outermost syntactic > join can be (A B) to (C D E). But how do we know which join orders > are legal as among (C D E)? Maybe there's a simple way to handle this > that I'm not seeing. Well, if the joins get built up in the way I think should happen, we'd have already considered (C D E), and we could have recorded the legal join orders within that at the time. (I imagine that we should allow FDWs to store some data within RelOptInfo structs that represent foreign joins belonging entirely to them, so that there'd be a handy place to keep that data till later.) Or we could trawl through the paths associated with the child joinrel, which will presumably include instances of every reasonable sub-join combination. Or the FDW could look at the SpecialJoinInfo data and determine things for itself (or more likely, ask join_is_legal about that). In the case of postgres_fdw, I think the actual requirement will be to be able to reconstruct a SQL query that correctly expresses the join; that is, we need to send over something like "from c left join d on (...) full join e on (...)", not just "from c, d, e", or we'll get totally bogus estimates as well as bogus execution results. Offhand I think that the most likely way to build that text will be to examine the query's jointree to see where c,d,e appear in it. But in any case, that's a separate issue and I fail to see how plopping the join search problem into the FDW's lap would make it any easier. regards, tom lane
pgsql-hackers by date: