Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8010C5772@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
In response to | Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
|
List | pgsql-hackers |
> On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> > So, overall consensus for the FDW hook location is just before the set_chepest() > >> > at standard_join_search() and merge_clump(), isn't it? > >> > >> Yes, I think so. > >> > >> > Let me make a design of FDW hook to reduce code duplications for each FDW driver, > >> > especially, to identify baserel/joinrel to be involved in this join. > >> > >> Great, thanks! > >> > >> One issue, which I think Ashutosh alluded to upthread, is that we need > >> to make sure it's not unreasonably difficult for foreign data wrappers > >> to construct the FROM clause of an SQL query to be pushed down to the > >> remote side. It should be simple when there are only inner joins > >> involved, but when there are all outer joins it might be a bit > >> complex. It would be very good if someone could try to write that > >> code, based on the new hook locations, and see how it turns out, so > >> that we can figure out how to address any issues that may crop up > >> there. > >> > > Here is an idea that provides a common utility function that break down > > the supplied RelOptInfo of joinrel into a pair of join-type and a list of > > baserel/joinrel being involved in the relations join. It intends to be > > called by FDW driver to list up underlying relations. > > IIUC, root->join_info_list will provide information of how relations are > > combined to the upper joined relations, thus, I expect it is not > > unreasonably complicated way to solve. > > Once a RelOptInfo of the target joinrel is broken down into multiple sub- > > relations (N>=2 if all inner join, elsewhere N=2), FDW driver can > > reference the RestrictInfo to be used in relations join. > > > > Anyway, I'll try to investigate the existing code for more detail today, > > to clarify whether the above approach is feasible. > > Sounds good. Keep in mind that, while the parse tree will obviously > reflect the way that the user actually specified the join > syntactically, it's not the job of the join_info_list to make it > simple to reconstruct that information. To the contrary, > join_info_list is supposed to be structured in a way that makes it > easy to determine whether *a particular join order is one of the legal > join orders* not *whether it is the specific join order selected by > the user*. See join_is_legal(). > > For FDW pushdown, I think it's sufficient to be able to identify *any > one* legal join order, not necessarily the same order the user > originally entered. For exampe, if the user entered A LEFT JOIN B ON > A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that > instead does A LEFT JOIN C ON a.y = C.y LEFT JOIN B ON A.x = B.x, I > suspect that's just fine. Particular FDWs might wish to try to be > smart about what they emit based on knowledge of what the remote > side's optimizer is likely to do, and that's fine. If the remote side > is PostgreSQL, it shouldn't matter much. > Sorry for my response late. It was not easy to code during business trip. The attached patch adds a hook for FDW/CSP to replace entire join-subtree by a foreign/custom-scan, according to the discussion upthread. GetForeignJoinPaths handler of FDW is simplified as follows: typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root, RelOptInfo *joinrel); It takes PlannerInfo and RelOptInfo of the join-relation to be replaced if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be able to know the relations to be involved and construct a remote join query. However, it is not obvious with RelOptInfo to know how relations are joined. The function below will help implement FDW driver that support remote join. List * get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel, SpecialJoinInfo **p_sjinfo) It returns a list of RelOptInfo to be involved in the relations join that is represented with 'joinrel', and also set a SpecialJoinInfo on the third argument if not inner join. In case of inner join, it returns multiple (more than or equal to 2) relations to be inner-joined. Elsewhere, it returns two relations and a valid SpecialJoinInfo. The #if 0 ... #endif block is just for confirmation purpose to show how hook is invoked and the joinrel is broken down with above service routine. postgres=# select * from t0 left join t1 on t1.aid=bid left join t2 on t1.aid=cid left join t3 on t1.aid=did left join t4 on t1.aid=eid; INFO: LEFT JOIN: t0, t1 INFO: LEFT JOIN: (t0, t1), t2 INFO: LEFT JOIN: (t0, t1), t3 INFO: LEFT JOIN: (t0, t1), t4 INFO: LEFT JOIN: (t0, t1, t3), t2 INFO: LEFT JOIN: (t0, t1, t4), t2 INFO: LEFT JOIN: (t0, t1, t4), t3 INFO: LEFT JOIN: (t0, t1, t3, t4), t2 In this case, joinrel is broken down into (t0, t1, t3, t4) and t2. The earlier one is also joinrel, so it expects FDW driver will make the get_joinrel_broken_down() call recurdively. postgres=# explain select * from t0 natural join t1 natural join t2 natural join t3 natural join t4; INFO: INNER JOIN: t0, t1 INFO: INNER JOIN: t0, t2 INFO: INNER JOIN: t0, t3 INFO: INNER JOIN: t0, t4 INFO: INNER JOIN: t0, t1, t2 INFO: INNER JOIN: t0, t1, t3 INFO: INNER JOIN: t0, t1, t4 INFO: INNER JOIN: t0, t2, t3 INFO: INNER JOIN: t0, t2, t4 INFO: INNER JOIN: t0, t3, t4 INFO: INNER JOIN: t0, t1, t2, t3 INFO: INNER JOIN: t0, t1, t2, t4 INFO: INNER JOIN: t0, t1, t3, t4 INFO: INNER JOIN: t0, t2, t3, t4 INFO: INNER JOIN: t0, t1, t2, t3, t4 In this case, joinrel is consist of inner join, so get_joinrel_broken_down() returns a list that contains RelOptInfo of 6 base relations. postgres=# explain select * from t0 natural join t1 left join t2 on t1.aid=t2.bid natural join t3 natural join t4; INFO: INNER JOIN: t0, t1 INFO: INNER JOIN: t0, t3 INFO: INNER JOIN: t0, t4 INFO: LEFT JOIN: t1, t2 INFO: INNER JOIN: (t1, t2), t0 INFO: INNER JOIN: t0, t1, t3 INFO: INNER JOIN: t0, t1, t4 INFO: INNER JOIN: t0, t3, t4 INFO: INNER JOIN: (t1, t2), t0, t3 INFO: INNER JOIN: (t1, t2), t0, t4 INFO: INNER JOIN: t0, t1, t3, t4 INFO: INNER JOIN: (t1, t2), t0, t3, t4 In mixture case, it keeps restriction of join legality (t1 and t2 must be left joined) during its broken down. At this moment, I'm not 100% certain about its logic. Especially, I didn't test SEMI- and ANTI- join cases yet. However, time is money - I want people to check overall design first, rather than detailed debugging. Please tell me if I misunderstood the logic to break down join relations. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
pgsql-hackers by date: