Re: Custom Scan APIs (Re: Custom Plan node) - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: Custom Scan APIs (Re: Custom Plan node) |
Date | |
Msg-id | CADyhKSU6W-r+1j+cs5TOJ47tYYvonGGVaHRSg1KbV=nzuULqTA@mail.gmail.com Whole thread Raw |
In response to | Custom Scan APIs (Re: Custom Plan node) (Kohei KaiGai <kaigai@kaigai.gr.jp>) |
Responses |
Re: Custom Scan APIs (Re: Custom Plan node)
|
List | pgsql-hackers |
Hi, I tried to write up a wikipage to introduce how custom-scan works. https://wiki.postgresql.org/wiki/CustomScanAPI Any comments please. 2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>: > The attached patches provide a feature to implement custom scan node > that allows extension to replace a part of plan tree with its own code > instead of the built-in logic. > In addition to the previous proposition, it enables us to integrate custom > scan as a part of candidate paths to be chosen by optimizer. > Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers > a set of API stuff and a simple demonstration module that implement > regular table scan using inequality operator on ctid system column. > The second one (pgsql-v9.4-custom-scan-remote-join) enhances > postgres_fdw to support remote join capability. > > Below is an example to show how does custom-scan work. > > We usually run sequential scan even if clause has inequality operator > that references ctid system column. > > postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid; > QUERY PLAN > -------------------------------------------------------- > Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43) > Filter: (ctid > '(10,0)'::tid) > (2 rows) > > An extension that performs as custom-scan provider suggests > an alternative path, and its cost was less than sequential scan, > thus optimizer choose it. > > postgres=# LOAD 'ctidscan'; > LOAD > postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid; > QUERY PLAN > ---------------------------------------------------------------------- > Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43) > Filter: (ctid > '(10,0)'::tid) > (2 rows) > > Of course, more cost effective plan will win if exists. > > postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200; > QUERY PLAN > ------------------------------------------------------------------- > Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43) > Index Cond: (a = 200) > Filter: (ctid > '(10,0)'::tid) > (3 rows) > > One other worthwhile example is remote-join enhancement on the > postgres_fdw as follows. Both of ft1 and ft2 are foreign table being > managed by same foreign server. > > postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x > WHERE f_leak(b) AND y > like '%aaa%'; > QUERY PLAN > ------------------------------------------------------------------------------------------------------ > Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72) > Output: a, b, x, y > Filter: f_leak(b) > Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN > public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text)) > (4 rows) > > --------------------------- > How does it works > --------------------------- > This patch adds two hooks (for base and join relations) around allpaths.c > and joinpaths.c. It allows extensions to add alternative paths to handle > scanning on the base relation or join of two relations. > > Its callback routine can add CustomPath using add_path() to inform > optimizer this alternative scan path. Every custom-scan provider is > identified by its name being registered preliminary using the following > function. > > void register_custom_provider(const CustomProvider *provider); > > CustomProvider is a set of name string and function pointers of callbacks. > > Once CustomPath got chosen, create_scan_plan() construct a custom- > scan plan and calls back extension to initialize the node. > Rest of portions are similar to foreign scan, however, some of detailed > portions are different. For example, foreign scan is assumed to return > a tuple being formed according to table definition. On the other hand, > custom-scan does not have such assumption, so extension needs to > set tuple-descriptor on the scan tuple slot of ScanState structure by > itself. > > In case of join, custom-scan performs as like a regular scan but it > returns tuples being already joined on underlying relations. > The patched postgres_fdw utilizes a hook at joinpaths.c to run > remote join. > > ------------ > Issues > ------------ > I'm not 100% certain whether arguments of add_join_path_hook is > reasonable. I guess the first 7 arguments are minimum necessity. > The mergeclause_list and semifactors might be useful if someone > tries to implement its own mergejoin or semijoin. Also, I'm not > good at usage of path parameterization, but the last two arguments > are related to. Where is the best code to learn about its usage? > > +/* Hook for plugins to add custom join path, in addition to default ones */ > +typedef void (*add_join_path_hook_type)(PlannerInfo *root, > + RelOptInfo *joinrel, > + RelOptInfo *outerrel, > + RelOptInfo *innerrel, > + JoinType jointype, > + SpecialJoinInfo *sjinfo, > + List *restrictlist, > + List *mergeclause_list, > + SemiAntiJoinFactors *semifactors, > + Relids param_source_rels, > + Relids extra_lateral_rels); > +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook; > > > When we replace a join by a custom scan, where is the best target > for Var node that referenced relations under the join? > Usually, Var->varno is given as rtindex of tables being joined, then, > it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references(). > It eventually determines the slot to be fetched on ExecEvalScalarVar(). > On the other hand, we want Var-node to reference scan-tuple-slot > neither outer-slot nor inner-slot, if we replaced a join. > I tried to add a new CUSTOM_VAR that references scan-tuple-slot. > Probably, it is a straightforward way to run remote join as like a scan, > but I'm not certain whether it is the best way. > > > I was concerned about FDW callback of postgres_fdw is designed to > take ForeignState argument. Because of this, remote join code did > not available to call these routines, even though most of custom-join > portions are similar. > So, I'd like to rework postgres_fdw first to put a common routine that > can be called from FDW portion and remote join portions. > However, I thought it makes reviewing hard due to the large scale of > changeset. So, I'd like to have a code reworking first. > > > ---------------- > Jobs to do > ---------------- > * SGML documentation like fdwhandler.sgml is still under construction. > * Probably, a wikipage may help people to understand it well. > * Postgres_fdw needs reworking to share common code for both of > FDW and remote join portions. > > Thanks, > > 2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>: >> 2013/10/3 Robert Haas <robertmhaas@gmail.com>: >>> Well, there were a lot of problems with your demonstration, which have >>> already been pointed out upthread. I'm skeptical about the idea of >>> simply replacing planner nodes wholesale, and Tom is outright opposed. >>> I think you'll do better to focus on a narrower case - I'd suggest >>> custom scan nodes - and leave the rest as a project for another time. >>> >> Thanks, it makes me clear what we should target on v9.4 development. >> Towards the next commitfest, I'm planning to develop the following >> features: >> * CustomScan node that can run custom code instead of built-in >> scan nodes. >> * Join-pushdown of postgres_fdw using the hook to be located on >> the add_paths_to_joinrel(), for demonstration purpose. >> * Something new way to scan a relation; probably, your suggested >> ctid scan with less or bigger qualifier is a good example, also for >> demonstration purpose. >> >> Probably, above set of jobs will be the first chunk of this feature. >> Then, let's do other stuff like Append, Sort, Aggregate and so on >> later. It seems to me a reasonable strategy. >> > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> -- KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: