Re: Custom Scan APIs (Re: Custom Plan node) - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: Custom Scan APIs (Re: Custom Plan node) |
Date | |
Msg-id | CADyhKSUrZzHBkgeGm=xzOxUwebxqPr1p9DNCr03DnrhZJq2omw@mail.gmail.com Whole thread Raw |
In response to | Re: Custom Scan APIs (Re: Custom Plan node) (Kohei KaiGai <kaigai@kaigai.gr.jp>) |
Responses |
Re: Custom Scan APIs (Re: Custom Plan node)
|
List | pgsql-hackers |
The attached patches are the revised custom-scan APIs. - Custom-scan.sgml was added to introduce the way to write custom-scan provider in the official documentation. - Much code duplication in postgres_fdw.c was eliminated. I split some fdw- handlers into two parts; common portion and fdw specific one. Executor callbacks of custom-scan code utilizes the common portion above because most of its implementations are equivalent. I'd like to see comments regarding to the way to handle Var reference onto a custom-scan that replaced relations join. A varno of Var that references a join relation is rtindex of either right or left relation, then setrefs.c adjust it well; INNER_VAR or OUTER_VAR shall be set instead. However, it does not work well if a custom-scan that just references result of remote join query was chosen instead of local join, because its result shall be usually set in the ps_ResultTupleSlot of PlanState, thus ExecEvalScalarVar does not reference neither inner nor outer slot. Instead of existing solution, I added one more special varno; CUSTOM_VARNO that just references result-tuple-slot of the target relation. If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column name from the TupleDesc of underlying ps_ResultTupleSlot. I'm not 100% certain whether it is the best approach for us, but it works well. Also, I'm uncertain for usage of param_info in Path structure, even though I followed the manner in other portion. So, please point out if my usage was not applicable well. Thanks, 2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>: > Hi, > > I tried to write up a wikipage to introduce how custom-scan works. > > https://wiki.postgresql.org/wiki/CustomScanAPI > > Any comments please. > > 2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>: >> The attached patches provide a feature to implement custom scan node >> that allows extension to replace a part of plan tree with its own code >> instead of the built-in logic. >> In addition to the previous proposition, it enables us to integrate custom >> scan as a part of candidate paths to be chosen by optimizer. >> Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers >> a set of API stuff and a simple demonstration module that implement >> regular table scan using inequality operator on ctid system column. >> The second one (pgsql-v9.4-custom-scan-remote-join) enhances >> postgres_fdw to support remote join capability. >> >> Below is an example to show how does custom-scan work. >> >> We usually run sequential scan even if clause has inequality operator >> that references ctid system column. >> >> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid; >> QUERY PLAN >> -------------------------------------------------------- >> Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43) >> Filter: (ctid > '(10,0)'::tid) >> (2 rows) >> >> An extension that performs as custom-scan provider suggests >> an alternative path, and its cost was less than sequential scan, >> thus optimizer choose it. >> >> postgres=# LOAD 'ctidscan'; >> LOAD >> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid; >> QUERY PLAN >> ---------------------------------------------------------------------- >> Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43) >> Filter: (ctid > '(10,0)'::tid) >> (2 rows) >> >> Of course, more cost effective plan will win if exists. >> >> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200; >> QUERY PLAN >> ------------------------------------------------------------------- >> Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43) >> Index Cond: (a = 200) >> Filter: (ctid > '(10,0)'::tid) >> (3 rows) >> >> One other worthwhile example is remote-join enhancement on the >> postgres_fdw as follows. Both of ft1 and ft2 are foreign table being >> managed by same foreign server. >> >> postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x >> WHERE f_leak(b) AND y >> like '%aaa%'; >> QUERY PLAN >> ------------------------------------------------------------------------------------------------------ >> Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72) >> Output: a, b, x, y >> Filter: f_leak(b) >> Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN >> public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text)) >> (4 rows) >> >> --------------------------- >> How does it works >> --------------------------- >> This patch adds two hooks (for base and join relations) around allpaths.c >> and joinpaths.c. It allows extensions to add alternative paths to handle >> scanning on the base relation or join of two relations. >> >> Its callback routine can add CustomPath using add_path() to inform >> optimizer this alternative scan path. Every custom-scan provider is >> identified by its name being registered preliminary using the following >> function. >> >> void register_custom_provider(const CustomProvider *provider); >> >> CustomProvider is a set of name string and function pointers of callbacks. >> >> Once CustomPath got chosen, create_scan_plan() construct a custom- >> scan plan and calls back extension to initialize the node. >> Rest of portions are similar to foreign scan, however, some of detailed >> portions are different. For example, foreign scan is assumed to return >> a tuple being formed according to table definition. On the other hand, >> custom-scan does not have such assumption, so extension needs to >> set tuple-descriptor on the scan tuple slot of ScanState structure by >> itself. >> >> In case of join, custom-scan performs as like a regular scan but it >> returns tuples being already joined on underlying relations. >> The patched postgres_fdw utilizes a hook at joinpaths.c to run >> remote join. >> >> ------------ >> Issues >> ------------ >> I'm not 100% certain whether arguments of add_join_path_hook is >> reasonable. I guess the first 7 arguments are minimum necessity. >> The mergeclause_list and semifactors might be useful if someone >> tries to implement its own mergejoin or semijoin. Also, I'm not >> good at usage of path parameterization, but the last two arguments >> are related to. Where is the best code to learn about its usage? >> >> +/* Hook for plugins to add custom join path, in addition to default ones */ >> +typedef void (*add_join_path_hook_type)(PlannerInfo *root, >> + RelOptInfo *joinrel, >> + RelOptInfo *outerrel, >> + RelOptInfo *innerrel, >> + JoinType jointype, >> + SpecialJoinInfo *sjinfo, >> + List *restrictlist, >> + List *mergeclause_list, >> + SemiAntiJoinFactors *semifactors, >> + Relids param_source_rels, >> + Relids extra_lateral_rels); >> +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook; >> >> >> When we replace a join by a custom scan, where is the best target >> for Var node that referenced relations under the join? >> Usually, Var->varno is given as rtindex of tables being joined, then, >> it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references(). >> It eventually determines the slot to be fetched on ExecEvalScalarVar(). >> On the other hand, we want Var-node to reference scan-tuple-slot >> neither outer-slot nor inner-slot, if we replaced a join. >> I tried to add a new CUSTOM_VAR that references scan-tuple-slot. >> Probably, it is a straightforward way to run remote join as like a scan, >> but I'm not certain whether it is the best way. >> >> >> I was concerned about FDW callback of postgres_fdw is designed to >> take ForeignState argument. Because of this, remote join code did >> not available to call these routines, even though most of custom-join >> portions are similar. >> So, I'd like to rework postgres_fdw first to put a common routine that >> can be called from FDW portion and remote join portions. >> However, I thought it makes reviewing hard due to the large scale of >> changeset. So, I'd like to have a code reworking first. >> >> >> ---------------- >> Jobs to do >> ---------------- >> * SGML documentation like fdwhandler.sgml is still under construction. >> * Probably, a wikipage may help people to understand it well. >> * Postgres_fdw needs reworking to share common code for both of >> FDW and remote join portions. >> >> Thanks, >> >> 2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>: >>> 2013/10/3 Robert Haas <robertmhaas@gmail.com>: >>>> Well, there were a lot of problems with your demonstration, which have >>>> already been pointed out upthread. I'm skeptical about the idea of >>>> simply replacing planner nodes wholesale, and Tom is outright opposed. >>>> I think you'll do better to focus on a narrower case - I'd suggest >>>> custom scan nodes - and leave the rest as a project for another time. >>>> >>> Thanks, it makes me clear what we should target on v9.4 development. >>> Towards the next commitfest, I'm planning to develop the following >>> features: >>> * CustomScan node that can run custom code instead of built-in >>> scan nodes. >>> * Join-pushdown of postgres_fdw using the hook to be located on >>> the add_paths_to_joinrel(), for demonstration purpose. >>> * Something new way to scan a relation; probably, your suggested >>> ctid scan with less or bigger qualifier is a good example, also for >>> demonstration purpose. >>> >>> Probably, above set of jobs will be the first chunk of this feature. >>> Then, let's do other stuff like Append, Sort, Aggregate and so on >>> later. It seems to me a reasonable strategy. >>> >> >> -- >> KaiGai Kohei <kaigai@kaigai.gr.jp> > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
pgsql-hackers by date: