Re: Custom Scan APIs (Re: Custom Plan node) - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: Custom Scan APIs (Re: Custom Plan node)
Date
Msg-id CADyhKSU6W-r+1j+cs5TOJ47tYYvonGGVaHRSg1KbV=nzuULqTA@mail.gmail.com
Whole thread Raw
In response to Custom Scan APIs (Re: Custom Plan node)  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses Re: Custom Scan APIs (Re: Custom Plan node)  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
List pgsql-hackers
Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments please.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached patches provide a feature to implement custom scan node
> that allows extension to replace a part of plan tree with its own code
> instead of the built-in logic.
> In addition to the previous proposition, it enables us to integrate custom
> scan as a part of candidate paths to be chosen by optimizer.
> Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
> a set of API stuff and a simple demonstration module that implement
> regular table scan using inequality operator on ctid system column.
> The second one (pgsql-v9.4-custom-scan-remote-join) enhances
> postgres_fdw to support remote join capability.
>
> Below is an example to show how does custom-scan work.
>
> We usually run sequential scan even if clause has inequality operator
> that references ctid system column.
>
> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
>                        QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=43)
>    Filter: (ctid > '(10,0)'::tid)
> (2 rows)
>
> An extension that performs as custom-scan provider suggests
> an alternative path, and its cost was less than sequential scan,
> thus optimizer choose it.
>
> postgres=# LOAD 'ctidscan';
> LOAD
> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=43)
>    Filter: (ctid > '(10,0)'::tid)
> (2 rows)
>
> Of course, more cost effective plan will win if exists.
>
> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=43)
>    Index Cond: (a = 200)
>    Filter: (ctid > '(10,0)'::tid)
> (3 rows)
>
> One other worthwhile example is remote-join enhancement on the
> postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
> managed by same foreign server.
>
> postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
>                                               WHERE f_leak(b) AND y
> like '%aaa%';
>                                                                QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Custom Scan (postgres-fdw)  (cost=100.00..100.01 rows=0 width=72)
>    Output: a, b, x, y
>    Filter: f_leak(b)
>    Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
> public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
> (4 rows)
>
> ---------------------------
> How does it works
> ---------------------------
> This patch adds two hooks (for base and join relations) around allpaths.c
> and joinpaths.c. It allows extensions to add alternative paths to handle
> scanning on the base relation or join of two relations.
>
> Its callback routine can add CustomPath using add_path() to inform
> optimizer this alternative scan path. Every custom-scan provider is
> identified by its name being registered preliminary using the following
> function.
>
>   void register_custom_provider(const CustomProvider *provider);
>
> CustomProvider is a set of name string and function pointers of callbacks.
>
> Once CustomPath got chosen, create_scan_plan() construct a custom-
> scan plan and calls back extension to initialize the node.
> Rest of portions are similar to foreign scan, however, some of detailed
> portions are different. For example, foreign scan is assumed to return
> a tuple being formed according to table definition. On the other hand,
> custom-scan does not have such assumption, so extension needs to
> set tuple-descriptor on the scan tuple slot of ScanState structure by
> itself.
>
> In case of join, custom-scan performs as like a regular scan but it
> returns tuples being already joined on underlying relations.
> The patched postgres_fdw utilizes a hook at joinpaths.c to run
> remote join.
>
> ------------
> Issues
> ------------
> I'm not 100% certain whether arguments of add_join_path_hook is
> reasonable. I guess the first 7 arguments are minimum necessity.
> The mergeclause_list and semifactors might be useful if someone
> tries to implement its own mergejoin or semijoin. Also, I'm not
> good at usage of path parameterization, but the last two arguments
> are related to. Where is the best code to learn about its usage?
>
> +/* Hook for plugins to add custom join path, in addition to default ones */
> +typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> +                                       RelOptInfo *joinrel,
> +                                       RelOptInfo *outerrel,
> +                                       RelOptInfo *innerrel,
> +                                       JoinType jointype,
> +                                       SpecialJoinInfo *sjinfo,
> +                                       List *restrictlist,
> +                                       List *mergeclause_list,
> +                                       SemiAntiJoinFactors *semifactors,
> +                                       Relids param_source_rels,
> +                                       Relids extra_lateral_rels);
> +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
>
> When we replace a join by a custom scan, where is the best target
> for Var node that referenced relations under the join?
> Usually, Var->varno is given as rtindex of tables being joined, then,
> it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
> It eventually determines the slot to be fetched on ExecEvalScalarVar().
> On the other hand, we want Var-node to reference scan-tuple-slot
> neither outer-slot nor inner-slot, if we replaced a join.
> I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
> Probably, it is a straightforward way to run remote join as like a scan,
> but I'm not certain whether it is the best way.
>
>
> I was concerned about FDW callback of postgres_fdw is designed to
> take ForeignState argument. Because of this, remote join code did
> not available to call these routines, even though most of custom-join
> portions are similar.
> So, I'd like to rework postgres_fdw first to put a common routine that
> can be called from FDW portion and remote join portions.
> However, I thought it makes reviewing hard due to the large scale of
> changeset. So, I'd like to have a code reworking first.
>
>
> ----------------
> Jobs to do
> ----------------
> * SGML documentation like fdwhandler.sgml is still under construction.
> * Probably, a wikipage may help people to understand it well.
> * Postgres_fdw needs reworking to share common code for both of
>   FDW and remote join portions.
>
> Thanks,
>
> 2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>> 2013/10/3 Robert Haas <robertmhaas@gmail.com>:
>>> Well, there were a lot of problems with your demonstration, which have
>>> already been pointed out upthread.  I'm skeptical about the idea of
>>> simply replacing planner nodes wholesale, and Tom is outright opposed.
>>>  I think you'll do better to focus on a narrower case - I'd suggest
>>> custom scan nodes - and leave the rest as a project for another time.
>>>
>> Thanks, it makes me clear what we should target on v9.4 development.
>> Towards the next commitfest, I'm planning to develop the following
>> features:
>> * CustomScan node that can run custom code instead of built-in
>>   scan nodes.
>> * Join-pushdown of postgres_fdw using the hook to be located on
>>   the add_paths_to_joinrel(), for demonstration purpose.
>> * Something new way to scan a relation; probably, your suggested
>>   ctid scan with less or bigger qualifier is a good example, also for
>>   demonstration purpose.
>>
>> Probably, above set of jobs will be the first chunk of this feature.
>> Then, let's do other stuff like Append, Sort, Aggregate and so on
>> later. It seems to me a reasonable strategy.
>>
>
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



pgsql-hackers by date:

Previous
From: Kohei KaiGai
Date:
Subject: Re: Exempting superuser from row-security isn't enough. Run predicates as DEFINER?
Next
From: Rohit Goyal
Date:
Subject: Fwd: Test of Algorithm || Indexing Scheme