Custom Plan node - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Custom Plan node
Msg-id CADyhKSWaSpJy9v3R1K5t3fC9r04-yf6ta_driuLHuR-xgCvyng@mail.gmail.com
List pgsql-hackers
Hi,

The attached patch adds a new plan node type, CustomPlan, that enables
extensions to get control during query execution via registered callbacks.
Right now, all the jobs of the executor are built in, except for foreign scan,
so an extension has no way to run its own code in place of a particular
portion of the plan tree. This is painful for people who want to implement
an edge feature in the executor, because all we can do is replace the whole
executor, which brings an unreasonable maintenance burden.

Using CustomPlan takes an extension two steps: registration of a set of
callbacks, and manipulation of the plan tree.
First, the extension has to register a set of callbacks under a unique name
using RegisterCustomPlan(). The callbacks are defined as follows, and the
extension is responsible for making these routines work correctly.

  void BeginCustomPlan(CustomPlanState *cestate, int eflags);
  TupleTableSlot *ExecCustomPlan(CustomPlanState *node);
  Node *MultiExecCustomPlan(CustomPlanState *node);
  void EndCustomPlan(CustomPlanState *node);
  void ExplainCustomPlan(CustomPlanState *node, ExplainState *es);
  void ReScanCustomPlan(CustomPlanState *node);
  void ExecMarkPosCustomPlan(CustomPlanState *node);
  void ExecRestrPosCustomPlan(CustomPlanState *node);

These callbacks are invoked if the plan tree contains a CustomPlan node.
However, the usual code path never constructs this node type for any
SQL input, so the extension needs to manipulate the plan tree after it has
been constructed.
That is the second step. The extension installs its own code on planner_hook
to reference and manipulate the PlannedStmt object. It can replace particular
nodes in the plan tree with CustomPlan, or inject one at an arbitrary point.
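
To illustrate the second step, here is a minimal, self-contained sketch of
injecting a custom node on top of every node of a plan tree. It does not use
the PostgreSQL headers or the patch itself: ToyPlan, ToyTag, toy_make(),
toy_inject() and toy_count() are hypothetical stand-ins for illustration only.
A real extension would walk the Plan tree of the PlannedStmt from its
planner_hook function instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for plan nodes; NOT the real PostgreSQL structures. */
typedef enum
{
    TOY_SORT, TOY_NESTLOOP, TOY_SEQSCAN, TOY_INDEXSCAN, TOY_CUSTOM
} ToyTag;

typedef struct ToyPlan
{
    ToyTag          tag;
    const char     *custom_name;    /* provider name; set only for TOY_CUSTOM */
    struct ToyPlan *lefttree;
    struct ToyPlan *righttree;
} ToyPlan;

/* Allocate a node with the given tag and children. */
ToyPlan *
toy_make(ToyTag tag, ToyPlan *left, ToyPlan *right)
{
    ToyPlan *plan = calloc(1, sizeof(ToyPlan));

    assert(plan != NULL);
    plan->tag = tag;
    plan->lefttree = left;
    plan->righttree = right;
    return plan;
}

/*
 * Recursively rewrite the children first, then put a custom node on top of
 * the current node. The return value is the new root of this subtree.
 */
ToyPlan *
toy_inject(ToyPlan *plan, const char *provider)
{
    ToyPlan *wrapper;

    if (plan == NULL)
        return NULL;
    plan->lefttree = toy_inject(plan->lefttree, provider);
    plan->righttree = toy_inject(plan->righttree, provider);

    wrapper = toy_make(TOY_CUSTOM, plan, NULL);
    wrapper->custom_name = provider;
    return wrapper;
}

/* Count the nodes with the given tag, to check the rewritten tree. */
int
toy_count(const ToyPlan *plan, ToyTag tag)
{
    if (plan == NULL)
        return 0;
    return (plan->tag == tag ? 1 : 0)
        + toy_count(plan->lefttree, tag)
        + toy_count(plan->righttree, tag);
}
```

This sketch wraps every node uniformly. An extension is free to replace a
node instead of wrapping it, or to touch only the nodes it is interested in.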

Though my intention is to implement a GPU-accelerated table scan and other
stuff on top of this feature, other useful features can probably be built on
it as well. At the developer meeting, someone suggested it may be useful for
PG-XC folks to implement clustered scan. Also, I have an idea to implement an
in-memory query cache that can cut off a particular branch of the plan tree.
Other folks probably have other ideas.

The contrib/xtime module shows a simple example that records the elapsed time
of the underlying plan node, then prints it at the end of execution.
For example, this query constructs the following plan tree, as we usually see.

postgres=# EXPLAIN (costs off)
           SELECT * FROM t1 JOIN t2 ON t1.a = t2.x
                    WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
                     QUERY PLAN
-----------------------------------------------------
 Sort
   Sort Key: t2.y
   ->  Nested Loop
         ->  Seq Scan on t2
               Filter: ((x >= 1000) AND (x <= 1200))
         ->  Index Scan using t1_pkey on t1
               Index Cond: (a = t2.x)
(7 rows)

Once the xtime module manipulates the plan tree to inject CustomPlan,
it becomes as follows:

postgres=# LOAD '$libdir/xtime';
LOAD
postgres=# EXPLAIN (costs off)
           SELECT * FROM t1 JOIN t2 ON t1.a = t2.x
                    WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
                           QUERY PLAN
-----------------------------------------------------------------
 CustomPlan:xtime
   ->  Sort
         Sort Key: y
         ->  CustomPlan:xtime
               ->  Nested Loop
                     ->  CustomPlan:xtime on t2
                           Filter: ((x >= 1000) AND (x <= 1200))
                     ->  CustomPlan:xtime
                           ->  Index Scan using t1_pkey on t1
                                 Index Cond: (a = x)
(10 rows)

You can see that CustomPlan nodes with the name "xtime" appear in the plan
tree. The executor calls the functions registered as the callbacks of "xtime"
when it meets a CustomPlan node during recursive execution.

At a minimum, the extension has to set the name of the custom plan provider
when it constructs a CustomPlan node and puts it on the target plan tree.
The set of callbacks is looked up by that name and installed on the
CustomPlanState object for execution, in ExecInitNode().
The reason why I didn't put function pointers directly on the plan node is
that plan nodes need to be compatible with copyObject() and others.
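
The name-based lookup can be sketched as below. This is a self-contained
model, not the patch's code: ToyCallbacks, ToyCustomPlan, ToyState,
toy_register(), toy_lookup(), toy_init_state() and toy_xtime_exec() are all
hypothetical names, and the real signature of RegisterCustomPlan() is not
shown in this mail. The point is only that the plan node carries plain data
(a name), so it can be copied freely, while the function pointers are
resolved when the state node is initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAMELEN         32
#define TOY_MAX_PROVIDERS   8

typedef struct ToyState ToyState;

/* Callback set of one provider (only one callback, for brevity). */
typedef struct ToyCallbacks
{
    int (*exec) (ToyState *state);
} ToyCallbacks;

/* Plan node: holds only copyable data, no function pointers. */
typedef struct ToyCustomPlan
{
    char provider[TOY_NAMELEN];
} ToyCustomPlan;

/* State node: the callbacks get installed here at init time. */
struct ToyState
{
    const ToyCallbacks *callbacks;
    int                 calls;
};

static struct
{
    char         name[TOY_NAMELEN];
    ToyCallbacks cb;
} toy_registry[TOY_MAX_PROVIDERS];
static int toy_nproviders = 0;

/* Find a registered callback set by provider name; NULL if not found. */
const ToyCallbacks *
toy_lookup(const char *name)
{
    int i;

    for (i = 0; i < toy_nproviders; i++)
    {
        if (strcmp(toy_registry[i].name, name) == 0)
            return &toy_registry[i].cb;
    }
    return NULL;
}

/* Register callbacks under a unique name; 0 on success, -1 on failure. */
int
toy_register(const char *name, const ToyCallbacks *cb)
{
    if (toy_nproviders >= TOY_MAX_PROVIDERS ||
        strlen(name) >= TOY_NAMELEN ||
        toy_lookup(name) != NULL)
        return -1;
    strcpy(toy_registry[toy_nproviders].name, name);
    toy_registry[toy_nproviders].cb = *cb;
    toy_nproviders++;
    return 0;
}

/* Resolve the provider name on the plan node into callbacks on the state. */
int
toy_init_state(ToyState *state, const ToyCustomPlan *plan)
{
    assert(state != NULL && plan != NULL);
    state->callbacks = toy_lookup(plan->provider);
    state->calls = 0;
    return (state->callbacks != NULL) ? 0 : -1;
}

/* Sample callback: counts how many times it has been invoked. */
int
toy_xtime_exec(ToyState *state)
{
    return ++state->calls;
}
```

A flat copy of ToyCustomPlan stays valid in this model because it contains
nothing but the provider name, which is the same property the name-based
design gives CustomPlan under copyObject().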

Any comments are welcome.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
