Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: [v9.5] Custom Plan API
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801087186@BPXM15GP.gisp.nec.co.jp
In response to Re: [v9.5] Custom Plan API  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [v9.5] Custom Plan API  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> > Let me explain the current idea of mine.
> > CustomScan node will have a field that hold varnode mapping
> > information that is constructed by custom-scan provider on
> > create_customscan_plan, if they want. It is probably a list of varnode.
> > If exists, setrefs.c changes its behavior; that updates varno/varattno
> > of varnode according to this mapping, as if set_join_references() does
> > based on indexed_tlist.
> > To reference ecxt_scantuple, INDEX_VAR will be the best choice for varno
> > of these varnodes, and index of the above varnode mapping list will be
> > varattno. It can be utilized to make EXPLAIN output, instead of
> > GetSpecialCustomVar hook.
>
> > So, steps to go may be:
> > (1) Add custom_private, custom_exprs, ... instead of self defined data
> >     type based on CustomXXX.
> > (2) Rid of SetCustomScanRef and GetSpecialCustomVar hook for the current
> >     custom-"scan" support.
> > (3) Integration of above varnode mapping feature within upcoming join
> >     replacement by custom-scan support.
>
> Well ... I still do not find this interesting, because I don't believe that
> CustomScan is a solution to anything interesting.  It's difficult enough
> to solve problems like expensive-function pushdown within the core code;
> why would we tie one hand behind our backs by insisting that they should
> be solved by extensions?  And as I mentioned before, we do need solutions
> to these problems in the core, regardless of CustomScan.
>
I'd like to split the "anything interesting" into two portions.
As you pointed out, pushing down complicated expressions may need a
fairly large effort (at least across the remaining two commit-fests).
However, what join replacement by custom-scan requires is similar to
the job of set_join_references(), because it never involves translation
between varnodes and general expressions.

Also, from my standpoint, a simple join replacement by custom-scan has
higher priority; join acceleration in v9.5 makes sense even if the full
functionality of pushing down general expressions is not supported yet.

> I think that a useful way to go at this might be to think first about how
> to make use of expensive functions that have been cached in indexes, and
> then see how the solution to that might translate to pushing down expensive
> functions into FDWs and CustomScans.  If you start with the CustomScan
> aspect of it then you immediately find yourself trying to design APIs to
> divide up the solution, which is premature when you don't even know what
> the solution is.
>
Yep, it also seems to me that the remaining two commit-fests are too
tight a schedule to reach consensus on the overall design and to
implement it. I'd like to focus on the simpler portion first.

> The rough idea I'd had about this is that while canvassing a relation's
> indexes (in get_relation_info), we could create a list of precomputed
> expressions that are available from indexes, then run through the query
> tree and replace any matching subexpressions with some Var-like nodes (or
> maybe better PlaceHolderVar-like nodes) that indicate that "we can get this
> expression for free if we read the right index".
> If we do read the right index, such an expression reduces to a Var in the
> finished plan tree; if not, it reverts to the original expression.
> (Some thought would need to be given to the semantics when the index's table
> is underneath an outer join --- that may just mean that we can't necessarily
> replace every textually-matching subexpression, only those that are not
> above an outer join.) One question mark here is how to do the "replace
> any matching subexpressions" bit without O(lots) processing cost in big
> queries.  But that's probably just a SMOP.  The bigger issue I fear is that
> the planner is not currently structured to think that evaluation cost of
> expressions in the SELECT list has anything to do with which Path it should
> pick.  That is tied to the handwaving I've been doing for awhile now about
> converting all the upper-level planning logic into
> generate-and-compare-Paths style; we certainly cannot ignore tlist eval
> costs while making those decisions.  So at least for those upper-level Paths,
> we'd have to have a notion of what tlist we expect that plan level to compute,
> and charge appropriate evaluation costs.
>
Let me investigate the planner code further before commenting on this.

> So there's a lot of work there and I don't find that CustomScan looks like
> a solution to any of it.  CustomScan and FDWs could benefit from this work,
> in that we'd now have a way to deal with the concept that expensive functions
> (and aggregates, I hope) might be computed at the bottom scan level.  But
> it's folly to suppose that we can make it work just by hacking some
> arms-length extension code without any fundamental planner changes.
>
Indeed, I don't think it is a good idea to start from this harder portion.
Let's focus on just the varno/varattno remapping needed to replace a join
relation with a custom-scan, as the immediate target.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



