Re: Custom Plan node - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: Custom Plan node
Date
Msg-id CADyhKSW2FjuYqAQUK4rn_zvLg13FGEb0WJsMG5VdTxJ2rV3fyg@mail.gmail.com
Whole thread Raw
In response to Re: Custom Plan node  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Custom Plan node  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2013/9/13 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Sep 10, 2013 at 11:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> Fair enough, I think.  So the action item for KaiGai is to think of
>>> how the planner integration might work.
>>>
>> Do you think the idea I mentioned at the upthread is worth to investigate
>> for more detailed consideration? Or, does it seems to you short-sighted
>> thinking to fit this infrastructure with planner?
>>
>> It categorizes plan node into three: join, scan and other stuff.
>> Cost based estimation is almost applied on join and scan, so abstracted
>> scan and join may make sense to inform core planner what does this
>> custom plan node try to do.
>> On the other hand, other stuff, like Agg, is a stuff that must be added
>> according to the provided query, even if its cost estimation was not small,
>> to perform as the provided query described.
>> So, I thought important one is integration of join and scan, but manipulation
>> of plan tree for other stuff is sufficient.
>>
>> How about your opinion?
>
> Well, I don't know that I'm smart enough to predict every sort of
> thing that someone might want to do here, unfortunately.  This is a
> difficult area: there are many possible things someone might want to
> do, and as Tom points out, there's a lot of special handling of
> particular node types that can make things difficult.  And I can't
> claim to be an expert in this area.
>
Sorry for my late response. I've tried to investigate the planner code
to find out the way to integrate this custom api, and it is still in
progress.
One special handling I found was that create_join_plan() adjust
root->curOuterRels prior to recursion of inner tree if NestLoop.
Probably, we need some flags to control these special handling
in the core.
It is a hard job to list up all the stuff, so it seems to me we need
to check-up them during code construction...

> That having been said, I think the idea of a CustomScan node is
> probably worth investigating.  I don't know if that would work out
> well or poorly, but I think it would be worth some experimentation.
> Perhaps you could have a hook that gets called for each baserel, and
> can decide whether or not it wishes to inject any additional paths;
> and then a CustomScan node that could be used to introduce such paths.
>  I've been thinking that we really ought to have the ability to
> optimize CTID range queries, like SELECT * FROM foo WHERE ctid > 'some
> constant'.  We have a Tid Scan node, but it only handles equalities,
> not inequalities.  I suppose this functionality should really be in
> core, but since it's not it might make an interesting test for the
> infrastructure you're proposing.  You may be able to think of
> something else you like better; it's just a thought.
>
This above framework was exactly what I considered.
Probably, we have to put a hook on functions invoked by
set_base_rel_pathlist() to add another possible way to scan
the provided baserel, then set_cheapest() will choose the
most reasonable one.
The attached patch, it's just a works-in-progress, shows
which hook I try to put around the code. Please grep it
with "add_custom_scan_paths".
Regarding to the "guest module" of this framework, another
idea that I have is, built-in query cache module that returns
previous scan result being cached if table contents was not
updated from the previous run. Probably, it makes sense in
case when most of rows are filtered out in this scan.
Anyway, I'd like to consider something useful to demonstrate
this API.

> I am a little less sanguine about the chances of a CustomJoin node
> working out well.  I agree that we need something to handle join
> pushdown, but it seems to me that might be done by providing a Foreign
> Scan path into the joinrel rather than by adding a concept of foreign
> joins per se.
>
Indeed, if we have a hook on add_paths_to_joinrel(), it also makes
sense for foreign tables; probably, planner will choose foreign-path
instead of existing join node including foreign-scans.

>  There are other possible new join types, like the Range
> Join that Jeff Davis has mentioned in the past, which might be
> interesting.  But overall, I can't see us adding very many more join
> types, so I'm not totally sure how much extensibility would help us
> here.  We've added a few scan types over the years (index only scans
> in 9.2, and bitmap index scans in 8.1, I think) but all of our
> existing join types go back to time immemorial.
>
It seems to me a significant point. Even if the custom node being added
by extension instead of joins looks like a scan for core-PostgreSQL,
extension will be able to run its custom join equivalent to join.

I think, the above built-in query cache idea make sense to demonstrate
this pseudo join node; that will be able to hold a materialized result being
already joined. At least, it seems to me sufficient for my target; table join
accelerated with GPU device.

> And I think that lumping everything else together under "not a scan or
> join" has the least promise of all.  Is replacing Append really the
> same as replacing Sort?  I think we'll need to think harder here about
> what we're trying to accomplish and how to get there.
>
As long as extension modifies PlannedStmt on the planner_hook,
I don't think it is not difficult so much, as I demonstrate on the
previous patch.
Unlike scan or join, existing code is not designed to compare
multiple possible paths, so it seems to me a feature to adjust
a plan-tree already construct is sufficient for most usage
because extension can decide which one can offer more cheap
path than built-in ones.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

pgsql-hackers by date:

Previous
From: Rushabh Lathia
Date:
Subject: Re: insert throw error when year field len > 4 for timestamptz datatype
Next
From: Tom Lane
Date:
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL