Thread: [HACKERS] Possible problem in Custom Scan API

[HACKERS] Possible problem in Custom Scan API

From
Dmitry Ivanov
Date:
Hi hackers,

I'm struggling to understand one particular thing about Custom Scan API.

As you may know, there's a function called use_physical_tlist(), which aims
to eliminate meaningless projections during query execution. Any scan node
(e.g. CustomScan) aims to take advantage of physical targetlists... except
for the IndexOnlyScan (for obvious reasons):

createplan.c, create_scan_plan():

if (use_physical_tlist(root, best_path, flags))
{if (best_path->pathtype == T_IndexOnlyScan){    /* For index-only scan, the preferred tlist is the index's */    tlist
=copyObject(((IndexPath *) best_path)->indexinfo->indextlist);    ...} 
...
}

In theory, CustomScans should be able to use any Plan nodes (i.e.
'custom_plans' list), but as far as I can understand, there's no way to
override behavior of use_physical_tlist(), which means that we might see
something like this:

ERROR:  variable not found in subplan target list

if we use child IndexOnlyScan and the index does not include some of the
relation's columns.

Is there any existing workaround?


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Possible problem in Custom Scan API

From
Tom Lane
Date:
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes:
> In theory, CustomScans should be able to use any Plan nodes (i.e. 
> 'custom_plans' list), but as far as I can understand, there's no way to 
> override behavior of use_physical_tlist(), which means that we might see 
> something like this:

> ERROR:  variable not found in subplan target list

> if we use child IndexOnlyScan and the index does not include some of the 
> relation's columns.

Uh, why would you see that?  The planner would never generate an
IndexOnlyScan in the first place if the query required any columns
not available from the index.
        regards, tom lane



Re: [HACKERS] Possible problem in Custom Scan API

From
Dmitry Ivanov
Date:
Tom Lane wrote:
> Uh, why would you see that?  The planner would never generate an
> IndexOnlyScan in the first place if the query required any columns
> not available from the index.

True, but as you can see, create_append_plan() produces its own targetlist:

static Plan *
create_append_plan(PlannerInfo *root, AppendPath *best_path)
{Append       *plan;List       *tlist = build_path_tlist(root, &best_path->path);...

If we replace Append with some custom node, the plan will instantly become
invalid (it won't be be able to build a projection from 'custom_scan_tlist'
to 'targetlist'). However, this doesn't mean that it's unable to produce
the same result.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Possible problem in Custom Scan API

From
Tom Lane
Date:
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes:
> Tom Lane wrote:
>> Uh, why would you see that?  The planner would never generate an
>> IndexOnlyScan in the first place if the query required any columns
>> not available from the index.

> True, but as you can see, create_append_plan() produces its own targetlist:

> static Plan *
> create_append_plan(PlannerInfo *root, AppendPath *best_path)
> {
>     Append       *plan;
>     List       *tlist = build_path_tlist(root, &best_path->path);
>     ...

> If we replace Append with some custom node, the plan will instantly become 
> invalid (it won't be be able to build a projection from 'custom_scan_tlist' 
> to 'targetlist'). However, this doesn't mean that it's unable to produce 
> the same result.

You haven't really convinced me that anything is wrong there.  The append
plan's tlist isn't going to contain unwanted variables either.

Reading between the lines, I think the problem may be that you're not
being careful about how you set up custom_scan_tlist.  But since the
core code has zero involvement in that decision, it's hard to see why
it would be a core code bug.
        regards, tom lane



Re: [HACKERS] Possible problem in Custom Scan API

From
Dmitry Ivanov
Date:
Tom Lane wrote:
> Reading between the lines, I think the problem may be that you're not
> being careful about how you set up custom_scan_tlist.  But since the
> core code has zero involvement in that decision, it's hard to see why
> it would be a core code bug.

I'm sorry, but you didn't understand. It's *the core code* that builds the
targetlist, and sometimes it may decide to provide a physical targetlist,
not the one that's *really needed*. The broader targetlist has Vars that
cannot be supplied by the IndexOnlyScan node, hence the error.

We cannot come up with our own targetlist because we don't know if it's a
top node and we should return exactly the same tuple (CP_EXACT_TLIST) or
it's just the stray optimization that made us think so.

Append works only because it doesn't allow projections, and it can never
get a physical targetlist if an index doesn't contain all columns.

But everything changes when we use CustomScan: PostgreSQL will not pay
attention. For example, if our CustomScan is a child of NestLoop, the
former will call this (create_nestloop_plan):

inner_plan = create_plan_recurse(root, best_path->innerjoinpath, 0);

Since NestLoop can make projections, it doesn't enforce CP_EXACT_TLIST
flag, and our CustomScan may end up with a full physical targetlist (see
the code of create_scan_plan() if you don't believe me), INSTEAD OF the one
it really has been asked to produce. Meanwhile, Append will behave as it
should, since it doesn't care about physical tlists.

It's not just my imagination.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Possible problem in Custom Scan API

From
Tom Lane
Date:
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes:
>> Reading between the lines, I think the problem may be that you're not
>> being careful about how you set up custom_scan_tlist.  But since the
>> core code has zero involvement in that decision, it's hard to see why
>> it would be a core code bug.

> I'm sorry, but you didn't understand. It's *the core code* that builds the 
> targetlist, and sometimes it may decide to provide a physical targetlist, 
> not the one that's *really needed*.

Uh, no, construction of a custom plan node is entirely driven by the
PlanCustomPath method as far as I can see.  You're free to ignore what
create_scan_plan did and insert your own tlist.  In particular, if what
your custom path actually is is a rescan of something like an
IndexOnlyScan, why don't you just copy the IOS's result tlist?
        regards, tom lane



Re: [HACKERS] Possible problem in Custom Scan API

From
Dmitry Ivanov
Date:
Tom Lane wrote:
> Uh, no, construction of a custom plan node is entirely driven by the
> PlanCustomPath method as far as I can see.  You're free to ignore what
> create_scan_plan did and insert your own tlist.

Are you sure? Even if it's true, this targetlist should still contain each
and every Var that's been requested. If I'm correct, the only way to ensure
that is to call build_path_tlist(), which is static (oops!). Perhaps I
could make my own, but it uses replace_nestloop_params() (again, static),
and the problem goes further and further.

This effectively means that it would be nice if I could just use the
existing machinery. The fix would be quite trivial.

By the way, what if our CustomScan is a top node? Let's take a look at
create_plan():

/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);

...
if (!IsA(plan, ModifyTable))apply_tlist_labeling(plan->targetlist, root->processed_tlist);
...

If I spoil the targetlist, everything blows up in apply_tlist_labeling():

Assert(list_length(dest_tlist) == list_length(src_tlist));

since the lengths may differ. It's much safer to use the tlist provided by
the core code, but alas, sometimes it's not an option.

> In particular, if what
> your custom path actually is is a rescan of something like an
> IndexOnlyScan, why don't you just copy the IOS's result tlist?

I'm actively working on a prototype of partition pruning. It could be much
simpler if I just patched the core, but we have a working extension for 9.5
and 9.6, and we can't throw it away just yet. I wouldn't bother you if I
didn't encounter a broken query :)



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Possible problem in Custom Scan API

From
Alexander Korotkov
Date:
On Wed, Apr 12, 2017 at 12:59 AM, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
Tom Lane wrote:
Uh, no, construction of a custom plan node is entirely driven by the
PlanCustomPath method as far as I can see.  You're free to ignore what
create_scan_plan did and insert your own tlist.

Are you sure? Even if it's true, this targetlist should still contain each and every Var that's been requested. If I'm correct, the only way to ensure that is to call build_path_tlist(), which is static (oops!). Perhaps I could make my own, but it uses replace_nestloop_params() (again, static), and the problem goes further and further.

As I understand, making build_path_tlist a non-static function would solve the problem.
Tom, do you think it's possible?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Possible problem in Custom Scan API

From
Tom Lane
Date:
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes:
>> Uh, no, construction of a custom plan node is entirely driven by the
>> PlanCustomPath method as far as I can see.  You're free to ignore what
>> create_scan_plan did and insert your own tlist.

> Are you sure? Even if it's true, this targetlist should still contain each 
> and every Var that's been requested. If I'm correct, the only way to ensure 
> that is to call build_path_tlist(), which is static (oops!). Perhaps I 
> could make my own, but it uses replace_nestloop_params() (again, static), 
> and the problem goes further and further.

True.  We could expose build_path_tlist(), perhaps, but then you soon
reach the conclusion that PlanCustomPath should also be given access to
the tlist-related "flags" that are being passed around in createplan.c,
and that's a conclusion I don't want to reach.  That whole business is
a bit of a hack that I hope to redesign someday, but if we embed it in
the custom-scan API it will get very hard to change.

I'm coming around to the idea that it'd be better to disable physical
tlists for custom scans.  That optimization is built on the assumption
that providing all the columns is free (or at least, cheaper than doing
ExecProject), but it's easy to think of custom-scan applications where
that's not true.  It's a disastrous choice for a custom-scan provider
that's trying to interface to a column store, for instance.

However, I'm hesitant to make such a change in the back branches; if
there's anyone using custom scans who's negatively affected, they'd be
rightfully unhappy.  Are you satisfied if we change this in HEAD?
        regards, tom lane



Re: [HACKERS] Possible problem in Custom Scan API

From
Dmitry Ivanov
Date:
Tom Lane wrote:
> I'm coming around to the idea that it'd be better to disable physical
> tlists for custom scans.

I've been thinking about this all along, and it seems that this is a decent
decision. However, I've made a tiny bugfix patch which allows CustomScans
to notify the core code that they can handle physical targetlists.
Extension authors won't have to change anything provided that CustomPath is
allocated using palloc0().

> However, I'm hesitant to make such a change in the back branches; if
> there's anyone using custom scans who's negatively affected, they'd be
> rightfully unhappy.  Are you satisfied if we change this in HEAD?

It's kind of bittersweet. I'm really glad that you've changed your mind and
this is going to be fixed in HEAD, but this change is crucial for our
extension (if we use it with vanilla postgres).

Maybe you'll like my patch more.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Possible problem in Custom Scan API

From
Tom Lane
Date:
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes:
> Tom Lane wrote:
>> I'm coming around to the idea that it'd be better to disable physical
>> tlists for custom scans.
>> However, I'm hesitant to make such a change in the back branches; if
>> there's anyone using custom scans who's negatively affected, they'd be
>> rightfully unhappy.  Are you satisfied if we change this in HEAD?

After thinking about this over the weekend, I've come to the conclusion
that back-patching is probably the right thing anyway.  The upside of the
use_physical_tlist optimization is pretty marginal even when it applies,
while the downsides of applying it inappropriately can be very large,
as we've discussed.

> I've been thinking about this all along, and it seems that this is a decent 
> decision. However, I've made a tiny bugfix patch which allows CustomScans 
> to notify the core code that they can handle physical targetlists. 

I'm unimpressed by this part --- we couldn't back-patch such a change, and
I think it's unnecessary anyway in 9.6+ because the scan provider could
generate a nondefault pathtarget if it wants this to happen.
        regards, tom lane



Re: [HACKERS] Possible problem in Custom Scan API

From
Dmitry Ivanov
Date:
Tom Lane wrote:
> I'm unimpressed by this part --- we couldn't back-patch such a change, and
> I think it's unnecessary anyway in 9.6+ because the scan provider could
> generate a nondefault pathtarget if it wants this to happen.

You're right, of course. Thank you very much!


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company