Thread: [HACKERS] Possible problem in Custom Scan API
Hi hackers, I'm struggling to understand one particular thing about Custom Scan API. As you may know, there's a function called use_physical_tlist(), which aims to eliminate meaningless projections during query execution. Any scan node (e.g. CustomScan) aims to take advantage of physical targetlists... except for the IndexOnlyScan (for obvious reasons): createplan.c, create_scan_plan(): if (use_physical_tlist(root, best_path, flags)) {if (best_path->pathtype == T_IndexOnlyScan){ /* For index-only scan, the preferred tlist is the index's */ tlist =copyObject(((IndexPath *) best_path)->indexinfo->indextlist); ...} ... } In theory, CustomScans should be able to use any Plan nodes (i.e. 'custom_plans' list), but as far as I can understand, there's no way to override behavior of use_physical_tlist(), which means that we might see something like this: ERROR: variable not found in subplan target list if we use child IndexOnlyScan and the index does not include some of the relation's columns. Is there any existing workaround? -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes: > In theory, CustomScans should be able to use any Plan nodes (i.e. > 'custom_plans' list), but as far as I can understand, there's no way to > override behavior of use_physical_tlist(), which means that we might see > something like this: > ERROR: variable not found in subplan target list > if we use child IndexOnlyScan and the index does not include some of the > relation's columns. Uh, why would you see that? The planner would never generate an IndexOnlyScan in the first place if the query required any columns not available from the index. regards, tom lane
Tom Lane wrote: > Uh, why would you see that? The planner would never generate an > IndexOnlyScan in the first place if the query required any columns > not available from the index. True, but as you can see, create_append_plan() produces its own targetlist: static Plan * create_append_plan(PlannerInfo *root, AppendPath *best_path) {Append *plan;List *tlist = build_path_tlist(root, &best_path->path);... If we replace Append with some custom node, the plan will instantly become invalid (it won't be be able to build a projection from 'custom_scan_tlist' to 'targetlist'). However, this doesn't mean that it's unable to produce the same result. -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes: > Tom Lane wrote: >> Uh, why would you see that? The planner would never generate an >> IndexOnlyScan in the first place if the query required any columns >> not available from the index. > True, but as you can see, create_append_plan() produces its own targetlist: > static Plan * > create_append_plan(PlannerInfo *root, AppendPath *best_path) > { > Append *plan; > List *tlist = build_path_tlist(root, &best_path->path); > ... > If we replace Append with some custom node, the plan will instantly become > invalid (it won't be be able to build a projection from 'custom_scan_tlist' > to 'targetlist'). However, this doesn't mean that it's unable to produce > the same result. You haven't really convinced me that anything is wrong there. The append plan's tlist isn't going to contain unwanted variables either. Reading between the lines, I think the problem may be that you're not being careful about how you set up custom_scan_tlist. But since the core code has zero involvement in that decision, it's hard to see why it would be a core code bug. regards, tom lane
Tom Lane wrote: > Reading between the lines, I think the problem may be that you're not > being careful about how you set up custom_scan_tlist. But since the > core code has zero involvement in that decision, it's hard to see why > it would be a core code bug. I'm sorry, but you didn't understand. It's *the core code* that builds the targetlist, and sometimes it may decide to provide a physical targetlist, not the one that's *really needed*. The broader targetlist has Vars that cannot be supplied by the IndexOnlyScan node, hence the error. We cannot come up with our own targetlist because we don't know if it's a top node and we should return exactly the same tuple (CP_EXACT_TLIST) or it's just the stray optimization that made us think so. Append works only because it doesn't allow projections, and it can never get a physical targetlist if an index doesn't contain all columns. But everything changes when we use CustomScan: PostgreSQL will not pay attention. For example, if our CustomScan is a child of NestLoop, the former will call this (create_nestloop_plan): inner_plan = create_plan_recurse(root, best_path->innerjoinpath, 0); Since NestLoop can make projections, it doesn't enforce CP_EXACT_TLIST flag, and our CustomScan may end up with a full physical targetlist (see the code of create_scan_plan() if you don't believe me), INSTEAD OF the one it really has been asked to produce. Meanwhile, Append will behave as it should, since it doesn't care about physical tlists. It's not just my imagination. -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes: >> Reading between the lines, I think the problem may be that you're not >> being careful about how you set up custom_scan_tlist. But since the >> core code has zero involvement in that decision, it's hard to see why >> it would be a core code bug. > I'm sorry, but you didn't understand. It's *the core code* that builds the > targetlist, and sometimes it may decide to provide a physical targetlist, > not the one that's *really needed*. Uh, no, construction of a custom plan node is entirely driven by the PlanCustomPath method as far as I can see. You're free to ignore what create_scan_plan did and insert your own tlist. In particular, if what your custom path actually is is a rescan of something like an IndexOnlyScan, why don't you just copy the IOS's result tlist? regards, tom lane
Tom Lane wrote: > Uh, no, construction of a custom plan node is entirely driven by the > PlanCustomPath method as far as I can see. You're free to ignore what > create_scan_plan did and insert your own tlist. Are you sure? Even if it's true, this targetlist should still contain each and every Var that's been requested. If I'm correct, the only way to ensure that is to call build_path_tlist(), which is static (oops!). Perhaps I could make my own, but it uses replace_nestloop_params() (again, static), and the problem goes further and further. This effectively means that it would be nice if I could just use the existing machinery. The fix would be quite trivial. By the way, what if our CustomScan is a top node? Let's take a look at create_plan(): /* Recursively process the path tree, demanding the correct tlist result */ plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST); ... if (!IsA(plan, ModifyTable))apply_tlist_labeling(plan->targetlist, root->processed_tlist); ... If I spoil the targetlist, everything blows up in apply_tlist_labeling(): Assert(list_length(dest_tlist) == list_length(src_tlist)); since the lengths may differ. It's much safer to use the tlist provided by the core code, but alas, sometimes it's not an option. > In particular, if what > your custom path actually is is a rescan of something like an > IndexOnlyScan, why don't you just copy the IOS's result tlist? I'm actively working on a prototype of partition pruning. It could be much simpler if I just patched the core, but we have a working extension for 9.5 and 9.6, and we can't throw it away just yet. I wouldn't bother you if I didn't encounter a broken query :) -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Wed, Apr 12, 2017 at 12:59 AM, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
Tom Lane wrote:Uh, no, construction of a custom plan node is entirely driven by the
PlanCustomPath method as far as I can see. You're free to ignore what
create_scan_plan did and insert your own tlist.
Are you sure? Even if it's true, this targetlist should still contain each and every Var that's been requested. If I'm correct, the only way to ensure that is to call build_path_tlist(), which is static (oops!). Perhaps I could make my own, but it uses replace_nestloop_params() (again, static), and the problem goes further and further.
As I understand, making build_path_tlist a non-static function would solve the problem.
Tom, do you think it's possible?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes: >> Uh, no, construction of a custom plan node is entirely driven by the >> PlanCustomPath method as far as I can see. You're free to ignore what >> create_scan_plan did and insert your own tlist. > Are you sure? Even if it's true, this targetlist should still contain each > and every Var that's been requested. If I'm correct, the only way to ensure > that is to call build_path_tlist(), which is static (oops!). Perhaps I > could make my own, but it uses replace_nestloop_params() (again, static), > and the problem goes further and further. True. We could expose build_path_tlist(), perhaps, but then you soon reach the conclusion that PlanCustomPath should also be given access to the tlist-related "flags" that are being passed around in createplan.c, and that's a conclusion I don't want to reach. That whole business is a bit of a hack that I hope to redesign someday, but if we embed it in the custom-scan API it will get very hard to change. I'm coming around to the idea that it'd be better to disable physical tlists for custom scans. That optimization is built on the assumption that providing all the columns is free (or at least, cheaper than doing ExecProject), but it's easy to think of custom-scan applications where that's not true. It's a disastrous choice for a custom-scan provider that's trying to interface to a column store, for instance. However, I'm hesitant to make such a change in the back branches; if there's anyone using custom scans who's negatively affected, they'd be rightfully unhappy. Are you satisfied if we change this in HEAD? regards, tom lane
Tom Lane wrote: > I'm coming around to the idea that it'd be better to disable physical > tlists for custom scans. I've been thinking about this all along, and it seems that this is a decent decision. However, I've made a tiny bugfix patch which allows CustomScans to notify the core code that they can handle physical targetlists. Extension authors won't have to change anything provided that CustomPath is allocated using palloc0(). > However, I'm hesitant to make such a change in the back branches; if > there's anyone using custom scans who's negatively affected, they'd be > rightfully unhappy. Are you satisfied if we change this in HEAD? It's kind of bittersweet. I'm really glad that you've changed your mind and this is going to be fixed in HEAD, but this change is crucial for our extension (if we use it with vanilla postgres). Maybe you'll like my patch more. -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Dmitry Ivanov <d.ivanov@postgrespro.ru> writes: > Tom Lane wrote: >> I'm coming around to the idea that it'd be better to disable physical >> tlists for custom scans. >> However, I'm hesitant to make such a change in the back branches; if >> there's anyone using custom scans who's negatively affected, they'd be >> rightfully unhappy. Are you satisfied if we change this in HEAD? After thinking about this over the weekend, I've come to the conclusion that back-patching is probably the right thing anyway. The upside of the use_physical_tlist optimization is pretty marginal even when it applies, while the downsides of applying it inappropriately can be very large, as we've discussed. > I've been thinking about this all along, and it seems that this is a decent > decision. However, I've made a tiny bugfix patch which allows CustomScans > to notify the core code that they can handle physical targetlists. I'm unimpressed by this part --- we couldn't back-patch such a change, and I think it's unnecessary anyway in 9.6+ because the scan provider could generate a nondefault pathtarget if it wants this to happen. regards, tom lane
Tom Lane wrote: > I'm unimpressed by this part --- we couldn't back-patch such a change, and > I think it's unnecessary anyway in 9.6+ because the scan provider could > generate a nondefault pathtarget if it wants this to happen. You're right, of course. Thank you very much! -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company