Thread: wCTE: why not finish sub-updates at the end, not the beginning?
I had what seems to me a remarkably good idea, though maybe someone else can spot a problem with it. Given that we've decided to run the modifying sub-queries all with the same command counter ID, they are logically executing "in parallel". The current implementation takes no advantage of that fact, though: it's based around the idea of running the updates strictly sequentially. I think we should change it so that the updates happen physically, not only logically, concurrently.

Specifically, I'm imagining getting rid of the patch's additions to InitPlan and ExecutePlan that find all the modifying sub-queries and force them to be cycled to completion before the main plan runs. Just run the main plan and let it pull tuples from the CTEs as needed. Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to completion before shutting down the plan. (In the event of an error, we'd never get to ExecutorEnd, but it doesn't matter since whatever updates we did apply are nullified anyhow.)

This has a number of immediate and future implementation benefits:

1. RETURNING tuples that aren't actually needed by the main plan don't need to be buffered anywhere. (ExecutorEnd would just pull directly from the ModifyTable nodes, ignoring their parent CTE nodes, in all cases.)

2. In principle, in many common cases the RETURNING tuples wouldn't have to be buffered at all, but could be consumed on-the-fly. I think that right now the CTEScan nodes might still buffer the tuples so they can regurgitate them in case of being rescanned, but it's not hard to see how that could be improved later if it doesn't work immediately.

3. The code could be significantly simpler. Instead of that rather complex and fragile logic in InitPlan to try to locate all the ModifyTable nodes and their CTEScan parents, we could just have ModifyTable nodes add themselves to a list in the EState during ExecInitNode. Then ExecutorEnd just traverses that list.
However, the real reason for doing it isn't any of those, but rather to establish the principle that the executions of the modifying sub-queries are interleaved not sequential. We're never going to be able to do any significant optimization of such queries if we have to preserve the behavior that the sub-queries execute sequentially. And I think it's inevitable that users will manage to build such an assumption into their queries if the first release with the feature behaves that way.

Comments?

			regards, tom lane
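The list-registration idea in point 3 can be sketched in miniature. The types and function names below (MiniEState, mini_init_node, and so on) are invented for illustration and are not the actual PostgreSQL structures; the point is just that a node which registers itself at init time needs no plan-tree walk to be found at shutdown:

```c
#include <stddef.h>

/* Hypothetical miniature of the proposal: each "ModifyTable" adds itself
 * to a per-query list during init; at shutdown we cycle any node that
 * has not yet run to completion. */
typedef struct MiniModifyTable
{
	int		rows_remaining;			/* tuples the sub-update still has to process */
	struct MiniModifyTable *next;	/* link in the EState-style list */
} MiniModifyTable;

typedef struct MiniEState
{
	MiniModifyTable *modify_tables;	/* filled during "ExecInitNode" */
} MiniEState;

/* "ExecInitNode": the node registers itself -- no fragile logic needed
 * later to locate it in the plan tree. */
static void
mini_init_node(MiniEState *estate, MiniModifyTable *mt, int rows)
{
	mt->rows_remaining = rows;
	mt->next = estate->modify_tables;
	estate->modify_tables = mt;
}

/* The main plan may pull some (or none) of the tuples as it runs. */
static void
mini_pull_tuples(MiniModifyTable *mt, int n)
{
	while (n-- > 0 && mt->rows_remaining > 0)
		mt->rows_remaining--;
}

/* "ExecutorEnd": traverse the list and cycle every unfinished node to
 * completion; returns how many leftover tuples were drained. */
static int
mini_executor_end(MiniEState *estate)
{
	int		drained = 0;

	for (MiniModifyTable *mt = estate->modify_tables; mt != NULL; mt = mt->next)
	{
		while (mt->rows_remaining > 0)
		{
			mt->rows_remaining--;
			drained++;
		}
	}
	return drained;
}
```

However much (or little) the main plan consumed, the drain loop guarantees every sub-update completes before the plan shuts down.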
On Fri, Feb 25, 2011 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I had what seems to me a remarkably good idea, though maybe someone else
> can spot a problem with it.
> [...]
> However, the real reason for doing it isn't any of those, but rather
> to establish the principle that the executions of the modifying
> sub-queries are interleaved not sequential.

I completely agree. Actually, I thought we had already agreed on the design you just proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Given that we've decided to run the modifying sub-queries all with > the same command counter ID, they are logically executing "in > parallel". > Just run the main plan and let it pull tuples from the CTEs as > needed. On the face of it, that sounds like it has another benefit you didn't mention -- it sounds like it's much more conducive to allowing parallel processing, if (when?) we eventually move in that direction. It might even be a good case for an initial, limited implementation. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Given that we've decided to run the modifying sub-queries all with >> the same command counter ID, they are logically executing "in >> parallel". >> Just run the main plan and let it pull tuples from the CTEs as >> needed. > On the face of it, that sounds like it has another benefit you > didn't mention -- it sounds like it's much more conducive to > allowing parallel processing, if (when?) we eventually move in that > direction. It might even be a good case for an initial, limited > implementation. Yeah. Most of the executor is in principle parallelizable at the plan-node level (ignoring the obvious and severe implementation problems with parallelizing *anything* in the backend). It's not good for wCTE to be creating a user-visible assumption that certain things will happen in a predefined order. regards, tom lane
On 2011-02-25 4:58 PM, Tom Lane wrote:
> Specifically, I'm imagining getting rid of the patch's additions to
> InitPlan and ExecutePlan that find all the modifying sub-queries and
> force them to be cycled to completion before the main plan runs.
> Just run the main plan and let it pull tuples from the CTEs as needed.
> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
> completion before shutting down the plan. (In the event of an error,
> we'd never get to ExecutorEnd, but it doesn't matter since whatever
> updates we did apply are nullified anyhow.)

This idea has actually been discussed before when we talked about optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a bit ugly. But if you can write this idea down in a way that makes you happy with the implementation, I think it's a huge benefit and we should definitely do it.

> This has a number of immediate and future implementation benefits:
> 3. The code could be significantly simpler. Instead of that rather
> complex and fragile logic in InitPlan to try to locate all the
> ModifyTable nodes and their CTEScan parents, we could just have
> ModifyTable nodes add themselves to a list in the EState during
> ExecInitNode. Then ExecutorEnd just traverses that list.

Sounds good to me.

> However, the real reason for doing it isn't any of those, but rather
> to establish the principle that the executions of the modifying
> sub-queries are interleaved not sequential. We're never going to be
> able to do any significant optimization of such queries if we have to
> preserve the behavior that the sub-queries execute sequentially.
> And I think it's inevitable that users will manage to build such an
> assumption into their queries if the first release with the feature
> behaves that way.

Yeah, you might be right.

Regards,
Marko Tiikkaja
On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, the real reason for doing it isn't any of those, but rather
> to establish the principle that the executions of the modifying
> sub-queries are interleaved not sequential. We're never going to be
> able to do any significant optimization of such queries if we have to
> preserve the behavior that the sub-queries execute sequentially.
> And I think it's inevitable that users will manage to build such an
> assumption into their queries if the first release with the feature
> behaves that way.

Does the interleaved execution have sane semantics?

With a query like:

	WITH
	  a as (update x set i = i + 1 returning i),
	  b as (update x set i = i + 1 returning i)
	select * from a natural join b;

Is there any way to tell what it will return or what state it will leave the table in?

-- 
greg
On Fri, Feb 25, 2011 at 11:31 AM, Greg Stark <gsstark@mit.edu> wrote:
> With a query like:
>
>	WITH
>	  a as (update x set i = i + 1 returning i),
>	  b as (update x set i = i + 1 returning i)
>	select * from a natural join b;
>
> Is there any way to tell what it will return or what state it will
> leave the table in?

What about:

	WITH
	  a as (update x set i = i + 1 returning i),
	  b as (update x set i = i + 1 where i = 1 returning i)
	select * from a natural join b;

or the above query when i is 1 for every row of x at query start?

merlin
Greg Stark <gsstark@mit.edu> writes:
> On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, the real reason for doing it isn't any of those, but rather
>> to establish the principle that the executions of the modifying
>> sub-queries are interleaved not sequential.

> Does the interleaved execution have sane semantics?

Depends on what you call sane. With the decision to not increment the command counter, it's already the case that people shouldn't have two subqueries try to modify the same row.

> With a query like:
>
>	WITH
>	  a as (update x set i = i + 1 returning i),
>	  b as (update x set i = i + 1 returning i)
>	select * from a natural join b;
>
> Is there any way to tell what it will return or what state it will
> leave the table in?

My reaction to that is "you shouldn't do that, and you definitely shouldn't complain if it's not predictable whether a or b will modify a given row". This is exactly the sort of assumption I don't want people building into their queries, because we will be locked into purely sequential execution if we promise that the results will be consistent.

There is already precedent for that position. You can easily construct queries using UPDATE ... FROM wherein the same target row joins to more than one row in the FROM table, and then it's unpredictable which joining row will be used to update that target row. Our position has always been "don't do that", not that we'd lobotomize the planner and executor to ensure predictability.

			regards, tom lane
On Fri, Feb 25, 2011 at 09:58:36AM -0500, Tom Lane wrote:
> Specifically, I'm imagining getting rid of the patch's additions to
> InitPlan and ExecutePlan that find all the modifying sub-queries and
> force them to be cycled to completion before the main plan runs.
> Just run the main plan and let it pull tuples from the CTEs as needed.
> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
> completion before shutting down the plan.

What's the effect, if any, on CTEs that depend on each other explicitly?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > What's the effect, if any, on CTEs that depend on each other > explicitly? An error. That would require mutual recursion, which we don't support for the SELECT case let alone data-modifying statements. regards, tom lane
On Fri, Feb 25, 2011 at 10:12:02PM -0500, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > What's the effect, if any, on CTEs that depend on each other
> > explicitly?
>
> An error. That would require mutual recursion, which we don't
> support for the SELECT case let alone data-modifying statements.

Sorry, that was unclear. Let's imagine there's a DELETE ... RETURNING in one WITH, and an UPDATE in another that depends on that one. Is that still allowed?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
David Fetter <david@fetter.org> writes:
> Sorry, that was unclear. Let's imagine there's a DELETE ... RETURNING
> in one WITH, and an UPDATE in another that depends on that one. Is
> that still allowed?

Yeah, it is, although I just noticed that there's a bug in the new implementation:

	with t1 as (insert into x select ... returning *),
	     t2 as (insert into y select * from t1 returning *)
	select 1;

This should result in the same rows being inserted into both x and y, but in git HEAD it fails to insert anything into y. The reason is that the ExecutorEnd scan first processes the ModifyTable node for x and cycles it to completion, discarding the results --- but we needed the CteScan in t2 to see those rows. There's a related case in the regression tests, but it works because the outer query does fetch from both WITH clauses, so there's no need to do anything at ExecutorEnd time.

The first solution that comes to mind is to pay attention to the interdependencies of the CTEs and perform the cleanup in an appropriate order (here, the ModifyTable for y needs to be cycled first). I'm not sure if there's a nicer way. We'll eventually want some interdependency tracking for CTEs anyway, if we're ever to support mutual recursion, so it'd not be completely single-purpose code.

			regards, tom lane
I wrote: > The first solution that comes to mind is to pay attention to the > interdependencies of the CTEs, and perform the cleanup in an appropriate > order (here, the ModifyTable for y needs to be cycled first). Doh ... actually, we already *are* ordering the CTEs in dependency order, so it's a one-liner fix to do the shutdowns in reverse order. regards, tom lane
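The shutdown-ordering bug and its reverse-order fix can be modelled in a few lines. The types below (ToyState, finish_t1, finish_t2) are invented for illustration, not real executor code: cycling t1 directly discards its RETURNING rows, while cycling t2 first pulls t1's rows through, volcano-style, so they land in both tables:

```c
/* Toy model of "with t1 as (insert into x ... returning *),
 *                    t2 as (insert into y select * from t1 ...)
 *               select 1;" -- names are illustrative only. */
typedef struct ToyState
{
	int		src_pending;	/* rows t1 still has to insert into x */
	int		x_inserted;		/* rows actually written to x */
	int		y_inserted;		/* rows actually written to y */
} ToyState;

/* Cycling t2 to completion pulls rows through t1 on demand, so each
 * RETURNING row reaches both x and y. */
static void
finish_t2(ToyState *s)
{
	while (s->src_pending > 0)
	{
		s->src_pending--;
		s->x_inserted++;	/* t1's INSERT ... RETURNING produces the row */
		s->y_inserted++;	/* ... and t2 consumes it into y */
	}
}

/* Cycling t1 directly drains it with nobody listening: its RETURNING
 * rows are discarded and never reach t2. */
static void
finish_t1(ToyState *s)
{
	while (s->src_pending > 0)
	{
		s->src_pending--;
		s->x_inserted++;
	}
}
```

Shutting down in dependency order (t1 before t2) reproduces the bug; shutting down in reverse order makes t2 drive t1 and preserves the rows, which is why the one-liner fix works.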
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2011-02-25 4:58 PM, Tom Lane wrote:
>> Specifically, I'm imagining getting rid of the patch's additions to
>> InitPlan and ExecutePlan that find all the modifying sub-queries and
>> force them to be cycled to completion before the main plan runs.
>> Just run the main plan and let it pull tuples from the CTEs as needed.
>> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
>> completion before shutting down the plan.

> This idea has actually been discussed before when we talked about
> optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a
> bit ugly.

Further experimentation has reminded me of why I didn't want to put such processing in ExecutorEnd :-(. There are some nasty interactions with EXPLAIN:

1. EXPLAIN ANALYZE fails to include the execution cycles associated with running the ModifyTable nodes to completion. In the worst case, such as "WITH t AS (INSERT ...) SELECT 1", it will claim the INSERT subplan is never executed, even though rows certainly got inserted. This is because EXPLAIN extracts all the counts from the execution state tree before shutting it down with ExecutorEnd.

2. But it gets worse. Try the same query *without* ANALYZE. You'll find the INSERT executes anyway! That's because EXPLAIN still calls ExecutorEnd to clean up the execution state tree, and ExecutorEnd doesn't realize it's not supposed to run any of the plan.

So we really need some refactoring here. I dislike adding another fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence, but there may not be a better way. The only way I see to fix this without changing that API is to have ExecutorRun do the cleanup processing just after the top plan node returns a null tuple, and that seems a bit ugly as well.

Thoughts?

			regards, tom lane
On 26.02.2011 07:55, Tom Lane wrote: > So we really need some refactoring here. I dislike adding another > fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence, > but there may not be a better way. Could you keep the sequence unchanged for non-EXPLAIN callers with some refactoring? Add an exposed function like ExecutorFinishRun() that Explain calls explicitly in the EXPLAIN ANALYZE case, and modify ExecutorEnd to call it too, if it hasn't been called yet and the explain-only flag isn't set. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, Feb 26, 2011 at 5:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So we really need some refactoring here. I dislike adding another
> fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
> but there may not be a better way. The only way I see to fix this
> without changing that API is to have ExecutorRun do the cleanup
> processing just after the top plan node returns a null tuple, and that
> seems a bit ugly as well.

How would that handle the case of a cursor which isn't read to completion? Should it still execute the CTEs to completion?

-- 
greg
On 26 February 2011 05:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Further experimentation has reminded me of why I didn't want to put such
> processing in ExecutorEnd :-(. There are some nasty interactions with
> EXPLAIN:
> [...]

There's a third problem: AfterTriggerEndQuery() is called before ExecutorEnd(), and so if the post-processing is done in ExecutorEnd() and it attempts to queue up any AFTER triggers, it fails (ERROR: AfterTriggerSaveEvent() called outside of query).

> So we really need some refactoring here. I dislike adding another
> fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
> but there may not be a better way. The only way I see to fix this
> without changing that API is to have ExecutorRun do the cleanup
> processing just after the top plan node returns a null tuple, and that
> seems a bit ugly as well.
>
> Thoughts?

Could the post-processing not be done at the end of ExecutePlan()?

Regards,
Dean
Greg Stark <gsstark@mit.edu> writes:
> On Sat, Feb 26, 2011 at 5:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So we really need some refactoring here. I dislike adding another
>> fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
>> but there may not be a better way. The only way I see to fix this
>> without changing that API is to have ExecutorRun do the cleanup
>> processing just after the top plan node returns a null tuple, and that
>> seems a bit ugly as well.

> How would that handle the case of a cursor which isn't read to
> completion? Should it still execute the CTEs to completion?

Right at the moment we dodge that issue by disallowing wCTEs in cursors. If we did allow them, then I would say that the wCTEs have to be run to completion when the cursor is closed.

			regards, tom lane
On Sat, Feb 26, 2011 at 4:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Right at the moment we dodge that issue by disallowing wCTEs in cursors. > If we did allow them, then I would say that the wCTEs have to be run to > completion when the cursor is closed. > Does that really dodge anything? Isn't it just the same as running a query from a client and closing the result without reading to the end? ExecutorEnd would be called but ExecutorRun would never be called to the end of the scan. -- greg
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 26.02.2011 07:55, Tom Lane wrote:
>> So we really need some refactoring here. I dislike adding another
>> fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
>> but there may not be a better way.

> Could you keep the sequence unchanged for non-EXPLAIN callers with some
> refactoring? Add an exposed function like ExecutorFinishRun() that
> Explain calls explicitly in the EXPLAIN ANALYZE case, and modify
> ExecutorEnd to call it too, if it hasn't been called yet and the
> explain-only flag isn't set.

I had been toying with the same idea, but it doesn't work, as Dean Rasheed points out nearby. The full monty for running the executor these days is really

	CreateQueryDesc(...);
	AfterTriggerBeginQuery(...);
	ExecutorStart(...);
	ExecutorRun(...);		// zero or more times
	AfterTriggerEndQuery(...);
	ExecutorEnd(...);
	FreeQueryDesc(...);

ExecutorEnd can *not* do anything that might fire triggers, because it's too late: AfterTriggerEndQuery has already been called.

BTW, there are various places that believe they can skip AfterTriggerBeginQuery/AfterTriggerEndQuery if operation == CMD_SELECT, an assumption that no longer holds if the query has wCTEs. So we have some work to do on the callers no matter what.

I'm inclined to think that it would be best to move the responsibility for calling AfterTriggerBeginQuery/AfterTriggerEndQuery into the executor. That would get us down to

	CreateQueryDesc(...);
	ExecutorStart(...);		// now includes AfterTriggerBeginQuery
	ExecutorRun(...);		// zero or more times
	ExecutorFinish(...);	// ModifyTable cleanup, AfterTriggerEndQuery
	ExecutorEnd(...);		// just does what it's always done
	FreeQueryDesc(...);

where EXPLAIN without ANALYZE would skip ExecutorRun and ExecutorFinish.
The RI triggers have a requirement for being able to run this sequence without the AfterTriggerBeginQuery/AfterTriggerEndQuery calls (cf SPI_execute_snapshot's fire_triggers parameter), but we could support that by adding an ExecutorStart flag that tells it to suppress those trigger calls.

IMO the major disadvantage of a refactoring like this is the possibility of sins of omission in third-party code, in particular somebody not noticing the added requirement to call ExecutorFinish. We could help them out by adding an Assert in ExecutorEnd to verify that ExecutorFinish had been called (unless explain-only mode). A variant of that problem is an auto_explain-like add-on not noticing that it probably wants to hook into ExecutorFinish if it had previously been hooking ExecutorRun. I don't see any simple check for that, though. The other possible failure mode is forgetting to remove calls to the two trigger functions, but we could encourage getting that right by renaming those two functions.

Comments?

			regards, tom lane
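The proposed contract, including the Assert that catches callers who skip ExecutorFinish, can be sketched like so. MiniQueryDesc and its boolean bookkeeping are invented for illustration, and the sketch returns false where the real code would fire an Assert:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the proposed lifecycle: Start includes
 * AfterTriggerBeginQuery, Finish does ModifyTable cleanup plus
 * AfterTriggerEndQuery, End only releases resources. */
typedef struct MiniQueryDesc
{
	bool	explain_only;	/* EXPLAIN without ANALYZE */
	bool	started;
	bool	finished;
	bool	ended;
} MiniQueryDesc;

static void
MiniExecutorStart(MiniQueryDesc *qd, bool explain_only)
{
	qd->explain_only = explain_only;
	qd->started = true;			/* would also call AfterTriggerBeginQuery */
}

static void
MiniExecutorFinish(MiniQueryDesc *qd)
{
	assert(qd->started);
	qd->finished = true;		/* cycle ModifyTable nodes, AfterTriggerEndQuery */
}

/* Returns false instead of firing Assert so the sketch is testable:
 * a non-EXPLAIN caller that skipped ExecutorFinish broke the contract. */
static bool
MiniExecutorEnd(MiniQueryDesc *qd)
{
	if (!qd->explain_only && !qd->finished)
		return false;			/* the Assert proposed above */
	qd->ended = true;
	return true;
}
```

A well-behaved caller runs Start, Run (zero or more times), Finish, End; EXPLAIN without ANALYZE legitimately skips Finish and still passes the check.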
On 2011-02-26 7:18 PM, Tom Lane wrote: > IMO the major disadvantage of a refactoring like this is the possibility > of sins of omission in third-party code, in particular somebody not > noticing the added requirement to call ExecutorFinish. We could help > them out by adding an Assert in ExecutorEnd to verify that > ExecutorFinish had been called (unless explain-only mode). A variant of > that problem is an auto_explain-like add-on not noticing that they > probably want to hook into ExecutorFinish if they'd previously been > hooking ExecutorRun. I don't see any simple check for that though. > The other possible failure mode is forgetting to remove calls to the two > trigger functions, but we could encourage getting that right by renaming > those two functions. While I don't really like the possibility of breaking third party modules, I think the idea is good. Also +1 for adding checks where possible. Regards, Marko Tiikkaja
I wrote:
> I'm inclined to think that it would be best to move the responsibility
> for calling AfterTriggerBeginQuery/AfterTriggerEndQuery into the
> executor. That would get us down to
>
> 	CreateQueryDesc(...);
> 	ExecutorStart(...);		// now includes AfterTriggerBeginQuery
> 	ExecutorRun(...);		// zero or more times
> 	ExecutorFinish(...);	// ModifyTable cleanup, AfterTriggerEndQuery
> 	ExecutorEnd(...);		// just does what it's always done
> 	FreeQueryDesc(...);
>
> where EXPLAIN without ANALYZE would skip ExecutorRun and ExecutorFinish.
> [...]
> The other possible failure mode is forgetting to remove calls to the two
> trigger functions, but we could encourage getting that right by renaming
> those two functions.

This is committed. I desisted from the last change (renaming the trigger functions) because it seemed unnecessary. If someone does forget to remove redundant AfterTriggerBeginQuery/AfterTriggerEndQuery calls, it won't hurt them much; it just wastes a few cycles stacking and unstacking useless trigger contexts.

			regards, tom lane