Thread: wCTE behaviour
Hi all, The discussion around wCTE during the last week or so has brought to my attention that we don't actually have a consensus on how exactly wCTEs should behave. The question seems to be whether or not a statement should see the modifications of statements ran before it. While I think making the modifications visible would be a lot more intuitive, it's not clear how we'd optimize the execution in the future without changing the behaviour (triggers are a big concern). I've done some digging today and it seems that IBM's DB2 took the more intuitive approach: all statements are ran, in the order they're written in, to completion before the main statement, materializing the "deltas" into a temporary table and the modifications are made visible to the next statements. I have no idea how many complaints they have received about this behaviour, but I'd be in favor of matching it. Thoughts? Regards, Marko Tiikkaja
On Thu, Nov 11, 2010 at 04:15:34AM +0200, Marko Tiikkaja wrote: > Hi all, > > The discussion around wCTE during the last week or so has brought to > my attention that we don't actually have a consensus on how exactly > wCTEs should behave. The question seems to be whether or not a > statement should see the modifications of statements ran before it. > While I think making the modifications visible would be a lot more > intuitive, it's not clear how we'd optimize the execution in the > future without changing the behaviour (triggers are a big concern). +1 for letting writeable CTEs see the results of previous CTEs, just as current non-writeable ones do. A lot of the useful cases for this feature depend on this visibility. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2010-11-11 6:41 PM +0200, David Fetter wrote: > On Thu, Nov 11, 2010 at 04:15:34AM +0200, Marko Tiikkaja wrote: >> The discussion around wCTE during the last week or so has brought to >> my attention that we don't actually have a consensus on how exactly >> wCTEs should behave. The question seems to be whether or not a >> statement should see the modifications of statements ran before it. >> While I think making the modifications visible would be a lot more >> intuitive, it's not clear how we'd optimize the execution in the >> future without changing the behaviour (triggers are a big concern). > > +1 for letting writeable CTEs see the results of previous CTEs, just > as current non-writeable ones do. A lot of the useful cases for this > feature depend on this visibility. Just to be clear, the main point is whether they see the data modifications or not. The simplest case to point out this behaviour is: WITH t AS (DELETE FROM foo) SELECT * FROM foo; And the big question is: what state of "foo" should the SELECT statement see? Regards, Marko Tiikkaja
<div class="gmail_quote">On 11 November 2010 16:50, Marko Tiikkaja <span dir="ltr"><<a href="mailto:marko.tiikkaja@cs.helsinki.fi">marko.tiikkaja@cs.helsinki.fi</a>></span>wrote:<br /><blockquote class="gmail_quote"style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><divclass="im">On 2010-11-11 6:41 PM +0200, David Fetter wrote:<br /></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im"> On Thu,Nov 11, 2010 at 04:15:34AM +0200, Marko Tiikkaja wrote:<br /></div><div class="im"><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> The discussion around wCTEduring the last week or so has brought to<br /> my attention that we don't actually have a consensus on how exactly<br/> wCTEs should behave. The question seems to be whether or not a<br /> statement should see the modificationsof statements ran before it.<br /> While I think making the modifications visible would be a lot more<br />intuitive, it's not clear how we'd optimize the execution in the<br /> future without changing the behaviour (triggersare a big concern).<br /></blockquote><br /> +1 for letting writeable CTEs see the results of previous CTEs, just<br/> as current non-writeable ones do. A lot of the useful cases for this<br /> feature depend on this visibility.<br/></div></blockquote><br /> Just to be clear, the main point is whether they see the data modifications ornot. The simplest case to point out this behaviour is:<br /><br /> WITH t AS (DELETE FROM foo)<br /> SELECT * FROM foo;<br/><br /> And the big question is: what state of "foo" should the SELECT statement see?<br /><br /></blockquote></div><br/>I would expect that select to return nothing. And if the user wished to reference what was deleted,they could use RETURNING anyway. </probable ignorance><br /><br />WITH t AS (UPDATE foo SET col = true)<br/> SELECT * FROM foo WHERE col = false;<br /><br />... Wouldn't this be more practical to have foo's UPDATEs appliedprior to SELECT? Otherwise what would the usecase be?<br clear="all" /><br />-- <br />Thom Brown<br />Twitter: @darkixion<br/> IRC (freenode): dark_ixion<br />Registered Linux user: #516935<br />
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > On 2010-11-11 6:41 PM +0200, David Fetter wrote: >> On Thu, Nov 11, 2010 at 04:15:34AM +0200, Marko Tiikkaja wrote: >>> The discussion around wCTE during the last week or so has brought to >>> my attention that we don't actually have a consensus on how exactly >>> wCTEs should behave. The question seems to be whether or not a >>> statement should see the modifications of statements ran before it. >> +1 for letting writeable CTEs see the results of previous CTEs, just >> as current non-writeable ones do. A lot of the useful cases for this >> feature depend on this visibility. > Just to be clear, the main point is whether they see the data > modifications or not. The simplest case to point out this behaviour is: > WITH t AS (DELETE FROM foo) > SELECT * FROM foo; > And the big question is: what state of "foo" should the SELECT statement > see? You've already predetermined the outcome of the argument by phrasing it that way: if you assume that the CTE runs "before" the main statement then the conclusion is foregone. To my mind, they should be thought of as running in parallel, or at least in an indeterminate order, just exactly the same way that different data modifications made in a single INSERT/UPDATE/DELETE command are considered to be made simultaneously. If someone came to us and complained because his ON UPDATE trigger couldn't reliably see changes made to other rows by the same UPDATE command, and could we please make UPDATE more deterministic, we'd tell him to rethink what he was doing. This is the same thing. It is already the case that a user who pushes on things hard enough can see that a WITH isn't really run "before" the main command. For example, regression=# create sequence s1; CREATE SEQUENCE regression=# with tt(x,y) as (select x, nextval('s1') from generate_series(1,10) x) regression-# select x,y, nextval('s1') as z from tt;x | y | z ----+----+---- 1 | 1 | 2 2 | 3 | 4 3 | 5 | 6 4 | 7 | 8 5 | 9 | 10 6 | 11 | 12 7 | 13 | 14 8 | 15 | 16 9 | 17 | 1810| 19 | 20 (10 rows) If we establish a precedent that WITHs can be thought of as executing before the main command, we will eventually have to de-optimize existing WITH behavior. Or else make up reasons why the inconsistency is okay in some cases and not others, but that will definitely be a case of rationalizing after the fact. regards, tom lane
On Nov 11, 2010, at 9:13 AM, Tom Lane wrote: > If we establish a precedent that WITHs can be thought of as executing > before the main command, we will eventually have to de-optimize existing > WITH behavior. Or else make up reasons why the inconsistency is okay in > some cases and not others, but that will definitely be a case of > rationalizing after the fact. I can see that, but if one can't see the result of the write, or can't determine whether or not it will be visible in advance,what's the point of writeable CTEs? Best, David
"David E. Wheeler" <david@kineticode.com> writes: > I can see that, but if one can't see the result of the write, or can't determine whether or not it will be visible in advance,what's the point of writeable CTEs? The writeable CTE returns a RETURNING set, which you can and should use in the outer query. The thing that is being argued about here is what you see if you look "directly" at the target table rather than making use of RETURNING. Essentially, I'm arguing that we shouldn't promise any particular behavior at that level, just as we don't promise that UPDATE updates different rows in any determinate order. regards, tom lane
Thom Brown <thom@linux.com> writes: > WITH t AS (UPDATE foo SET col = true) > SELECT * FROM foo WHERE col = false; > ... Wouldn't this be more practical to have foo's UPDATEs applied prior to > SELECT? Otherwise what would the usecase be? If that's what you want, you might as well just issue two separate statements. There is no use-case for this at all unless the WITH produces some RETURNING data that the SELECT makes use of. regards, tom lane
On 11 Nov 2010, at 19:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: >> On 2010-11-11 6:41 PM +0200, David Fetter wrote: >>> On Thu, Nov 11, 2010 at 04:15:34AM +0200, Marko Tiikkaja wrote: >>>> The discussion around wCTE during the last week or so has brought >>>> to >>>> my attention that we don't actually have a consensus on how exactly >>>> wCTEs should behave. The question seems to be whether or not a >>>> statement should see the modifications of statements ran before it. > >>> +1 for letting writeable CTEs see the results of previous CTEs, just >>> as current non-writeable ones do. A lot of the useful cases for >>> this >>> feature depend on this visibility. > >> Just to be clear, the main point is whether they see the data >> modifications or not. The simplest case to point out this >> behaviour is: > >> WITH t AS (DELETE FROM foo) >> SELECT * FROM foo; > >> And the big question is: what state of "foo" should the SELECT >> statement >> see? > > You've already predetermined the outcome of the argument by phrasing > it > that way: if you assume that the CTE runs "before" the main statement > then the conclusion is foregone. To my mind, they should be thought > of > as running in parallel, or at least in an indeterminate order, just > exactly the same way that different data modifications made in a > single > INSERT/UPDATE/DELETE command are considered to be made simultaneously. > .. > If we establish a precedent that WITHs can be thought of as executing > before the main command, we will eventually have to de-optimize > existing > WITH behavior. Or else make up reasons why the inconsistency is > okay in > some cases and not others, but that will definitely be a case of > rationalizing after the fact. I apologize, I had misunderstood what you are suggesting. But now that I do, it seems to be an even worse idea to go your way. Based on my research, I'm almost certain that the SQL standard says that the execution order is deterministic if there is at least one DML statement in the WITH list. Can anyone confirm this? Regards, Marko Tiikkaja
On Thu, Nov 11, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > then the conclusion is foregone. To my mind, they should be thought of > as running in parallel, or at least in an indeterminate order, just > exactly the same way that different data modifications made in a single > INSERT/UPDATE/DELETE command are considered to be made simultaneously. +1 merlin
On Nov 11, 2010, at 9:29 AM, Tom Lane wrote: >> I can see that, but if one can't see the result of the write, or can't determine whether or not it will be visible inadvance, what's the point of writeable CTEs? > > The writeable CTE returns a RETURNING set, which you can and should use > in the outer query. The thing that is being argued about here is what > you see if you look "directly" at the target table rather than making > use of RETURNING. Essentially, I'm arguing that we shouldn't promise > any particular behavior at that level, just as we don't promise that > UPDATE updates different rows in any determinate order. Yes, if RETURNING guarantees the execution order, then great. That was the first thing I tried to do before I realized thatthe current CTE implementation doesn't support w. David
"David E. Wheeler" <david@kineticode.com> writes: > On Nov 11, 2010, at 9:29 AM, Tom Lane wrote: >> The writeable CTE returns a RETURNING set, which you can and should use >> in the outer query. The thing that is being argued about here is what >> you see if you look "directly" at the target table rather than making >> use of RETURNING. Essentially, I'm arguing that we shouldn't promise >> any particular behavior at that level, just as we don't promise that >> UPDATE updates different rows in any determinate order. > Yes, if RETURNING guarantees the execution order, then great. That was the first thing I tried to do before I realizedthat the current CTE implementation doesn't support w. Well, it doesn't "guarantee the execution order", it's just that that's the defined conduit for getting information out of the WITH and into the parent query. Looking directly at the table is not that conduit. I misspoke by saying that the behavior would be nondeterministic. What I think we should do is run all elements of the tree with the same snapshot, which would provide perfectly deterministic behavior: if you look at the target table, you see the prior state. You don't see the updated state, which is what allows us to possibly optimize things so that the updates aren't completely made before execution of the main query starts. regards, tom lane
On Thu, Nov 11, 2010 at 12:36:38PM -0500, Merlin Moncure wrote: > On Thu, Nov 11, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > then the conclusion is foregone. To my mind, they should be thought of > > as running in parallel, or at least in an indeterminate order, just > > exactly the same way that different data modifications made in a single > > INSERT/UPDATE/DELETE command are considered to be made simultaneously. > > +1 -1. When people want to see what has gone before, they can use RETURNING clauses. With the "indeterminate order" proposal, they cannot. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, Nov 11, 2010 at 12:34:55PM -0500, Tom Lane wrote: > Thom Brown <thom@linux.com> writes: > > WITH t AS (UPDATE foo SET col = true) > > SELECT * FROM foo WHERE col = false; > > > ... Wouldn't this be more practical to have foo's UPDATEs applied > > prior to SELECT? Otherwise what would the usecase be? > > If that's what you want, you might as well just issue two separate > statements. There is no use-case for this at all unless the WITH > produces some RETURNING data that the SELECT makes use of. There are lots of use cases where it does exactly this. One simple example is maintaining a rollup table, so as less-rolled data get deleted, they get aggregated into an INSERT into that table. Think of RRDtool, only with a real data store. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Thu, Nov 11, 2010 at 12:36:38PM -0500, Merlin Moncure wrote: >> On Thu, Nov 11, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > then the conclusion is foregone. �To my mind, they should be thought of > as running in parallel, or at least in an indeterminate order, just > exactly the same way that different data modifications made in a single > INSERT/UPDATE/DELETE command are considered to be made simultaneously. >> >> +1 > -1. > When people want to see what has gone before, they can use RETURNING > clauses. With the "indeterminate order" proposal, they cannot. Say what? The RETURNING data is well defined in any case. regards, tom lane
On Thu, Nov 11, 2010 at 1:53 PM, David Fetter <david@fetter.org> wrote: > On Thu, Nov 11, 2010 at 12:36:38PM -0500, Merlin Moncure wrote: >> On Thu, Nov 11, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > then the conclusion is foregone. To my mind, they should be thought of >> > as running in parallel, or at least in an indeterminate order, just >> > exactly the same way that different data modifications made in a single >> > INSERT/UPDATE/DELETE command are considered to be made simultaneously. >> >> +1 > > -1. > > When people want to see what has gone before, they can use RETURNING > clauses. With the "indeterminate order" proposal, they cannot. If you want to see what happened 'before' you *must* use a returning clause. It's the link that pipelines data from one query to another. There is in fact no 'before', just a way to define hook output into input. ISTM you have a lot more available routes of CTE optimization if you go this way. but, can you present an example of a case that depends on execution order w/o returning? maybe I'm not seeing something... merlin
David Fetter <david@fetter.org> writes: > On Thu, Nov 11, 2010 at 12:34:55PM -0500, Tom Lane wrote: >> If that's what you want, you might as well just issue two separate >> statements. There is no use-case for this at all unless the WITH >> produces some RETURNING data that the SELECT makes use of. > There are lots of use cases where it does exactly this. Name *one*. If there is no RETURNING data, there is absolutely no reason to use WITH instead of issuing the query separately. In fact, I would assume that a DML query without RETURNING would not even be syntactically legal in WITH. > One simple > example is maintaining a rollup table, so as less-rolled data get > deleted, they get aggregated into an INSERT into that table. Yes, exactly. The way you would do that is something like WITH del AS (DELETE FROM foo WHERE whatever RETURNING *)INSERT INTO rollup SELECT * FROM del; I am very interested to see how you will do the same thing without using RETURNING and with the behavior you claim to want that the DELETE is visibly complete before the INSERT starts. Where's the INSERT gonna get the already-deleted data from? With my proposal (ie, both queries using same snapshot) you could actually do it without RETURNING, like this: WITH useless_cte AS (DELETE FROM foo WHERE whatever)INSERT INTO rollup SELECT * FROM foo WHERE same-whatever; But I don't see any reason to think that that's a superior way to write the query, especially since it might be subject to weird race conditions against other concurrent modifications of the table. RETURNING is just a lot saner way to be sure that you're looking at exactly what the DELETE deleted. regards, tom lane
On 2010-11-11 17:50, Marko Tiikkaja wrote: > Just to be clear, the main point is whether they see the data > modifications or not. The simplest case to point out this behaviour is: > > WITH t AS (DELETE FROM foo) > SELECT * FROM foo; > > And the big question is: what state of "foo" should the SELECT > statement see? Since t is not referenced in the query, foo should not be deleted at all, like WITH t AS (SELECT nextval('seq')) SELECT * FROM foo does not update the sequence. But if t is referenced.. WITH t AS (DELETE FROM foo RETURNING *) SELECT * FROM foo NATURAL JOIN t; Since the extension of t can only be known by deleting foo, it makes sense that this query cannot return rows. "Select the rows from foo that I just deleted." regards, Yeb Havinga
Yeb Havinga <yebhavinga@gmail.com> writes: > On 2010-11-11 17:50, Marko Tiikkaja wrote: >> Just to be clear, the main point is whether they see the data >> modifications or not. The simplest case to point out this behaviour is: >> >> WITH t AS (DELETE FROM foo) >> SELECT * FROM foo; >> >> And the big question is: what state of "foo" should the SELECT >> statement see? > Since t is not referenced in the query, foo should not be deleted at > all, Yeah, that's another interesting question: should we somehow force unreferenced CTEs to be evaluated anyhow? Now that I think about it, there was also some concern about the possibility of the outer query not reading the CTE all the way to the end, ie WITH t AS (DELETE FROM foo RETURNING *)SELECT * FROM t LIMIT 1; How many rows does this delete? I think we concluded that we should force the DELETE to be run to conclusion even if the outer query didn't read it all. From an implementation standpoint that makes it more attractive to do the DELETE first and stick its results in a tuplestore --- but I still think we should view that as an implementation detail, not as part of the specification. regards, tom lane
On Fri, Nov 12, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeb Havinga <yebhavinga@gmail.com> writes: >> On 2010-11-11 17:50, Marko Tiikkaja wrote: >>> Just to be clear, the main point is whether they see the data >>> modifications or not. The simplest case to point out this behaviour is: >>> >>> WITH t AS (DELETE FROM foo) >>> SELECT * FROM foo; >>> >>> And the big question is: what state of "foo" should the SELECT >>> statement see? > >> Since t is not referenced in the query, foo should not be deleted at >> all, > > Yeah, that's another interesting question: should we somehow force > unreferenced CTEs to be evaluated anyhow? Now that I think about it, > there was also some concern about the possibility of the outer query > not reading the CTE all the way to the end, ie > > WITH t AS (DELETE FROM foo RETURNING *) > SELECT * FROM t LIMIT 1; > > How many rows does this delete? I think we concluded that we should > force the DELETE to be run to conclusion even if the outer query didn't > read it all. From an implementation standpoint that makes it more > attractive to do the DELETE first and stick its results in a tuplestore > --- but I still think we should view that as an implementation detail, > not as part of the specification. Yeah, I think we have to force any DML statements in CTEs to run to completion, whether we need the results or not, and even if they are unreferenced. Otherwise it's going to be really confusing, I fear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote: > Yeb Havinga <yebhavinga@gmail.com> writes: > > On 2010-11-11 17:50, Marko Tiikkaja wrote: > >> Just to be clear, the main point is whether they see the data > >> modifications or not. The simplest case to point out this behaviour is: > >> > >> WITH t AS (DELETE FROM foo) > >> SELECT * FROM foo; > >> > >> And the big question is: what state of "foo" should the SELECT > >> statement see? > > > Since t is not referenced in the query, foo should not be deleted at > > all, > > Yeah, that's another interesting question: should we somehow force > unreferenced CTEs to be evaluated anyhow? Yes. > Now that I think about it, > there was also some concern about the possibility of the outer query > not reading the CTE all the way to the end, ie > > WITH t AS (DELETE FROM foo RETURNING *) > SELECT * FROM t LIMIT 1; > > How many rows does this delete? I think we concluded that we should > force the DELETE to be run to conclusion even if the outer query didn't > read it all. Yes. > From an implementation standpoint that makes it more > attractive to do the DELETE first and stick its results in a tuplestore > --- but I still think we should view that as an implementation detail, > not as part of the specification. Right :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Nov 12, 2010 at 10:50:52AM -0500, Robert Haas wrote: > On Fri, Nov 12, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Yeb Havinga <yebhavinga@gmail.com> writes: > >> On 2010-11-11 17:50, Marko Tiikkaja wrote: > >>> Just to be clear, the main point is whether they see the data > >>> modifications or not. The simplest case to point out this behaviour is: > >>> > >>> WITH t AS (DELETE FROM foo) > >>> SELECT * FROM foo; > >>> > >>> And the big question is: what state of "foo" should the SELECT > >>> statement see? > > > >> Since t is not referenced in the query, foo should not be deleted at > >> all, > > > > Yeah, that's another interesting question: should we somehow force > > unreferenced CTEs to be evaluated anyhow? Now that I think about it, > > there was also some concern about the possibility of the outer query > > not reading the CTE all the way to the end, ie > > > > WITH t AS (DELETE FROM foo RETURNING *) > > SELECT * FROM t LIMIT 1; > > > > How many rows does this delete? I think we concluded that we should > > force the DELETE to be run to conclusion even if the outer query didn't > > read it all. From an implementation standpoint that makes it more > > attractive to do the DELETE first and stick its results in a tuplestore > > --- but I still think we should view that as an implementation detail, > > not as part of the specification. > > Yeah, I think we have to force any DML statements in CTEs to run to > completion, whether we need the results or not, and even if they are > unreferenced. Otherwise it's going to be really confusing, I fear. Yes, and as we add more things--COPY is the first but probably not the last--to CTEs, this "no action-at-a-distance" behavior will become even more important. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2010/11/13 Robert Haas <robertmhaas@gmail.com>: > On Fri, Nov 12, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeb Havinga <yebhavinga@gmail.com> writes: >>> On 2010-11-11 17:50, Marko Tiikkaja wrote: >>>> Just to be clear, the main point is whether they see the data >>>> modifications or not. The simplest case to point out this behaviour is: >>>> >>>> WITH t AS (DELETE FROM foo) >>>> SELECT * FROM foo; >>>> >>>> And the big question is: what state of "foo" should the SELECT >>>> statement see? >> >>> Since t is not referenced in the query, foo should not be deleted at >>> all, >> >> Yeah, that's another interesting question: should we somehow force >> unreferenced CTEs to be evaluated anyhow? Now that I think about it, >> there was also some concern about the possibility of the outer query >> not reading the CTE all the way to the end, ie >> >> WITH t AS (DELETE FROM foo RETURNING *) >> SELECT * FROM t LIMIT 1; >> >> How many rows does this delete? I think we concluded that we should >> force the DELETE to be run to conclusion even if the outer query didn't >> read it all. From an implementation standpoint that makes it more >> attractive to do the DELETE first and stick its results in a tuplestore >> --- but I still think we should view that as an implementation detail, >> not as part of the specification. > > Yeah, I think we have to force any DML statements in CTEs to run to > completion, whether we need the results or not, and even if they are > unreferenced. Otherwise it's going to be really confusing, I fear. One thing that has annoyed me while designing this feature is if as Tom suggests the all queries are executed in the same snapshot and optimized as the current read-only CTE does we are tempted to support recursive and forward-reference in even DML CTE. It explodes out my head and I'd like not to think about it if we can. On the other hand, different-snapshot, serialized execution model occurs the problem I originally rose in the previous thread, in which the space to store the data shared among different plans is missing. It's of course doable, but the easier implementation the better. I'm inclined to agree with the same snapshot model, that is not only easier to implement but also fits the current SQL processing design and the existing CTE specification. Not only from the developer's view but consistency from user's view. Whatever the standard says on the DML *subquery*, we're going to create our new *CTE* feature. Yes, this is CTE. For recursive and forward-reference issue, we can just forbid them in DML CTE at first. Regards, -- Hitoshi Harada
Hi all, It appears that we have a consensus on the behaviour. I'm going to take some time off this weekend to get a patch with this behaviour to the next commitfest. Regards, Marko Tiikkaja
On Sat, Nov 13, 2010 at 01:50:46AM +0900, Hitoshi Harada wrote: > 2010/11/13 Robert Haas <robertmhaas@gmail.com>: > > On Fri, Nov 12, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Yeb Havinga <yebhavinga@gmail.com> writes: > >>> On 2010-11-11 17:50, Marko Tiikkaja wrote: > >>>> Just to be clear, the main point is whether they see the data > >>>> modifications or not. The simplest case to point out this behaviour is: > >>>> > >>>> WITH t AS (DELETE FROM foo) > >>>> SELECT * FROM foo; > >>>> > >>>> And the big question is: what state of "foo" should the SELECT > >>>> statement see? > >> > >>> Since t is not referenced in the query, foo should not be deleted at > >>> all, > >> > >> Yeah, that's another interesting question: should we somehow force > >> unreferenced CTEs to be evaluated anyhow? Now that I think about it, > >> there was also some concern about the possibility of the outer query > >> not reading the CTE all the way to the end, ie > >> > >> WITH t AS (DELETE FROM foo RETURNING *) > >> SELECT * FROM t LIMIT 1; > >> > >> How many rows does this delete? I think we concluded that we should > >> force the DELETE to be run to conclusion even if the outer query didn't > >> read it all. From an implementation standpoint that makes it more > >> attractive to do the DELETE first and stick its results in a tuplestore > >> --- but I still think we should view that as an implementation detail, > >> not as part of the specification. > > > > Yeah, I think we have to force any DML statements in CTEs to run to > > completion, whether we need the results or not, and even if they are > > unreferenced. Otherwise it's going to be really confusing, I fear. > > One thing that has annoyed me while designing this feature is if as > Tom suggests the all queries are executed in the same snapshot and > optimized as the current read-only CTE does we are tempted to > support recursive and forward-reference in even DML CTE. It > explodes out my head and I'd like not to think about it if we can. Does this have about the same head-explodiness as the mutually recursive CTEs described in the SQL standard? More? Less? > On the other hand, different-snapshot, serialized execution model > occurs the problem I originally rose in the previous thread, in which > the space to store the data shared among different plans is missing. > It's of course doable, but the easier implementation the better. > > I'm inclined to agree with the same snapshot model, that is not only > easier to implement but also fits the current SQL processing design > and the existing CTE specification. Not only from the developer's view > but consistency from user's view. Whatever the standard says on the > DML *subquery*, we're going to create our new *CTE* feature. Yes, this > is CTE. For recursive and forward-reference issue, we can just forbid > them in DML CTE at first. Sounds good :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Tom Lane <tgl@sss.pgh.pa.us> writes: > WITH t AS (DELETE FROM foo RETURNING *) > SELECT * FROM t LIMIT 1; > > How many rows does this delete? I think we concluded that we should > force the DELETE to be run to conclusion even if the outer query didn't > read it all The counter-example that jumps to mind is unix pipes. It's read-only at the consumer level but as soon as you stop reading, the producer stops. I guess that's only talking about the surprise factor, though. I'm not sure how far we go with the SIGPIPE analogy, but I wanted to say that maybe that would not feel so strange to some people if the DELETE were not run to completion but only until the reader is done. What about this one: WITH d AS (DELETE FROM foo RETURNING id), q AS (INSERT INTO queue SELECT 'D', id FROM d) SELECT * FROM q ORDER BY idLIMIT 10; For next example, replace INSERT with a MERGE to remove a previously existing 'I' or 'U' event in the queue when we add a 'D'. Bonus points if wCTE allows to implement the query without resorting to MERGE at all, which would be nice in my mind. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> WITH t AS (DELETE FROM foo RETURNING *) >> SELECT * FROM t LIMIT 1; >> >> How many rows does this delete? I think we concluded that we should >> force the DELETE to be run to conclusion even if the outer query didn't >> read it all > The counter-example that jumps to mind is unix pipes. It's read-only at > the consumer level but as soon as you stop reading, the producer stops. > I guess that's only talking about the surprise factor, though. > I'm not sure how far we go with the SIGPIPE analogy, but I wanted to say > that maybe that would not feel so strange to some people if the DELETE > were not run to completion but only until the reader is done. I can see that there's a fair argument for that position in cases like the above, but the trouble is that there are also cases where it's very hard for the user to predict how many rows will be read. As examples, mergejoins may stop short of reading all of one input depending on what the last key value is from the other, and semijoins or antijoins will stop whenenever they hit a match in the inner input. I think in the join cases we had better establish a simple rule "it'll get executed to completion". We could maybe do things differently if the outer query is non-join with a LIMIT, but that seems pretty inconsistent. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > I can see that there's a fair argument for that position in cases like > the above, but the trouble is that there are also cases where it's very > hard for the user to predict how many rows will be read. As examples, > mergejoins may stop short of reading all of one input depending on what > the last key value is from the other, and semijoins or antijoins will > stop whenenever they hit a match in the inner input. Oh. Indeed, I now understand what you mean by surprises. I keep forgetting that DML and JOINs can live together… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Excerpts from Dimitri Fontaine's message of vie nov 12 17:13:59 -0300 2010: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > WITH t AS (DELETE FROM foo RETURNING *) > > SELECT * FROM t LIMIT 1; > > > > How many rows does this delete? I think we concluded that we should > > force the DELETE to be run to conclusion even if the outer query didn't > > read it all > > The counter-example that jumps to mind is unix pipes. It's read-only at > the consumer level but as soon as you stop reading, the producer stops. > I guess that's only talking about the surprise factor, though. It's not that straighforward though, in that the producer could stop a bit ahead of what the consumer reads, due to there being a buffer in the middle. Witness this simple example $ cat > producer #!/bin/sh for i in `seq 1 1000`; do echo $i >> /tmp/mylog echo $i done $ chmod a+x producer $ ./producer | head -5 1 2 3 4 5 $ cat /tmp/mylog 1 2 3 4 5 6 7 I certainly wouldn't want our implementation to behave like this. > I'm not sure how far we go with the SIGPIPE analogy, but I wanted to say > that maybe that would not feel so strange to some people if the DELETE > were not run to completion but only until the reader is done. > > What about this one: > > WITH d AS (DELETE FROM foo RETURNING id), > q AS (INSERT INTO queue SELECT 'D', id FROM d) > SELECT * FROM q ORDER BY id LIMIT 10; Personally I find this one less surprising: WITH d AS (DELETE FROM foo LIMIT 10 RETURNING id), q AS (INSERT INTO queue SELECT 'D', id FROM d)SELECT * FROM q ORDERBY id; -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 2010-11-12 16:51, David Fetter wrote: > On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote: >> >> Yeah, that's another interesting question: should we somehow force >> unreferenced CTEs to be evaluated anyhow? > Yes. After a night's sleep I'm still thinking no. Arguments: 1) the name "Common Table Expression" suggests that t must be regarded as an expression, hence syntactically / proof theoretic and not as a table, set of rows / model theoretic. I.e. it is not a "Common Table". 2) The expressions can be referenced zero, one or more times. To me it therefore makes the most sense that a DML expressions that is defined but not references has no effect. Referenced once: run the plan once. Referenced again: run the plan again. What should the result be of WITH t AS (INSERT INTO foo SELECT nextval('seq') RETURNING *) SELECT * FROM t UNION SELECT * FROM t; 1 or 1,2 ? regards, Yeb Havinga
On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote: > On 2010-11-12 16:51, David Fetter wrote: > >On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote: > >> > >>Yeah, that's another interesting question: should we somehow force > >>unreferenced CTEs to be evaluated anyhow? > >Yes. > After a night's sleep I'm still thinking no. Arguments: > 1) the name "Common Table Expression" suggests that t must be > regarded as an expression, hence syntactically / proof theoretic and > not as a table, set of rows / model theoretic. I.e. it is not a > "Common Table". Disagree. A table never referred to in a query still exists. Similarly, if a normal CTE called a data-changing function but was nevertheless not referred to, it would still run. > 2) The expressions can be referenced zero, one or more times. To me > it therefore makes the most sense that a DML expressions that is > defined but not references has no effect. Referenced once: run the > plan once. Referenced again: run the plan again. No. When I designed this feature, it was precisely to take advantage of the "run exactly once" behavior of CTEs. Under no circumstances should we break this. > > What should the result be of > WITH t AS (INSERT INTO foo SELECT nextval('seq') RETURNING *) > SELECT * FROM t > UNION > SELECT * FROM t; > > 1 or 1,2 ? 1. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2010-11-13 14:41, David Fetter wrote: > On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote: >> 1) the name "Common Table Expression" suggests that t must be >> regarded as an expression, hence syntactically / proof theoretic and >> not as a table, set of rows / model theoretic. I.e. it is not a >> "Common Table". > Disagree. A table never referred to in a query still exists. > Similarly, if a normal CTE called a data-changing function but was > nevertheless not referred to, it would still run. with t as (select nextval('seq')) select 1; does not update the sequence. >> 2) The expressions can be referenced zero, one or more times. To me >> it therefore makes the most sense that a DML expressions that is >> defined but not references has no effect. Referenced once: run the >> plan once. Referenced again: run the plan again. > No. When I designed this feature, it was precisely to take advantage > of the "run exactly once" behavior of CTEs. Under no circumstances > should we break this. I found the pgday2009 presentation http://wiki.postgresql.org/images/c/c0/PGDay2009-EN-Writeable_CTEs_The_Next_Big_Thing.pdf - the IO minimization example is cool, and I now understand that it would be artificial if the CTE had to be referenced, for it to be executed. Makes sense. regards, Yeb Havinga
On 13 Nov 2010, at 15:41, David Fetter <david@fetter.org> wrote: > On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote: >> On 2010-11-12 16:51, David Fetter wrote: >>> On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote: >>>> >>>> Yeah, that's another interesting question: should we somehow force >>>> unreferenced CTEs to be evaluated anyhow? >>> Yes. >> After a night's sleep I'm still thinking no. Arguments: >> 1) the name "Common Table Expression" suggests that t must be >> regarded as an expression, hence syntactically / proof theoretic and >> not as a table, set of rows / model theoretic. I.e. it is not a >> "Common Table". > > Disagree. A table never referred to in a query still exists. > Similarly, if a normal CTE called a data-changing function but was > nevertheless not referred to, it would still run. Actually, it wouldn't. But if we make the behaviour of wCTEs hard(er) to predict, we are going to have a pretty bad feature in our hands. Let's not repeat our mistakes, please. Regards, Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > On 13 Nov 2010, at 15:41, David Fetter <david@fetter.org> wrote: >> Similarly, if a normal CTE called a data-changing function but was >> nevertheless not referred to, it would still run. > Actually, it wouldn't. Indeed, and that was considered a feature when we did it. I think that having wCTEs behave arbitrarily differently on this point might be a bad idea. regards, tom lane
On 2010-11-13 5:08 PM +0200, Tom Lane wrote: > Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi> writes: >> On 13 Nov 2010, at 15:41, David Fetter<david@fetter.org> wrote: >>> Similarly, if a normal CTE called a data-changing function but was >>> nevertheless not referred to, it would still run. > >> Actually, it wouldn't. > > Indeed, and that was considered a feature when we did it. I think > that having wCTEs behave arbitrarily differently on this point > might be a bad idea. So these queries would behave differently? WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 WHERE false; WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 0; Regards, Marko Tiikkaja
On Sat, 13 Nov 2010 17:23 +0200, "Marko Tiikkaja" wrote: > So these queries would behave differently? > > WITH t AS (DELETE FROM foo RETURNING *) > SELECT 1 WHERE false; > > WITH t AS (DELETE FROM foo RETURNING *) > SELECT 1 FROM t LIMIT 0; I'm still trying to wrap my head around this new mechanism. What would this return? UPDATE foo SET access = 0; WITH t AS (UPDATE foo SET access = access + 1 RETURNING *) SELECT x.access, y.accessFROM t CROSS JOIN t;
On 2010-11-13 5:36 PM +0200, Clark C. Evans wrote: > On Sat, 13 Nov 2010 17:23 +0200, "Marko Tiikkaja" wrote: >> So these queries would behave differently? >> >> WITH t AS (DELETE FROM foo RETURNING *) >> SELECT 1 WHERE false; >> >> WITH t AS (DELETE FROM foo RETURNING *) >> SELECT 1 FROM t LIMIT 0; > > I'm still trying to wrap my head around this > new mechanism. What would this return? > > UPDATE foo SET access = 0; > > WITH t AS (UPDATE foo SET access = access + 1 RETURNING *) > SELECT x.access, y.access > FROM t CROSS JOIN t; I'm assuming you forgot to give the tables aliases: WITH t AS (UPDATE foo SET access = access + 1 RETURNING *) SELECT x.access, y.access FROM t x CROSS JOIN t y; This would return n * n rows with values (1,1) where n is the number of rows in foo when the snapshot was taken. I.e. every row in foo would now have access=1. I'm also ignoring the possibility that someone modified the table between those two queries. Regards, Marko Tiikkaja
On Sat, Nov 13, 2010 at 05:23:34PM +0200, Marko Tiikkaja wrote: > On 2010-11-13 5:08 PM +0200, Tom Lane wrote: > >Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi> writes: > >>On 13 Nov 2010, at 15:41, David Fetter<david@fetter.org> wrote: > >>>Similarly, if a normal CTE called a data-changing function but > >>>was nevertheless not referred to, it would still run. > > > >>Actually, it wouldn't. > > > >Indeed, and that was considered a feature when we did it. I think > >that having wCTEs behave arbitrarily differently on this point > >might be a bad idea. > > So these queries would behave differently? > > WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 WHERE false; > > WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 0; No. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sat, Nov 13, 2010 at 03:23:42PM +0100, Yeb Havinga wrote: > On 2010-11-13 14:41, David Fetter wrote: > >On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote: > >>1) the name "Common Table Expression" suggests that t must be > >>regarded as an expression, hence syntactically / proof theoretic and > >>not as a table, set of rows / model theoretic. I.e. it is not a > >>"Common Table". > >Disagree. A table never referred to in a query still exists. > >Similarly, if a normal CTE called a data-changing function but was > >nevertheless not referred to, it would still run. > with t as (select nextval('seq')) > select 1; > > does not update the sequence. I think you've found a bug in the form of an over-aggressive optimization for the data-changing case. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Alvaro Herrera <alvherre@commandprompt.com> writes: > It's not that straighforward though, in that the producer could stop a > bit ahead of what the consumer reads, due to there being a buffer in the > middle. Witness this simple example Yeah, another example where the analogy fails for us. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 2010-11-12 8:25 PM +0200, I wrote: > I'm going to take some time off this weekend to get a patch with this > behaviour to the next commitfest. .. and a wild patch appears. This is almost exactly the patch from 2010-02 without CommandCounterIncrement()s. It's still a bit rough around the edges and needs some more comments, but I'm posting it here anyway. This patch passes all regression tests, but feel free to try to break it, there are probably ways to do that. This one also has the "always run DMLs to completion, and exactly once" behaviour. Regards, Marko Tiikkaja
Attachment
2010/11/14 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>: > On 2010-11-12 8:25 PM +0200, I wrote: >> >> I'm going to take some time off this weekend to get a patch with this >> behaviour to the next commitfest. > > .. and a wild patch appears. > > This is almost exactly the patch from 2010-02 without > CommandCounterIncrement()s. It's still a bit rough around the edges and > needs some more comments, but I'm posting it here anyway. > > This patch passes all regression tests, but feel free to try to break it, > there are probably ways to do that. This one also has the "always run DMLs > to completion, and exactly once" behaviour. > Could you update wiki on this feature if you think we've reached the consensus? http://wiki.postgresql.org/wiki/WriteableCTEs Also, wrapping up the discussion like pros & cons on the different execution models helps not only the advance discussions but also reviews of this patch. Regards, -- Hitoshi Harada
On 2010-11-14 5:28 PM +0200, Hitoshi Harada wrote: > 2010/11/14 Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi>: >> .. and a wild patch appears. > > Could you update wiki on this feature if you think we've reached the consensus? You're probably referring to http://archives.postgresql.org/pgsql-hackers/2010-11/msg00660.php which was unfortunately just me talking too soon. There still doesn't appear to be a consensus on the difference (if any) between these queries: WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 LIMIT 0; -- unreferenced CTE WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 0; -- referenced, but not read WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 1; -- referenced, but only partly read WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t t1, t t2; -- referenced, read multiple times In my opinion, all of these should have the same effect: DELETE all rows from "foo". Any other option means we're going to have trouble predicting how a query is going to behave. As far as I know, we do have a consensus that the order of execution should be an implementation detail, and that the statements should always be executed in the exact same snapshot (i.e. no CID bump between). > Also, wrapping up the discussion like pros& cons on the different > execution models helps not only the advance discussions but also > reviews of this patch. Do you mean between the "execute in order, bump CID" and "execute in whatever order but to completion" behaviours? Regards, Marko Tiikkaja
On 2010-11-14 8:01 PM +0200, I wrote: > In my opinion, all of these should have the same effect: DELETE all rows > from "foo". Since the example wasn't entirely clear on this one: in my opinion the DML should also only be executed once. So: WITH t AS (INSERT INTO foo VALUES (0) RETURNING *) SELECT 1 FROM t t1, t t2; would only insert one row in any case. Regards, Marko Tiikkaja
On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > In my opinion, all of these should have the same effect: DELETE all rows > from "foo". Any other option means we're going to have trouble predicting > how a query is going to behave. I think it's clear that's the only sensible behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2010-11-14 19:35, Robert Haas wrote: > On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: >> In my opinion, all of these should have the same effect: DELETE all rows >> from "foo". Any other option means we're going to have trouble predicting >> how a query is going to behave. > I think it's clear that's the only sensible behavior. What if CTE's ever get input parameters? regards, Yeb Havinga
On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote: > On 2010-11-14 19:35, Robert Haas wrote: >> On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja >> <marko.tiikkaja@cs.helsinki.fi> wrote: >>> In my opinion, all of these should have the same effect: DELETE all rows >>> from "foo". Any other option means we're going to have trouble predicting >>> how a query is going to behave. >> I think it's clear that's the only sensible behavior. > What if CTE's ever get input parameters? What about input parameters? Regards, Marko Tiikkaja
On Sun, Nov 14, 2010 at 1:51 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > On 2010-11-14 19:35, Robert Haas wrote: >> >> On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja >> <marko.tiikkaja@cs.helsinki.fi> wrote: >>> >>> In my opinion, all of these should have the same effect: DELETE all rows >>> from "foo". Any other option means we're going to have trouble >>> predicting >>> how a query is going to behave. >> >> I think it's clear that's the only sensible behavior. > > What if CTE's ever get input parameters? Then they'd be functions, which we already have. As Tom recently pointed out, you can even make them temporary with an explicit pg_temp schema qualification. Perhaps someday we'll have lambda-expressions, but I have no reason to believe that they'll use any of the wCTE syntax. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 14, 2010 at 08:07:22PM +0200, Marko Tiikkaja wrote: > On 2010-11-14 8:01 PM +0200, I wrote: > >In my opinion, all of these should have the same effect: DELETE all rows > >from "foo". > > Since the example wasn't entirely clear on this one: in my opinion > the DML should also only be executed once. So: > > WITH t AS (INSERT INTO foo VALUES (0) RETURNING *) > SELECT 1 FROM t t1, t t2; > > would only insert one row in any case. Right :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Nov 14, 2010 at 1:51 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> What if CTE's ever get input parameters? > Then they'd be functions, which we already have. If you mean something like prepare foo(int) as with x as (delete from tab where id = $1 returning *) insert into log_table select * from x; I don't see that the parameter makes things any less well-defined. If you mean a parameter in the sense of an executor parameter passed in from a surrounding nestloop, that'd scare me too --- but I thought we were going to disallow wCTEs except at the top level of a query, so the case wouldn't arise. regards, tom lane
On 2010-11-14 21:06, Marko Tiikkaja wrote: > On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote: >> On 2010-11-14 19:35, Robert Haas wrote: >>> On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja >>> <marko.tiikkaja@cs.helsinki.fi> wrote: >>>> In my opinion, all of these should have the same effect: DELETE all >>>> rows >>>> from "foo". Any other option means we're going to have trouble >>>> predicting >>>> how a query is going to behave. >>> I think it's clear that's the only sensible behavior. >> What if CTE's ever get input parameters? > > What about input parameters? With input parameters there is a clear link between a CTE and a caller. If a CTE is called more than once, it must be executed more than once, e.g. (notation t:x means cte has parameter x) WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *) SELECT (SELECT * FROM t(1)), (SELECT * FROM t(2)); runs the cte two times, hence two new rows in foo. But what about WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *) SELECT (SELECT t(1)), (SELECT t(1)); it would be strange to expect a single row in foo here, since the only thing different from the previous query is a constant value. Though I like the easyness of "run exactly once" for uncorrelated cte's, I still have the feeling that it somehow mixes the expression and operational realm. In logic there's a difference between a proposition and an assertion. With "run exactly once", stating a proposition is made synonymous to asserting it. That makes syntactic operations or rewriting of writable CTEs hard, if not impossible. For instance, variable substitution in the second example makes a CTE without parameters: WITH t' AS (INSERT INTO foo VALUES(1) RETURNING *), t'' AS AS (INSERT INTO foo VALUES(1) RETURNING *), SELECT (SELECT t'), (SELECT t''); since t' and t'' are equal, WITH t' AS (INSERT INTO foo VALUES(1) RETURNING *) SELECT (SELECT t'), (SELECT t'); A syntactic operation like this on the query should not result in a different operation when it's run. Hence two new rows in foo are still expected, but the "run exactly once" dictates one new row for that query. regards, Yeb Havinga
On tor, 2010-11-11 at 19:35 +0200, Marko Tiikkaja wrote: > I apologize, I had misunderstood what you are suggesting. But now > that I do, it seems to be an even worse idea to go your way. Based on > my research, I'm almost certain that the SQL standard says that the > execution order is deterministic if there is at least one DML > statement in the WITH list. > > Can anyone confirm this? SQL:2008 doesn't allow any DML in the WITH list. SQL:2011 has the "combined data store and retrieval" feature that was discussed in another thread which basically implements the same thing. They apparently avoid the whole issue by allowing only one data change delta table per query.
On Sun, Nov 14, 2010 at 11:02:08PM +0100, Yeb Havinga wrote: > On 2010-11-14 21:06, Marko Tiikkaja wrote: > >On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote: > >>On 2010-11-14 19:35, Robert Haas wrote: > >>>On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja > >>><marko.tiikkaja@cs.helsinki.fi> wrote: > >>>>In my opinion, all of these should have the same effect: > >>>>DELETE all rows > >>>>from "foo". Any other option means we're going to have > >>>>trouble predicting > >>>>how a query is going to behave. > >>>I think it's clear that's the only sensible behavior. > >>What if CTE's ever get input parameters? > > > >What about input parameters? > With input parameters there is a clear link between a CTE and a > caller. If a CTE is called more than once, it must be executed more > than once, e.g. (notation t:x means cte has parameter x) > > WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *) > SELECT (SELECT * FROM t(1)), (SELECT * FROM t(2)); > runs the cte two times, hence two new rows in foo. I think we can worry about that if we ever have run-time functions done as WITH, but I think they'd be a *much* better fit for DO. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Marko Tiikkaja wrote: > This is almost exactly the patch from 2010-02 without > CommandCounterIncrement()s. It's still a bit rough around the edges > and needs some more comments, but I'm posting it here anyway. > > This patch passes all regression tests, but feel free to try to break > it, there are probably ways to do that. This one also has the "always > run DMLs to completion, and exactly once" behaviour. So this patch was marked "Ready for Committer", but a) no committer has picked it up yet and b) Marko has made changes here that nobody else has tested out yet that I've seen on the last. Accordingly, that classification may have been optimistic. It seems to me that another testing run-through from someone like David might be appropriate to build some confidence this latest patch should be a commit candidate. If there is a committer intending to work on this as-is, they haven't identified themselves. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
On Sun, Dec 05, 2010 at 01:33:39PM -0500, Greg Smith wrote: > Marko Tiikkaja wrote: > >This is almost exactly the patch from 2010-02 without > >CommandCounterIncrement()s. It's still a bit rough around the > >edges and needs some more comments, but I'm posting it here > >anyway. > > > >This patch passes all regression tests, but feel free to try to > >break it, there are probably ways to do that. This one also has > >the "always run DMLs to completion, and exactly once" behaviour. > > So this patch was marked "Ready for Committer", but a) no committer > has picked it up yet and b) Marko has made changes here that nobody > else has tested out yet that I've seen on the last. Accordingly, > that classification may have been optimistic. It seems to me that > another testing run-through from someone like David might be > appropriate to build some confidence this latest patch should be a > commit candidate. If there is a committer intending to work on this > as-is, they haven't identified themselves. I've tested this one and not managed to break it. One thing it could use is support for EXPLAIN ANALYZE. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2010-12-08 10:19 AM +0200, David Fetter wrote: > On Sun, Dec 05, 2010 at 01:33:39PM -0500, Greg Smith wrote: >> So this patch was marked "Ready for Committer", but a) no committer >> has picked it up yet and b) Marko has made changes here that nobody >> else has tested out yet that I've seen on the last. Accordingly, >> that classification may have been optimistic. It seems to me that >> another testing run-through from someone like David might be >> appropriate to build some confidence this latest patch should be a >> commit candidate. If there is a committer intending to work on this >> as-is, they haven't identified themselves. > > I've tested this one and not managed to break it. One thing it could > use is support for EXPLAIN ANALYZE. What's wrong with EXPLAIN ANALYZE? Here's what I see: =# explain analyze with t as (insert into foo values(0) returning *) select * from t; QUERY PLAN -------------------------------------------------------------------------------------------------- CTE Scan on t (cost=0.01..0.03rows=1 width=4) (actual time=0.017..0.017 rows=1 loops=2) CTE t -> Insert (cost=0.00..0.01 rows=1 width=0) (actual time=0.029..0.030 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1) Total runtime: 0.104 ms (5 rows) Regards, Marko Tiikkaja
On Wed, Dec 08, 2010 at 01:23:59PM +0200, Marko Tiikkaja wrote: > On 2010-12-08 10:19 AM +0200, David Fetter wrote: > >On Sun, Dec 05, 2010 at 01:33:39PM -0500, Greg Smith wrote: > >>So this patch was marked "Ready for Committer", but a) no committer > >>has picked it up yet and b) Marko has made changes here that nobody > >>else has tested out yet that I've seen on the last. Accordingly, > >>that classification may have been optimistic. It seems to me that > >>another testing run-through from someone like David might be > >>appropriate to build some confidence this latest patch should be a > >>commit candidate. If there is a committer intending to work on this > >>as-is, they haven't identified themselves. > > > >I've tested this one and not managed to break it. One thing it could > >use is support for EXPLAIN ANALYZE. > > What's wrong with EXPLAIN ANALYZE? Here's what I see: Oops! I am terribly sorry. It was an earlier patch I didn't manage to break. I've tried all the same things on this one, and no breakage so far. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote: > On 2010-11-12 8:25 PM +0200, I wrote: > > I'm going to take some time off this weekend to get a patch with this > > behaviour to the next commitfest. > > .. and a wild patch appears. > > This is almost exactly the patch from 2010-02 without > CommandCounterIncrement()s. It's still a bit rough around the edges and > needs some more comments, but I'm posting it here anyway. To pick up an earlier thread again, has any serious thought been given to adapting the SQL2001/DB2 syntax instead of our own?
On Tue, Dec 21, 2010 at 11:14:31PM +0200, Peter Eisentraut wrote: > On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote: > > On 2010-11-12 8:25 PM +0200, I wrote: > > > I'm going to take some time off this weekend to get a patch with this > > > behaviour to the next commitfest. > > > > .. and a wild patch appears. > > > > This is almost exactly the patch from 2010-02 without > > CommandCounterIncrement()s. It's still a bit rough around the edges and > > needs some more comments, but I'm posting it here anyway. > > To pick up an earlier thread again, has any serious thought been given > to adapting the SQL2001/DB2 syntax instead of our own? Yes, and it's a good deal more limited and less intuitive than ours. This is one place where we got it right and the standard just got pushed into doing whatever IBM did. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote: > .. and a wild patch appears. > > This is almost exactly the patch from 2010-02 without > CommandCounterIncrement()s. It's still a bit rough around the edges > and > needs some more comments, but I'm posting it here anyway. Is this the patch of record? There are no changes to the documentation included.
On tis, 2010-12-21 at 13:20 -0800, David Fetter wrote: > On Tue, Dec 21, 2010 at 11:14:31PM +0200, Peter Eisentraut wrote: > > On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote: > > > On 2010-11-12 8:25 PM +0200, I wrote: > > > > I'm going to take some time off this weekend to get a patch with this > > > > behaviour to the next commitfest. > > > > > > .. and a wild patch appears. > > > > > > This is almost exactly the patch from 2010-02 without > > > CommandCounterIncrement()s. It's still a bit rough around the edges and > > > needs some more comments, but I'm posting it here anyway. > > > > To pick up an earlier thread again, has any serious thought been given > > to adapting the SQL2001/DB2 syntax instead of our own? > > Yes, and it's a good deal more limited and less intuitive than ours. Less intuitive, possibly, but how is it more limited?
On 2010-12-22 8:24 PM, Peter Eisentraut wrote: > On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote: >> .. and a wild patch appears. >> >> This is almost exactly the patch from 2010-02 without >> CommandCounterIncrement()s. It's still a bit rough around the edges >> and >> needs some more comments, but I'm posting it here anyway. > > Is this the patch of record? There are no changes to the documentation > included. I've kept the documentation as a separate patch, but I haven't touched it in a very long time. I will work on the documentation if there's a chance of the patch getting accepted for 9.1. This arrangement makes more sense to me and I'm sure others will agree. Regards, Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > On 2010-12-22 8:24 PM, Peter Eisentraut wrote: >> Is this the patch of record? There are no changes to the documentation >> included. > I've kept the documentation as a separate patch, but I haven't touched > it in a very long time. I will work on the documentation if there's a > chance of the patch getting accepted for 9.1. This arrangement makes > more sense to me and I'm sure others will agree. Well, it's difficult to review a documentation-free patch. regards, tom lane
On Wed, Dec 22, 2010 at 03:54:08PM -0500, Tom Lane wrote: > Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > > On 2010-12-22 8:24 PM, Peter Eisentraut wrote: > >> Is this the patch of record? There are no changes to the documentation > >> included. > > > I've kept the documentation as a separate patch, but I haven't touched > > it in a very long time. I will work on the documentation if there's a > > chance of the patch getting accepted for 9.1. This arrangement makes > > more sense to me and I'm sure others will agree. > > Well, it's difficult to review a documentation-free patch. Here's a document-included version :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
On 2010-12-23 6:55 PM +0200, David Fetter wrote: > Here's a document-included version :) And here's the latest version of the patch for the last commit fest for 9.1. I fixed an issue with the portal logic, and now we use PORTAL_ONE_RETURNING for wCTE queries, even if the main query is not a DML or does not have RETURNING. This also means that we materialize the results of the main query sometimes unnecessarily, but that doesn't look like an easy thing to fix. PORTAL_ONE_RETURNING as a name is also a bit misleading now, so maybe that needs changing.. Any feedback welcome. Regards, Marko Tiikkaja
Attachment
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > I fixed an issue with the portal logic, and now we use > PORTAL_ONE_RETURNING for wCTE queries, even if the main query is not a > DML or does not have RETURNING. This also means that we materialize the > results of the main query sometimes unnecessarily, but that doesn't look > like an easy thing to fix. PORTAL_ONE_RETURNING as a name is also a bit > misleading now, so maybe that needs changing.. Why is it necessary to hack the portal logic at all? The patch seems to work for me without that. (I've fixed quite a few bugs though, so maybe what this is really doing is masking a problem elsewhere.) Also, why are we forbidding wCTEs in cursors? Given the current definitions, that case seems to work fine too: the wCTEs will be executed as soon as you fetch something from the cursor. Are you just worried about not allowing a case that might be hard to support later? regards, tom lane
On 2011-02-25 1:36 AM, Tom Lane wrote: > Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi> writes: >> I fixed an issue with the portal logic, and now we use >> PORTAL_ONE_RETURNING for wCTE queries, even if the main query is not a >> DML or does not have RETURNING. This also means that we materialize the >> results of the main query sometimes unnecessarily, but that doesn't look >> like an easy thing to fix. PORTAL_ONE_RETURNING as a name is also a bit >> misleading now, so maybe that needs changing.. > > Why is it necessary to hack the portal logic at all? The patch seems to > work for me without that. (I've fixed quite a few bugs though, so maybe > what this is really doing is masking a problem elsewhere.) Without hacking it broke when PQdescribePrepared was called on a prepared query like: WITH t AS (DELETE FROM foo) SELECT 1; Not sure if that's an actual problem, but it seemed like something worht fixing. > Also, why are we forbidding wCTEs in cursors? Given the current > definitions, that case seems to work fine too: the wCTEs will be > executed as soon as you fetch something from the cursor. Are you > just worried about not allowing a case that might be hard to support > later? Honestly, I have no idea. It might be a leftover from the previous design. If it looks like it's easy to support, then go for it. Regards, Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > On 2011-02-25 1:36 AM, Tom Lane wrote: >> Why is it necessary to hack the portal logic at all? The patch seems to >> work for me without that. (I've fixed quite a few bugs though, so maybe >> what this is really doing is masking a problem elsewhere.) > Without hacking it broke when PQdescribePrepared was called on a > prepared query like: > WITH t AS (DELETE FROM foo) > SELECT 1; > Not sure if that's an actual problem, but it seemed like something worht > fixing. I can't replicate such a problem here --- do you have a concrete test case? ISTM the issue would only have been a problem back when you were trying to generate multiple PlannedStmts from a query like the above. The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned. >> Also, why are we forbidding wCTEs in cursors? Given the current >> definitions, that case seems to work fine too: the wCTEs will be >> executed as soon as you fetch something from the cursor. Are you >> just worried about not allowing a case that might be hard to support >> later? > Honestly, I have no idea. It might be a leftover from the previous > design. If it looks like it's easy to support, then go for it. Right now I'm thinking that it is best to continue to forbid it. If we go over to the less-sequential implementation that I'm advocating in another thread, the timing of the updates would become a lot less predictable than I say above. If we refuse it for now, we can always remove the restriction later, but the other way is more painful. regards, tom lane
On 2011-02-25 6:12 PM, Tom Lane wrote: > Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi> writes: >> Without hacking it broke when PQdescribePrepared was called on a >> prepared query like: > >> WITH t AS (DELETE FROM foo) >> SELECT 1; > >> Not sure if that's an actual problem, but it seemed like something worht >> fixing. > > I can't replicate such a problem here --- do you have a concrete test > case? ISTM the issue would only have been a problem back when you > were trying to generate multiple PlannedStmts from a query like the > above. I don't have one right now (I lost the one I had because of a hardware failure in a virtual machine), but I can write you one if you want to. But see below. > The current implementation with everything in one plantree > really ought to look just like a SELECT so far as the portal code > is concerned. The problem was that the old code was using PORTAL_MULTI_QUERY whenever a wCTE was present. Are you saying that you are using PORTAL_ONE_SELECT? Doesn't that have problems with triggers, for example? >>> Also, why are we forbidding wCTEs in cursors? Given the current >>> definitions, that case seems to work fine too: the wCTEs will be >>> executed as soon as you fetch something from the cursor. Are you >>> just worried about not allowing a case that might be hard to support >>> later? > >> Honestly, I have no idea. It might be a leftover from the previous >> design. If it looks like it's easy to support, then go for it. > > Right now I'm thinking that it is best to continue to forbid it. > If we go over to the less-sequential implementation that I'm advocating > in another thread, the timing of the updates would become a lot less > predictable than I say above. If we refuse it for now, we can always > remove the restriction later, but the other way is more painful. Fair enough. Regards, Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > On 2011-02-25 6:12 PM, Tom Lane wrote: >> The current implementation with everything in one plantree >> really ought to look just like a SELECT so far as the portal code >> is concerned. > The problem was that the old code was using PORTAL_MULTI_QUERY whenever > a wCTE was present. Are you saying that you are using > PORTAL_ONE_SELECT? Doesn't that have problems with triggers, for example? Hmmm ... good question. I notice the lack of any regression test cases involving triggers. Will check this. regards, tom lane