Thread: Writeable CTEs documentation patch
Hi, Attached is a documentation patch for writeable CTEs. Most of it is explaining how this feature works in select.sgml. I wasn't sure if that's the right place, but couldn't find a better one. I also wasn't able to find any place discussing the command tag, other than libpq's documentation. Is there one somewhere? While working on the docs, I noticed one problem with the patch itself: it doesn't handle multi-statement DO INSTEAD rules correctly. I'm going to submit a fix for that later. Any suggestions, whatsoever, are welcome. Regards, Marko Tiikkaja
Attachment
Hi, On 2010-02-04 18:04 UTC+2, I wrote: > While working on the docs, I noticed one problem with the patch itself: > it doesn't handle multi-statement DO INSTEAD rules correctly. I'm going > to submit a fix for that later. Here's an updated patch. Only changes from the previous patch are fixing the above issue and a regression test for it. Regards, Marko Tiikkaja
Attachment
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > Here's an updated patch. Only changes from the previous patch are > fixing the above issue and a regression test for it. A brief report for of the patch: * The patch has the following error cases, and also have one regression test for each case. - DML WITH is not allowed in a cursor declaration- DML WITH is not allowed in a view definition- DML WITH without RETURNINGis only allowed inside an unreferenced CTE- DML WITH is only allowed at the top level- Recursive DML WITH statementsare not supported ^-- might be better if "DML WITH cannot have the self-reference" or so? - Conditional DO INSTEAD rules are not supported in DML WITH statements- DO ALSO rules are not supported in DML WITH statements-Multi-statement DO INSTEAD rules are not supported in DML WITH statements- DO INSTEAD NOTHING rules are not supportedin DML WITH statements * In the regression tests, almost all of them don't have ORDER BY clause. They just work, but we might need ORDER BY to getrobust output. What did we do in other regression tests? * I feel odd the following paragraph in the docs, but should be checked by native English speakers. *** a/doc/src/sgml/ref/create_rule.sgml --- b/doc/src/sgml/ref/create_rule.sgml *************** *** 222,227 **** CREATE [ OR REPLACE ] RULE <replaceable class="parameter">name</replaceable> AS --- 222,234 ---- </para> <para> + In an <literal>INSERT</literal>, <literal>UPDATE</literal> or + <literal>DELETE</literal> query within a <literal>WITH</literal> clause, + only unconditional, single-statement <literal>INSTEAD</literal> rules are ^-- and? which commais the sentence separator? + implemented. ^-- might be "available" rather than "implemented"? + </para> Regards, --- Takahiro Itagaki NTT Open Source Software Center
On 2010-02-05 07:14 UTC+2, Takahiro Itagaki wrote: > > Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > >> Here's an updated patch. Only changes from the previous patch are >> fixing the above issue and a regression test for it. > > * In the regression tests, almost all of them don't have ORDER BY clause. > They just work, but we might need ORDER BY to get robust output. > What did we do in other regression tests? Looking at with.sql, it seems to use ORDER BY when it accesses data from a table. But obviously we can't do this if want to test INSERT/UPDATE/DELETE .. RETURNING at the top level and returning.sql seems to be relying on the fact that they come out in the same order every time. Regards, Marko Tiikkaja
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes: > * In the regression tests, almost all of them don't have ORDER BY clause. > They just work, but we might need ORDER BY to get robust output. > What did we do in other regression tests? We add ORDER BY only when experience shows it's necessary. The reasoning is explained in regress.sgml: You might wonder why we don't order all the regression test queries explicitly to get rid of this issue once and for all. The reason is that that would make the regression tests less useful, not more, since they'd tend to exercise query plan types that produce ordered results to the exclusion of those that don't. regards, tom lane
On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > On 2010-02-04 18:04 UTC+2, I wrote: >> While working on the docs, I noticed one problem with the patch itself: >> it doesn't handle multi-statement DO INSTEAD rules correctly. I'm going >> to submit a fix for that later. > > Here's an updated patch. Only changes from the previous patch are > fixing the above issue and a regression test for it. The comments on the parts I asked about before are much better in this version. A few other things that I notice: - I'm not sure that canSetTag is the right name for the additional argument to ExecInsert/ExecUpdate/ExecDelete. OTOH, I'm not sure it's the wrong name either. But should we use something like isTopLevelQuery? - It appears that we pull out all of the DML statements first and run them in order, but I'm not sure that's the right thing to do. Consider: WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ... I would assume we would do x, CCI, do y, do z, CCI, do main query, but I don't think that's what this implements. The user might be surprised to find out that y sees the effects of z. - I think that the comment in analyzeCTE that says /* Check that we got something reasonable */ could be fleshed out a bit. You could still reference transformRangeSubselect, for example, but then explain why the checks here are different (viz, CTEs can contain DML). - The comment for RegisterSnapshotCopy identifies the function name as RegisterSnapshot; I think this is a copy-and-pasteo. - It seems like the gram.y changes for common_table_expr might benefit from some factoring; that is, create a production (or find a suitable existing one) for "statements of the sort that can appear within CTEs", and then use that in common_table_expr. Or maybe this doesn't work; I haven't tried it. - I still don't much like the idea of using DML WITH in error messages. One idea I had (which might suck, but I'm just throwing it out there) is to change hasDmlWith to an integer bitmap with a bit for each of insert, update, and delete. But it may be better still to just rephrase the error messages. Could we just write, e.g. "non-SELECT statements are not allowed within a cursor declaration?" Or we could say "INSERT, UPDATE, and DELETE statements are not allowed within a cursor declaration", but I'm thinking we may want to allow things like COPY and EXPLAIN inside CTEs in the future, too, and they'll presumably be treated similarly to DML. For the record, Tom or whoever should feel to swoop in here at any time, or add to any of this. I'm just making suggestions until the big guns show up. ...Robert
On 2010-02-08 18:42 +0200, Robert Haas wrote: > On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja >> Here's an updated patch. Only changes from the previous patch are >> fixing the above issue and a regression test for it. > > - I'm not sure that canSetTag is the right name for the additional > argument to ExecInsert/ExecUpdate/ExecDelete. OTOH, I'm not sure it's > the wrong name either. But should we use something like > isTopLevelQuery? No objection to changing that. > - It appears that we pull out all of the DML statements first and run > them in order, but I'm not sure that's the right thing to do. > Consider: > > WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ... > > I would assume we would do x, CCI, do y, do z, CCI, do main query, but > I don't think that's what this implements. The user might be > surprised to find out that y sees the effects of z. Hmm. Right. That sounds like the right thing to do. Another option (which I seem to recall we've discussed before) is to not allow any SELECT statements between DML WITHs, but I think this is what we should go for. > - I think that the comment in analyzeCTE that says /* Check that we > got something reasonable */ could be fleshed out a bit. You could > still reference transformRangeSubselect, for example, but then explain > why the checks here are different (viz, CTEs can contain DML). Ok, I'll look into that. > - The comment for RegisterSnapshotCopy identifies the function name as > RegisterSnapshot; I think this is a copy-and-pasteo. You're right. Will fix. > - It seems like the gram.y changes for common_table_expr might benefit > from some factoring; that is, create a production (or find a suitable > existing one) for "statements of the sort that can appear within > CTEs", and then use that in common_table_expr. Or maybe this doesn't > work; I haven't tried it. My bison-fu is not exactly strong, but I can look at the feasibility of that. > - I still don't much like the idea of using DML WITH in error > messages. One idea I had (which might suck, but I'm just throwing it > out there) is to change hasDmlWith to an integer bitmap with a bit for > each of insert, update, and delete. But it may be better still to > just rephrase the error messages. I don't see how that would work. We'd still potentially have many different types of DML operations to deal with and that wouldn't help at all at distinguishing which operation actually caused the error. Or did I misunderstand? > Could we just write, e.g. > "non-SELECT statements are not allowed within a cursor declaration?" > Or we could say "INSERT, UPDATE, and DELETE statements are not allowed > within a cursor declaration", but I'm thinking we may want to allow > things like COPY and EXPLAIN inside CTEs in the future, too, and > they'll presumably be treated similarly to DML. "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit clumsy IMO. But I don't really have anything better to offer, either. Regards, Marko Tiikkaja
On Mon, Feb 8, 2010 at 1:01 PM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: >> Could we just write, e.g. >> "non-SELECT statements are not allowed within a cursor declaration?" >> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed >> within a cursor declaration", but I'm thinking we may want to allow >> things like COPY and EXPLAIN inside CTEs in the future, too, and >> they'll presumably be treated similarly to DML. > > "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit > clumsy IMO. But I don't really have anything better to offer, either. Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in most of the relevant contexts the list might get longer if in the future we allow things like EXPLAIN and COPY within CTEs. I think "Non-SELECT statement" is reasonably clear, though; people might not know which things are statements, but the message implies that SELECT is one such thing, and not the one that's the problem, which should get them pointed in the right direction. ...Robert
Robert Haas escribió: > Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in > most of the relevant contexts the list might get longer if in the > future we allow things like EXPLAIN and COPY within CTEs. I think > "Non-SELECT statement" is reasonably clear, though; people might not > know which things are statements, but the message implies that SELECT > is one such thing, and not the one that's the problem, which should > get them pointed in the right direction. "DML statements other than SELECT" perhaps? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Robert Haas escribió: > On Mon, Feb 8, 2010 at 1:01 PM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: > >> Could we just write, e.g. > >> "non-SELECT statements are not allowed within a cursor declaration?" > >> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed > >> within a cursor declaration", but I'm thinking we may want to allow > >> things like COPY and EXPLAIN inside CTEs in the future, too, and > >> they'll presumably be treated similarly to DML. > > > > "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit > > clumsy IMO. But I don't really have anything better to offer, either. > > Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in > most of the relevant contexts the list might get longer if in the > future we allow things like EXPLAIN and COPY within CTEs. I think > "Non-SELECT statement" is reasonably clear, though; people might not > know which things are statements, but the message implies that SELECT > is one such thing, and not the one that's the problem, which should > get them pointed in the right direction. Hmm, how about VALUES? Isn't that a statement on its own right, that would similarly unaffected? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Feb 8, 2010 at 3:30 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Robert Haas escribió: >> On Mon, Feb 8, 2010 at 1:01 PM, Marko Tiikkaja >> <marko.tiikkaja@cs.helsinki.fi> wrote: >> >> Could we just write, e.g. >> >> "non-SELECT statements are not allowed within a cursor declaration?" >> >> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed >> >> within a cursor declaration", but I'm thinking we may want to allow >> >> things like COPY and EXPLAIN inside CTEs in the future, too, and >> >> they'll presumably be treated similarly to DML. >> > >> > "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit >> > clumsy IMO. But I don't really have anything better to offer, either. >> >> Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in >> most of the relevant contexts the list might get longer if in the >> future we allow things like EXPLAIN and COPY within CTEs. I think >> "Non-SELECT statement" is reasonably clear, though; people might not >> know which things are statements, but the message implies that SELECT >> is one such thing, and not the one that's the problem, which should >> get them pointed in the right direction. > > Hmm, how about VALUES? Isn't that a statement on its own right, that > would similarly unaffected? Ouch. You're right, that's a problem. :-( TABLE is a similar case. ...Robert
On 2010-02-08 18:42 +0200, Robert Haas wrote: > On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: >> Here's an updated patch. Only changes from the previous patch are >> fixing the above issue and a regression test for it. > > - I'm not sure that canSetTag is the right name for the additional > argument to ExecInsert/ExecUpdate/ExecDelete. OTOH, I'm not sure it's > the wrong name either. But should we use something like > isTopLevelQuery? I'm going to have to take back my previous statement; this doesn't make a lot of sense in the case of DO ALSO rules (or multiple statements in a DO INSTEAD RULE). Those will have canSetTag=false, but they will be at the top level. > - It appears that we pull out all of the DML statements first and run > them in order, but I'm not sure that's the right thing to do. > Consider: > > WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ... > > I would assume we would do x, CCI, do y, do z, CCI, do main query, but > I don't think that's what this implements. The user might be > surprised to find out that y sees the effects of z. I went ahead and implemented this, but there seems to be one small problem: RECURSIVE. If there is a recursive query between those, it might loop forever even if the top-level SELECT only wanted to see a few rows from it. The docs already discourage writing recursive ctes like that, but still this is a small caveat. > - It seems like the gram.y changes for common_table_expr might benefit > from some factoring; that is, create a production (or find a suitable > existing one) for "statements of the sort that can appear within > CTEs", and then use that in common_table_expr. Or maybe this doesn't > work; I haven't tried it. This seems to work. I used PreparableStmt, but I'm not sure how good idea that really is. Maybe I should create a new one? Regards, Marko Tiikkaja
On Tue, Feb 9, 2010 at 3:13 PM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > On 2010-02-08 18:42 +0200, Robert Haas wrote: >> On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja >> <marko.tiikkaja@cs.helsinki.fi> wrote: >>> Here's an updated patch. Only changes from the previous patch are >>> fixing the above issue and a regression test for it. >> >> - I'm not sure that canSetTag is the right name for the additional >> argument to ExecInsert/ExecUpdate/ExecDelete. OTOH, I'm not sure it's >> the wrong name either. But should we use something like >> isTopLevelQuery? > > I'm going to have to take back my previous statement; this doesn't make > a lot of sense in the case of DO ALSO rules (or multiple statements in a > DO INSTEAD RULE). Those will have canSetTag=false, but they will be at > the top level. Ah. OK. >> - It appears that we pull out all of the DML statements first and run >> them in order, but I'm not sure that's the right thing to do. >> Consider: >> >> WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ... >> >> I would assume we would do x, CCI, do y, do z, CCI, do main query, but >> I don't think that's what this implements. The user might be >> surprised to find out that y sees the effects of z. > > I went ahead and implemented this, but there seems to be one small > problem: RECURSIVE. If there is a recursive query between those, it > might loop forever even if the top-level SELECT only wanted to see a few > rows from it. The docs already discourage writing recursive ctes like > that, but still this is a small caveat. Doesn't seem like a big problem to me. >> - It seems like the gram.y changes for common_table_expr might benefit >> from some factoring; that is, create a production (or find a suitable >> existing one) for "statements of the sort that can appear within >> CTEs", and then use that in common_table_expr. Or maybe this doesn't >> work; I haven't tried it. > > This seems to work. I used PreparableStmt, but I'm not sure how good > idea that really is. Maybe I should create a new one? If it covers the same territory, I wouldn't duplicate it just for fun.Someone might need to split it out in the future, butthat's not a reason to do it now. ...Robert
On 2010-02-08 18:42 +0200, Robert Haas wrote: > On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: >> On 2010-02-04 18:04 UTC+2, I wrote: >>> While working on the docs, I noticed one problem with the patch itself: >>> it doesn't handle multi-statement DO INSTEAD rules correctly. I'm going >>> to submit a fix for that later. >> >> Here's an updated patch. Only changes from the previous patch are >> fixing the above issue and a regression test for it. > > - It appears that we pull out all of the DML statements first and run > them in order, but I'm not sure that's the right thing to do. > Consider: > > WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ... > > I would assume we would do x, CCI, do y, do z, CCI, do main query, but > I don't think that's what this implements. The user might be > surprised to find out that y sees the effects of z. I've updated the patch according to what I said here: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00722.php I haven't done any extensive testing, but it seems to work in the most common cases. Regards, Marko Tiikkaja
Attachment
On 2010-02-08 18:42 +0200, Robert Haas wrote: > On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: >> On 2010-02-04 18:04 UTC+2, I wrote: >>> While working on the docs, I noticed one problem with the patch itself: >>> it doesn't handle multi-statement DO INSTEAD rules correctly. I'm going >>> to submit a fix for that later. >> >> Here's an updated patch. Only changes from the previous patch are >> fixing the above issue and a regression test for it. > > - It appears that we pull out all of the DML statements first and run > them in order, but I'm not sure that's the right thing to do. > Consider: > > WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ... > > I would assume we would do x, CCI, do y, do z, CCI, do main query, but > I don't think that's what this implements. The user might be > surprised to find out that y sees the effects of z. I've updated the patch according to what I said here: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00722.php I haven't done any extensive testing, but it seems to work in the most common cases. Regards, Marko Tiikkaja