Thread: Review of Writeable CTE Patch
Marko, Submission Review ----------------- *) patch applies clean to HEADwever *) applied tests ran ok *) there is some documentation adjustments in the patch. possibly a little underweight, but sufficient. *) A couple of very minor things aside, I think this should be accepted and passed for 8.5 although it could use another set of eyes from someone more comfortable with this part of the code. Usability Review ---------------- *) Works as advertised...'feels right'. Found only one small issue which Marko was already aware of and had adjusted for. *) No, we don't already have it. This is maybe the #2 most requested feature, after in-core replication. *) Yes it follows community-agreed behavior. Feature Test ------------ *) No build issues. *) The feature appears to work. Aside from the supplied tests, I tested with much larger tables (million records) and tables with triggers on them, sometimes in wacky combination: with f as (delete from foo returning *) insert into foo select * from f; with f as (insert into foo returning *) insert into foo select * from f; This threw an assertion failure: with recursive t as (insert into foo select * from t) values(true); TRAP: FailedAssertion("!(((((Node*)(stmt))->type) == T_SelectStmt))", File: "parse_cte.c", Line: 606) Marko was already aware of it and has a fix ready. *) I tested most of the error conditions and got them to fire. No unpleasant surprises. The cursor error ("Non-SELECT cursors are not implemented") is a little misleading. Perhaps it should read something like "Writeable CTE are not supported in cusor declaration" Performance Review ------------------ *) Everything ran exactly as it should. Huge updates rewritten as writeable CTE delete + insert are slightly slower than raw insert but this is expected. Coding Review ------------- *) A lot of the code is sgml/test/grammar changes that should be relatively uncontroversial. This is a small patch for what it does. *) Should canSetTag be passed as the first agument to (for example) in ExecInsert? Is this the proper way to be passing this information? *) execMain.c Line 1224 There is a blank code comment here. IMO this section needs to be documented better: the CommandCounterIncrement is a critical part of how this works. *) execMain.c Line 2033 The adjusted comment is confusing and seems to contradict itself. I would have wriiten: initialize them if they are not run in EvalPlanQual(). Aside: is this an optimization? *) CopySnapshot was promoted from static. Is this legal/good idea? Is a wrapper more appropriate? Architecture Review ------------------- *) Nothing jumped out at me. The changes are small, so it really comes down to if they were done properly/right spot. Review Review ------------- merlin
On Tue, Jan 26, 2010 at 9:13 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > *) Works as advertised...'feels right'. Found only one small issue > which Marko was already aware of and had adjusted for. [...] > Marko was already aware of it and has a fix ready. So it sounds like we should expect an updated patch shortly? ...Robert
On 2010-01-26 16:54, Robert Haas wrote: > On Tue, Jan 26, 2010 at 9:13 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> *) Works as advertised...'feels right'. Found only one small issue >> which Marko was already aware of and had adjusted for. > [...] >> Marko was already aware of it and has a fix ready. > > So it sounds like we should expect an updated patch shortly? Yes. I'm adding more comments and documentation, expect a patch within a couple of days. Regards, Marko Tiikkaja
Merlin Moncure escribió: > *) CopySnapshot was promoted from static. Is this legal/good idea? > Is a wrapper more appropriate? Hmm ... I wonder why isn't the patch doing RegisterSnapshot with the passed snapshot directly -- why is it necessary to create a new copy of it? (I notice that only one of the arms in that "if" creates a copy; if that is correct, I think it warrants a comment explaining why). If the copy is necessary (e.g. because the snapshot must not be modified externally, and there's actual risk that it is), then maybe it would be better to create a new function RegisterSnapshotCopy instead? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 2010-01-26 17:11, Alvaro Herrera wrote: > Merlin Moncure escribió: > >> *) CopySnapshot was promoted from static. Is this legal/good idea? >> Is a wrapper more appropriate? > > Hmm ... I wonder why isn't the patch doing RegisterSnapshot with the > passed snapshot directly -- why is it necessary to create a new copy of > it? (I notice that only one of the arms in that "if" creates a copy; > if that is correct, I think it warrants a comment explaining why). Per discussion here: http://archives.postgresql.org/pgsql-hackers/2009-11/msg01964.php the executor copies the snapshot if it plans on modifying it. A comment explaining this might be in order. > If the copy is necessary (e.g. because the snapshot must not be modified > externally, and there's actual risk that it is), then maybe it would be > better to create a new function RegisterSnapshotCopy instead? Sounds reasonable. Regards, Marko Tiikkaja
Hi, Here's an updated patch. This includes the fix mentioned earlier, some comment improvements and making CopySnapshot() static again. I also changed all references to this feature to "DML WITH" for consistency. I'm not sure if we want to keep it, but it should now be easier to change in the future. On 2010-01-26 16:13, Merlin Moncure wrote: > Marko was already aware of it and has a fix ready. This one includes it. > *) I tested most of the error conditions and got them to fire. No > unpleasant surprises. The cursor error ("Non-SELECT cursors are not > implemented") is a little misleading. Perhaps it should read > something like "Writeable CTE are not supported in cusor declaration" Ok, changed this one. > *) execMain.c Line 1224 > There is a blank code comment here. IMO this section needs to be > documented better: the > CommandCounterIncrement is a critical part of how this works. Added some more comments. > *) execMain.c Line 2033 > The adjusted comment is confusing and seems to contradict itself. I > would have wriiten: initialize them if they are not run in > EvalPlanQual(). Aside: is this an optimization? I tried, but I can't think of a better wording for that comment :-( This is not really an optimization, just doing the right thing. The performance difference here is negligible. > *) CopySnapshot was promoted from static. Is this legal/good idea? > Is a wrapper more appropriate? Added new function RegisterSnapshotCopy() per Alvaro's suggestion. Thanks a lot for reviewing! Regards, Marko Tiikkaja
Attachment
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > Here's an updated patch. This includes the fix mentioned earlier, some > comment improvements and making CopySnapshot() static again. I also > changed all references to this feature to "DML WITH" for consistency. > I'm not sure if we want to keep it, but it should now be easier to > change in the future. Hi, I'm reviewing the writable CTE patch. The code logic seems to be pretty good, but I have a couple of comments about error cases: * Did we have a consensus about user-visible "DML WITH" messages? The term is used in error messages in many places, forexample: "DML WITH without RETURNING is only allowed inside an unreferenced CTE" Since we don't use "DML WITH" nor "CTE"in documentation, I'd like to avoid such technical acronyms in logs if we had better names, or we should have a sectionto explain them in docs. * What can I do to get "Recursive DML WITH statements are not supported" message? I get syntax errors before I get the messagebecause We don't support UNIONs with RETURNING queries. Am I missing something? =# UPDATE tbl SET i = i + 1 WHERE i = 1 UNION ALL UPDATE tbl SET i = i + 1 WHERE i = 2; ERROR: syntax errorat or near "UNION" * The patch includes regression tests, but no error cases in it. More test cases are needed for stupid queries. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Hi, On 2010-02-03 11:04 UTC+2, Takahiro Itagaki wrote: > Hi, I'm reviewing the writable CTE patch. The code logic seems to be > pretty good, but I have a couple of comments about error cases: > > * Did we have a consensus about user-visible "DML WITH" messages? > The term is used in error messages in many places, for example: > "DML WITH without RETURNING is only allowed inside an unreferenced CTE" > Since we don't use "DML WITH" nor "CTE" in documentation, > I'd like to avoid such technical acronyms in logs if we had better names, > or we should have a section to explain them in docs. We have yet to reach a consensus on the name for this feature. I don't think we have any really good candidates, but I like "DML WITH" best so far. > * What can I do to get "Recursive DML WITH statements are not supported" > message? I get syntax errors before I get the message because We don't > support UNIONs with RETURNING queries. Am I missing something? > > =# UPDATE tbl SET i = i + 1 WHERE i = 1 > UNION ALL > UPDATE tbl SET i = i + 1 WHERE i = 2; > ERROR: syntax error at or near "UNION" WITH RECURSIVE t AS (INSERT INTO foo SELECT * FROM t) VALUES(true); would do that. You can do the same with UPDATE .. FROM and DELETE .. USING. > * The patch includes regression tests, but no error cases in it. > More test cases are needed for stupid queries. Ok, I'll add these and send an updated patch in a few hours. Regards, Marko Tiikkaja
Hi, On 2010-02-03 11:04, Takahiro Itagaki wrote: > * The patch includes regression tests, but no error cases in it. > More test cases are needed for stupid queries. Here's an updated patch. Regards, Marko Tiikkaja
Attachment
On Wed, Feb 3, 2010 at 4:09 AM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > Hi, > > On 2010-02-03 11:04 UTC+2, Takahiro Itagaki wrote: >> Hi, I'm reviewing the writable CTE patch. The code logic seems to be >> pretty good, but I have a couple of comments about error cases: >> >> * Did we have a consensus about user-visible "DML WITH" messages? >> The term is used in error messages in many places, for example: >> "DML WITH without RETURNING is only allowed inside an unreferenced CTE" >> Since we don't use "DML WITH" nor "CTE" in documentation, >> I'd like to avoid such technical acronyms in logs if we had better names, >> or we should have a section to explain them in docs. > > We have yet to reach a consensus on the name for this feature. I don't > think we have any really good candidates, but I like "DML WITH" best so far. Why can't we complain about the actual SQL statement the user issued? Like, say: INSERT requires RETURNING when used within a referenced CTE ...Robert
On Wed, Feb 3, 2010 at 5:31 AM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > On 2010-02-03 11:04, Takahiro Itagaki wrote: >> * The patch includes regression tests, but no error cases in it. >> More test cases are needed for stupid queries. > > Here's an updated patch. Some thoughts: The comments in standard_ExecutorStart() don't do a good job of explaining WHY the code does what it does - they just recapitulate what you can already see from reading the code. You say "If there are DML WITH statements, we always need to use the CID and copy the snapshot." That's self-evident from the following code. What's not clear is why this is necessary, and the comment doesn't make any attempt to explain it. The second half of the if statement has the same problem. In ExecTopLevelStmtIsReadOnly, you might perhaps want to rephase the comment in a way that doesn't use the word "Ehm." Like maybe: "Even if this function returns true, the statement might still contain INSERT, UPDATE, or DELETE statements within a CTE; we only check the top-level statement." Also, there should be a newline immediately before the function name, per our usual style conventions. InitPlan makes some references to "leader" scan states, but there's no explanation of what exactly those are. The comment in analyzeCTE that says "Many of these conditions are impossible given restrictions of the grammar, but check 'em anyway." makes less sense with this patch than it did formerly and may need to be rethought... and I'm not sure there's any reason to change this elog() an Assert. In both analyzeCTE() and checkWellFormedRecursion(), I don't like just removing the assertions there; we should try to assert something a bit more sensible, like maybe !IsA(cte->ctequery, Query). This patch removes a number of other assertions as well, but I don't know enough about those other spots to judge whether all of those cases are sensible. The only change to parse_relation.c is the addition of a #include; is this needed? The documentation changes for INSERT, UPDATE, and DELETE seem inadequate, because they add a reference to with_query with no corresponding explanation of what a with_query might be. The limitations of INSERT/UPDATE/DELETE-within-WITH should be documented somewhere: top level CTE only, and no DO ALSO or conditional DO INSTEAD rules. If we don't intend to remove this limitation in a future release, we should probably also document that.I believe there are some other caveats that we've discussedbefore, too, though I'm not sure if they're still true. Stuff like: - CTEs will be executed to completion in sequential order before the main statement begins execution - each CTE will see the results of CTEs already executed, and the main statement will see the results of all CTEs - but queries within each CTE still won't see their own updates (a reference to whatever section of the manual we talk about this in would probably be good) - possible pitfalls of CTEs not being pipelined ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 3, 2010 at 4:09 AM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: >> We have yet to reach a consensus on the name for this feature. �I don't >> think we have any really good candidates, but I like "DML WITH" best so far. > Why can't we complain about the actual SQL statement the user issued? > Like, say: > INSERT requires RETURNING when used within a referenced CTE We could probably make that work for error messages, but what about documentation? It's going to be awkward to write something like "INSERT/UPDATE/DELETE RETURNING" every time we need to make a general statement about the behavior of all three. regards, tom lane
Hi, On 2010-02-03 16:09 UTC+2, Robert Haas wrote: > Why can't we complain about the actual SQL statement the user issued? > Like, say: > > INSERT requires RETURNING when used within a referenced CTE The SELECT equivalent of this query looks like this: => with recursive t as (select * from t) values(true); ERROR: recursive query "t" does not have the form non-recursive-term UNION [ALL] recursive-term but I didn't want to throw people off to think that they can use INSERT/UPDATE/RETURNING in a RECURSIVE CTE, just to get complaints about syntax error. Regards, Marko Tiikkaja
On Wed, Feb 3, 2010 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Feb 3, 2010 at 4:09 AM, Marko Tiikkaja >> <marko.tiikkaja@cs.helsinki.fi> wrote: >>> We have yet to reach a consensus on the name for this feature. I don't >>> think we have any really good candidates, but I like "DML WITH" best so far. > >> Why can't we complain about the actual SQL statement the user issued? >> Like, say: >> INSERT requires RETURNING when used within a referenced CTE > > We could probably make that work for error messages, but what about > documentation? It's going to be awkward to write something like > "INSERT/UPDATE/DELETE RETURNING" every time we need to make a general > statement about the behavior of all three. The current patch includes a total of 5 lines of text documenting this new feature (plus one example), so the issue doesn't really arise. If, as I believe, more documentation is needed, then we may need to think about how to handle this, but it's hard to speculate without a bit more context. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 3, 2010 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We could probably make that work for error messages, but what about >> documentation? �It's going to be awkward to write something like >> "INSERT/UPDATE/DELETE RETURNING" every time we need to make a general >> statement about the behavior of all three. > The current patch includes a total of 5 lines of text documenting this > new feature (plus one example), so the issue doesn't really arise. Well, that's certainly not going to be nearly sufficient. I think what you meant is "Marko hasn't bothered with documentation". There is going to need to be discussion in the RULES chapter, in the page describing returned command tags, and probably six other places that aren't coming to me in the time it takes to type this sentence. regards, tom lane
On 2010-02-03 16:53 UTC+2, Robert Haas wrote: > Some thoughts: > > The comments in standard_ExecutorStart() don't do a good job of > explaining WHY the code does what it does - they just recapitulate > what you can already see from reading the code. You say "If there are > DML WITH statements, we always need to use the CID and copy the > snapshot." That's self-evident from the following code. What's not > clear is why this is necessary, and the comment doesn't make any > attempt to explain it. The second half of the if statement has the > same problem. Ok, I'll try to make this more clear. > In ExecTopLevelStmtIsReadOnly, you might perhaps want to rephase the > comment in a way that doesn't use the word "Ehm." Like maybe: "Even > if this function returns true, the statement might still contain > INSERT, > UPDATE, or DELETE statements within a CTE; we only check the top-level > statement." Also, there should be a newline immediately before the > function name, per our usual style conventions. That comment tries to emphasize the fact that I can't think of any reasonable name for that particular function. If the name looks OK, I can update the comment. > The comment in analyzeCTE that says "Many of these conditions are > impossible given restrictions of the grammar, but check 'em anyway." > makes less sense with this patch than it did formerly and may need to > be rethought... and I'm not sure there's any reason to change this > elog() an Assert. Ok, I'll look at this. > In both analyzeCTE() and checkWellFormedRecursion(), I don't like just > removing the assertions there; we should try to assert something a bit > more sensible, like maybe !IsA(cte->ctequery, Query). This patch > removes a number of other assertions as well, but I don't know enough > about those other spots to judge whether all of those cases are > sensible. I'll look through these again. > The only change to parse_relation.c is the addition of a #include; is > this needed? No, I thought I had removed that long time ago. Will remove. > The documentation changes for INSERT, UPDATE, and DELETE seem > inadequate, because they add a reference to with_query with no > corresponding explanation of what a with_query might be. Ok, I'll add this. > The limitations of INSERT/UPDATE/DELETE-within-WITH should be > documented somewhere: top level CTE only, and no DO ALSO or > conditional DO INSTEAD rules. If we don't intend to remove this > limitation in a future release, we should probably also document that. > I believe there are some other caveats that we've discussed before, > too, though I'm not sure if they're still true. Stuff like: > > - CTEs will be executed to completion in sequential order before the > main statement begins execution > - each CTE will see the results of CTEs already executed, and the main > statement will see the results of all CTEs > - but queries within each CTE still won't see their own updates (a > reference to whatever section of the manual we talk about this in > would probably be good) > - possible pitfalls of CTEs not being pipelined Right. The documentation in its current state is definitely lacking. I've tried to focus all the time I have in making this patch technically good. Regards, Marko Tiikkaja
On Wed, Feb 3, 2010 at 11:18 AM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: > On 2010-02-03 16:53 UTC+2, Robert Haas wrote: >> Some thoughts: >> >> The comments in standard_ExecutorStart() don't do a good job of >> explaining WHY the code does what it does - they just recapitulate >> what you can already see from reading the code. You say "If there are >> DML WITH statements, we always need to use the CID and copy the >> snapshot." That's self-evident from the following code. What's not >> clear is why this is necessary, and the comment doesn't make any >> attempt to explain it. The second half of the if statement has the >> same problem. > > > Right. The documentation in its current state is definitely lacking. > I've tried to focus all the time I have in making this patch technically > good. Outside of documentation issues, where do we stand? Do you need help with the documentation? merlin
On 2010-02-03 18:41 UTC+2, Merlin Moncure wrote: > On Wed, Feb 3, 2010 at 11:18 AM, Marko Tiikkaja > <marko.tiikkaja@cs.helsinki.fi> wrote: >> Right. The documentation in its current state is definitely lacking. >> I've tried to focus all the time I have in making this patch technically >> good. > Do you need help with the documentation? I'm going to work on the documentation tonight, but it will probably need some work from a native English speaker after I'm done. Regards, Marko Tiikkaja
On Wed, Feb 3, 2010 at 11:18 AM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote: >> In ExecTopLevelStmtIsReadOnly, you might perhaps want to rephase the >> comment in a way that doesn't use the word "Ehm." Like maybe: "Even >> if this function returns true, the statement might still contain >> INSERT, >> UPDATE, or DELETE statements within a CTE; we only check the top-level >> statement." Also, there should be a newline immediately before the >> function name, per our usual style conventions. > > That comment tries to emphasize the fact that I can't think of any > reasonable name for that particular function. If the name looks OK, I > can update the comment. Name seems fine. Just fix the comment. >> The limitations of INSERT/UPDATE/DELETE-within-WITH should be >> documented somewhere: top level CTE only, and no DO ALSO or >> conditional DO INSTEAD rules. If we don't intend to remove this >> limitation in a future release, we should probably also document that. >> I believe there are some other caveats that we've discussed before, >> too, though I'm not sure if they're still true. Stuff like: >> >> - CTEs will be executed to completion in sequential order before the >> main statement begins execution >> - each CTE will see the results of CTEs already executed, and the main >> statement will see the results of all CTEs >> - but queries within each CTE still won't see their own updates (a >> reference to whatever section of the manual we talk about this in >> would probably be good) >> - possible pitfalls of CTEs not being pipelined > > Right. The documentation in its current state is definitely lacking. > I've tried to focus all the time I have in making this patch technically > good. Well, technically good is certainly a good place to start. :-) Of course, we need the docs, too. Thanks for your work on this. ...Robert
Hi, Attached is an updated patch. I'm now going to start working on the documentation and I'll send it in a separate patch a bit later. On 2010-02-03 16:53 UTC+2, Robert Haas wrote: > Some thoughts: > This patch > removes a number of other assertions as well, but I don't know enough > about those other spots to judge whether all of those cases are > sensible. I put back the Asserts in make_modifytable(). The one in ExecInitModifyTable() is not true any more and neither is the Assert in transformInsertStatement(). Regards, Marko Tiikkaja