Thread: Review of Writeable CTE Patch

Review of Writeable CTE Patch

From
Merlin Moncure
Date:
Marko,

Submission Review
-----------------
*) patch applies clean to HEADwever

*) applied tests ran ok

*) there is some documentation adjustments in the patch. possibly a
little underweight, but sufficient.

*) A couple of very minor things aside, I think this should be
accepted and passed for 8.5  although it could use another set of eyes
from someone more comfortable with this part of the code.

Usability Review
----------------
*) Works as advertised...'feels right'.  Found only one small issue
which Marko was already aware of and had adjusted for.

*) No, we don't already have it.  This is maybe the #2 most requested
feature, after in-core replication.

*) Yes it follows community-agreed behavior.

Feature Test
------------
*) No build issues.

*) The feature appears to work. Aside from the supplied tests, I
tested with much larger tables (million records) and tables with
triggers on them, sometimes in wacky combination:

with f as (delete from foo returning *) insert into foo select * from f;
with f as (insert into foo returning *) insert into foo select * from f;

This threw an assertion failure:
with recursive t as (insert into foo select * from t) values(true);

TRAP: FailedAssertion("!(((((Node*)(stmt))->type) == T_SelectStmt))",
File: "parse_cte.c", Line: 606)

Marko was already aware of it and has a fix ready.

*) I tested most of the error conditions and got them to fire.  No
unpleasant surprises.  The cursor error ("Non-SELECT cursors are not
implemented") is a little misleading.  Perhaps it should read
something like "Writeable CTE are not supported in cusor declaration"


Performance Review
------------------

*) Everything ran exactly as it should.  Huge updates rewritten as
writeable CTE delete + insert are slightly slower than raw insert but
this is expected.


Coding Review
-------------

*) A lot of the code is sgml/test/grammar changes that should be
relatively uncontroversial.  This is a small patch for what it does.

*) Should canSetTag be passed as the first agument to (for example) in
ExecInsert?  Is this the proper way to be passing this information?

*) execMain.c Line 1224
There is a blank code comment here.  IMO this section needs to be
documented better: the
CommandCounterIncrement is a critical part of how this works.

*) execMain.c Line 2033
The adjusted comment is confusing and seems to contradict itself.  I
would have wriiten: initialize them if they are not run in
EvalPlanQual().  Aside: is this an optimization?

*) CopySnapshot was promoted from static.  Is this legal/good idea?
Is a wrapper more appropriate?

Architecture Review
-------------------

*) Nothing jumped out at me.   The changes are small, so it really
comes down to if they were done properly/right spot.

Review Review
-------------

merlin


Re: Review of Writeable CTE Patch

From
Robert Haas
Date:
On Tue, Jan 26, 2010 at 9:13 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> *) Works as advertised...'feels right'.  Found only one small issue
> which Marko was already aware of and had adjusted for.
[...]
> Marko was already aware of it and has a fix ready.

So it sounds like we should expect an updated patch shortly?

...Robert


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
On 2010-01-26 16:54, Robert Haas wrote:
> On Tue, Jan 26, 2010 at 9:13 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> *) Works as advertised...'feels right'.  Found only one small issue
>> which Marko was already aware of and had adjusted for.
> [...]
>> Marko was already aware of it and has a fix ready.
> 
> So it sounds like we should expect an updated patch shortly?

Yes.  I'm adding more comments and documentation, expect a patch within
a couple of days.


Regards,
Marko Tiikkaja


Re: Review of Writeable CTE Patch

From
Alvaro Herrera
Date:
Merlin Moncure escribió:

> *) CopySnapshot was promoted from static.  Is this legal/good idea?
> Is a wrapper more appropriate?

Hmm ... I wonder why isn't the patch doing RegisterSnapshot with the
passed snapshot directly -- why is it necessary to create a new copy of
it?  (I notice that only one of the arms in that "if" creates a copy;
if that is correct, I think it warrants a comment explaining why).

If the copy is necessary (e.g. because the snapshot must not be modified
externally, and there's actual risk that it is), then maybe it would be
better to create a new function RegisterSnapshotCopy instead?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
On 2010-01-26 17:11, Alvaro Herrera wrote:
> Merlin Moncure escribió:
> 
>> *) CopySnapshot was promoted from static.  Is this legal/good idea?
>> Is a wrapper more appropriate?
> 
> Hmm ... I wonder why isn't the patch doing RegisterSnapshot with the
> passed snapshot directly -- why is it necessary to create a new copy of
> it?  (I notice that only one of the arms in that "if" creates a copy;
> if that is correct, I think it warrants a comment explaining why).

Per discussion here:
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01964.php the
executor copies the snapshot if it plans on modifying it.  A comment
explaining this might be in order.

> If the copy is necessary (e.g. because the snapshot must not be modified
> externally, and there's actual risk that it is), then maybe it would be
> better to create a new function RegisterSnapshotCopy instead?

Sounds reasonable.


Regards,
Marko Tiikkaja


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
Hi,

Here's an updated patch.  This includes the fix mentioned earlier, some
comment improvements and making CopySnapshot() static again.  I also
changed all references to this feature to "DML WITH" for consistency.
I'm not sure if we want to keep it, but it should now be easier to
change in the future.


On 2010-01-26 16:13, Merlin Moncure wrote:
> Marko was already aware of it and has a fix ready.

This one includes it.

> *) I tested most of the error conditions and got them to fire.  No
> unpleasant surprises.  The cursor error ("Non-SELECT cursors are not
> implemented") is a little misleading.  Perhaps it should read
> something like "Writeable CTE are not supported in cusor declaration"

Ok, changed this one.

> *) execMain.c Line 1224
> There is a blank code comment here.  IMO this section needs to be
> documented better: the
> CommandCounterIncrement is a critical part of how this works.

Added some more comments.

> *) execMain.c Line 2033
> The adjusted comment is confusing and seems to contradict itself.  I
> would have wriiten: initialize them if they are not run in
> EvalPlanQual().  Aside: is this an optimization?

I tried, but I can't think of a better wording for that comment :-(
This is not really an optimization, just doing the right thing.  The
performance difference here is negligible.

> *) CopySnapshot was promoted from static.  Is this legal/good idea?
> Is a wrapper more appropriate?

Added new function RegisterSnapshotCopy() per Alvaro's suggestion.

Thanks a lot for reviewing!


Regards,
Marko Tiikkaja


Attachment

Re: Review of Writeable CTE Patch

From
Takahiro Itagaki
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote:

> Here's an updated patch.  This includes the fix mentioned earlier, some
> comment improvements and making CopySnapshot() static again.  I also
> changed all references to this feature to "DML WITH" for consistency.
> I'm not sure if we want to keep it, but it should now be easier to
> change in the future.

Hi, I'm reviewing the writable CTE patch. The code logic seems to be
pretty good, but I have a couple of comments about error cases:

* Did we have a consensus about user-visible "DML WITH" messages? The term is used in error messages in many places,
forexample:  "DML WITH without RETURNING is only allowed inside an unreferenced CTE" Since we don't use "DML WITH" nor
"CTE"in documentation, I'd like to avoid such technical acronyms in logs if we had better names, or we should have a
sectionto explain them in docs.
 

* What can I do to get "Recursive DML WITH statements are not supported" message? I get syntax errors before I get the
messagebecause We don't support UNIONs with RETURNING queries. Am I missing something?
 
   =# UPDATE tbl SET i = i + 1 WHERE i = 1      UNION ALL      UPDATE tbl SET i = i + 1 WHERE i = 2;   ERROR:  syntax
errorat or near "UNION"
 

* The patch includes regression tests, but no error cases in it. More test cases are needed for stupid queries.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
Hi,

On 2010-02-03 11:04 UTC+2, Takahiro Itagaki wrote:
> Hi, I'm reviewing the writable CTE patch. The code logic seems to be
> pretty good, but I have a couple of comments about error cases:
>
> * Did we have a consensus about user-visible "DML WITH" messages?
>   The term is used in error messages in many places, for example:
>    "DML WITH without RETURNING is only allowed inside an unreferenced CTE"
>   Since we don't use "DML WITH" nor "CTE" in documentation,
>   I'd like to avoid such technical acronyms in logs if we had better names,
>   or we should have a section to explain them in docs.

We have yet to reach a consensus on the name for this feature.  I don't
think we have any really good candidates, but I like "DML WITH" best so far.

> * What can I do to get "Recursive DML WITH statements are not supported"
>   message? I get syntax errors before I get the message because We don't
>   support UNIONs with RETURNING queries. Am I missing something?
> 
>     =# UPDATE tbl SET i = i + 1 WHERE i = 1
>        UNION ALL
>        UPDATE tbl SET i = i + 1 WHERE i = 2;
>     ERROR:  syntax error at or near "UNION"

WITH RECURSIVE t AS (INSERT INTO foo SELECT * FROM t) VALUES(true);
would do that.  You can do the same with UPDATE .. FROM and DELETE .. USING.

> * The patch includes regression tests, but no error cases in it.
>   More test cases are needed for stupid queries.

Ok, I'll add these and send an updated patch in a few hours.


Regards,
Marko Tiikkaja


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
Hi,

On 2010-02-03 11:04, Takahiro Itagaki wrote:
> * The patch includes regression tests, but no error cases in it.
>   More test cases are needed for stupid queries.

Here's an updated patch.


Regards,
Marko Tiikkaja

Attachment

Re: Review of Writeable CTE Patch

From
Robert Haas
Date:
On Wed, Feb 3, 2010 at 4:09 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> Hi,
>
> On 2010-02-03 11:04 UTC+2, Takahiro Itagaki wrote:
>> Hi, I'm reviewing the writable CTE patch. The code logic seems to be
>> pretty good, but I have a couple of comments about error cases:
>>
>> * Did we have a consensus about user-visible "DML WITH" messages?
>>   The term is used in error messages in many places, for example:
>>    "DML WITH without RETURNING is only allowed inside an unreferenced CTE"
>>   Since we don't use "DML WITH" nor "CTE" in documentation,
>>   I'd like to avoid such technical acronyms in logs if we had better names,
>>   or we should have a section to explain them in docs.
>
> We have yet to reach a consensus on the name for this feature.  I don't
> think we have any really good candidates, but I like "DML WITH" best so far.

Why can't we complain about the actual SQL statement the user issued?
Like, say:

INSERT requires RETURNING when used within a referenced CTE

...Robert


Re: Review of Writeable CTE Patch

From
Robert Haas
Date:
On Wed, Feb 3, 2010 at 5:31 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-03 11:04, Takahiro Itagaki wrote:
>> * The patch includes regression tests, but no error cases in it.
>>   More test cases are needed for stupid queries.
>
> Here's an updated patch.

Some thoughts:

The comments in standard_ExecutorStart() don't do a good job of
explaining WHY the code does what it does - they just recapitulate
what you can already see from reading the code.  You say "If there are
DML WITH statements, we always need to use the CID and copy the
snapshot."   That's self-evident from the following code.   What's not
clear is why this is necessary, and the comment doesn't make any
attempt to explain it.  The second half of the if statement has the
same problem.

In ExecTopLevelStmtIsReadOnly, you might perhaps want to rephase the
comment in a way that doesn't use the word "Ehm."  Like maybe: "Even
if this function returns true, the statement might still contain
INSERT,
UPDATE, or DELETE statements within a CTE; we only check the top-level
statement."  Also, there should be a newline immediately before the
function name, per our usual style conventions.

InitPlan makes some references to "leader" scan states, but there's no
explanation of what exactly those are.

The comment in analyzeCTE that says "Many of these conditions are
impossible given restrictions of the grammar, but check 'em anyway."
makes less sense with this patch than it did formerly and may need to
be rethought... and I'm not sure there's any reason to change this
elog() an Assert.

In both analyzeCTE() and checkWellFormedRecursion(), I don't like just
removing the assertions there; we should try to assert something a bit
more sensible, like maybe !IsA(cte->ctequery, Query).  This patch
removes a number of other assertions as well, but I don't know enough
about those other spots to judge whether all of those cases are
sensible.

The only change to parse_relation.c is the addition of a #include; is
this needed?

The documentation changes for INSERT, UPDATE, and DELETE seem
inadequate, because they add a reference to with_query with no
corresponding explanation of what a with_query might be.

The limitations of INSERT/UPDATE/DELETE-within-WITH should be
documented somewhere: top level CTE only, and no DO ALSO or
conditional DO INSTEAD rules.  If we don't intend to remove this
limitation in a future release, we should probably also document that.I believe there are some other caveats that we've
discussedbefore, 
too, though I'm not sure if they're still true.  Stuff like:

- CTEs will be executed to completion in sequential order before the
main statement begins execution
- each CTE will see the results of CTEs already executed, and the main
statement will see the results of all CTEs
- but queries within each CTE still won't see their own updates (a
reference to whatever section of the manual we talk about this in
would probably be good)
- possible pitfalls of CTEs not being pipelined

...Robert


Re: Review of Writeable CTE Patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 3, 2010 at 4:09 AM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> We have yet to reach a consensus on the name for this feature. �I don't
>> think we have any really good candidates, but I like "DML WITH" best so far.

> Why can't we complain about the actual SQL statement the user issued?
> Like, say:
> INSERT requires RETURNING when used within a referenced CTE

We could probably make that work for error messages, but what about
documentation?  It's going to be awkward to write something like
"INSERT/UPDATE/DELETE RETURNING" every time we need to make a general
statement about the behavior of all three.
        regards, tom lane


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
Hi,

On 2010-02-03 16:09 UTC+2, Robert Haas wrote:
> Why can't we complain about the actual SQL statement the user issued?
> Like, say:
> 
> INSERT requires RETURNING when used within a referenced CTE

The SELECT equivalent of this query looks like this:
=> with recursive t as (select * from t) values(true);
ERROR:  recursive query "t" does not have the form non-recursive-term
UNION [ALL] recursive-term

but I didn't want to throw people off to think that they can use
INSERT/UPDATE/RETURNING in a RECURSIVE CTE, just to get complaints about
syntax error.


Regards,
Marko Tiikkaja



Re: Review of Writeable CTE Patch

From
Robert Haas
Date:
On Wed, Feb 3, 2010 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Feb 3, 2010 at 4:09 AM, Marko Tiikkaja
>> <marko.tiikkaja@cs.helsinki.fi> wrote:
>>> We have yet to reach a consensus on the name for this feature.  I don't
>>> think we have any really good candidates, but I like "DML WITH" best so far.
>
>> Why can't we complain about the actual SQL statement the user issued?
>> Like, say:
>> INSERT requires RETURNING when used within a referenced CTE
>
> We could probably make that work for error messages, but what about
> documentation?  It's going to be awkward to write something like
> "INSERT/UPDATE/DELETE RETURNING" every time we need to make a general
> statement about the behavior of all three.

The current patch includes a total of 5 lines of text documenting this
new feature (plus one example), so the issue doesn't really arise.

If, as I believe, more documentation is needed, then we may need to
think about how to handle this, but it's hard to speculate without a
bit more context.

...Robert


Re: Review of Writeable CTE Patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 3, 2010 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could probably make that work for error messages, but what about
>> documentation? �It's going to be awkward to write something like
>> "INSERT/UPDATE/DELETE RETURNING" every time we need to make a general
>> statement about the behavior of all three.

> The current patch includes a total of 5 lines of text documenting this
> new feature (plus one example), so the issue doesn't really arise.

Well, that's certainly not going to be nearly sufficient.  I think what
you meant is "Marko hasn't bothered with documentation".  There is going
to need to be discussion in the RULES chapter, in the page describing
returned command tags, and probably six other places that aren't coming
to me in the time it takes to type this sentence.
        regards, tom lane


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
On 2010-02-03 16:53 UTC+2, Robert Haas wrote:
> Some thoughts:
> 
> The comments in standard_ExecutorStart() don't do a good job of
> explaining WHY the code does what it does - they just recapitulate
> what you can already see from reading the code.  You say "If there are
> DML WITH statements, we always need to use the CID and copy the
> snapshot."   That's self-evident from the following code.   What's not
> clear is why this is necessary, and the comment doesn't make any
> attempt to explain it.  The second half of the if statement has the
> same problem.

Ok, I'll try to make this more clear.

> In ExecTopLevelStmtIsReadOnly, you might perhaps want to rephase the
> comment in a way that doesn't use the word "Ehm."  Like maybe: "Even
> if this function returns true, the statement might still contain
> INSERT,
> UPDATE, or DELETE statements within a CTE; we only check the top-level
> statement."  Also, there should be a newline immediately before the
> function name, per our usual style conventions.

That comment tries to emphasize the fact that I can't think of any
reasonable name for that particular function.  If the name looks OK, I
can update the comment.

> The comment in analyzeCTE that says "Many of these conditions are
> impossible given restrictions of the grammar, but check 'em anyway."
> makes less sense with this patch than it did formerly and may need to
> be rethought... and I'm not sure there's any reason to change this
> elog() an Assert.

Ok, I'll look at this.

> In both analyzeCTE() and checkWellFormedRecursion(), I don't like just
> removing the assertions there; we should try to assert something a bit
> more sensible, like maybe !IsA(cte->ctequery, Query).  This patch
> removes a number of other assertions as well, but I don't know enough
> about those other spots to judge whether all of those cases are
> sensible.

I'll look through these again.

> The only change to parse_relation.c is the addition of a #include; is
> this needed?

No, I thought I had removed that long time ago.  Will remove.

> The documentation changes for INSERT, UPDATE, and DELETE seem
> inadequate, because they add a reference to with_query with no
> corresponding explanation of what a with_query might be.

Ok, I'll add this.

> The limitations of INSERT/UPDATE/DELETE-within-WITH should be
> documented somewhere: top level CTE only, and no DO ALSO or
> conditional DO INSTEAD rules.  If we don't intend to remove this
> limitation in a future release, we should probably also document that.
>  I believe there are some other caveats that we've discussed before,
> too, though I'm not sure if they're still true.  Stuff like:
> 
> - CTEs will be executed to completion in sequential order before the
> main statement begins execution
> - each CTE will see the results of CTEs already executed, and the main
> statement will see the results of all CTEs
> - but queries within each CTE still won't see their own updates (a
> reference to whatever section of the manual we talk about this in
> would probably be good)
> - possible pitfalls of CTEs not being pipelined

Right.  The documentation in its current state is definitely lacking.
I've tried to focus all the time I have in making this patch technically
good.


Regards,
Marko Tiikkaja


Re: Review of Writeable CTE Patch

From
Merlin Moncure
Date:
On Wed, Feb 3, 2010 at 11:18 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-03 16:53 UTC+2, Robert Haas wrote:
>> Some thoughts:
>>
>> The comments in standard_ExecutorStart() don't do a good job of
>> explaining WHY the code does what it does - they just recapitulate
>> what you can already see from reading the code.  You say "If there are
>> DML WITH statements, we always need to use the CID and copy the
>> snapshot."   That's self-evident from the following code.   What's not
>> clear is why this is necessary, and the comment doesn't make any
>> attempt to explain it.  The second half of the if statement has the
>> same problem.
>
>
> Right.  The documentation in its current state is definitely lacking.
> I've tried to focus all the time I have in making this patch technically
> good.

Outside of documentation issues, where do we stand? Do you need help
with the documentation?

merlin


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
On 2010-02-03 18:41 UTC+2, Merlin Moncure wrote:
> On Wed, Feb 3, 2010 at 11:18 AM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> Right.  The documentation in its current state is definitely lacking.
>> I've tried to focus all the time I have in making this patch technically
>> good.

> Do you need help with the documentation?

I'm going to work on the documentation tonight, but it will probably
need some work from a native English speaker after I'm done.


Regards,
Marko Tiikkaja



Re: Review of Writeable CTE Patch

From
Robert Haas
Date:
On Wed, Feb 3, 2010 at 11:18 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
>> In ExecTopLevelStmtIsReadOnly, you might perhaps want to rephase the
>> comment in a way that doesn't use the word "Ehm."  Like maybe: "Even
>> if this function returns true, the statement might still contain
>> INSERT,
>> UPDATE, or DELETE statements within a CTE; we only check the top-level
>> statement."  Also, there should be a newline immediately before the
>> function name, per our usual style conventions.
>
> That comment tries to emphasize the fact that I can't think of any
> reasonable name for that particular function.  If the name looks OK, I
> can update the comment.

Name seems fine.  Just fix the comment.

>> The limitations of INSERT/UPDATE/DELETE-within-WITH should be
>> documented somewhere: top level CTE only, and no DO ALSO or
>> conditional DO INSTEAD rules.  If we don't intend to remove this
>> limitation in a future release, we should probably also document that.
>>  I believe there are some other caveats that we've discussed before,
>> too, though I'm not sure if they're still true.  Stuff like:
>>
>> - CTEs will be executed to completion in sequential order before the
>> main statement begins execution
>> - each CTE will see the results of CTEs already executed, and the main
>> statement will see the results of all CTEs
>> - but queries within each CTE still won't see their own updates (a
>> reference to whatever section of the manual we talk about this in
>> would probably be good)
>> - possible pitfalls of CTEs not being pipelined
>
> Right.  The documentation in its current state is definitely lacking.
> I've tried to focus all the time I have in making this patch technically
> good.

Well, technically good is certainly a good place to start.  :-)  Of
course, we need the docs, too.  Thanks for your work on this.

...Robert


Re: Review of Writeable CTE Patch

From
Marko Tiikkaja
Date:
Hi,

Attached is an updated patch.  I'm now going to start working on the
documentation and I'll send it in a separate patch a bit later.

On 2010-02-03 16:53 UTC+2, Robert Haas wrote:
> Some thoughts:

> This patch
> removes a number of other assertions as well, but I don't know enough
> about those other spots to judge whether all of those cases are
> sensible.

I put back the Asserts in make_modifytable().  The one in
ExecInitModifyTable() is not true any more and neither is the Assert in
transformInsertStatement().


Regards,
Marko Tiikkaja

Attachment