Thread: Writeable CTEs documentation patch

Writeable CTEs documentation patch

From
Marko Tiikkaja
Date:
Hi,

Attached is a documentation patch for writeable CTEs.

Most of it is explaining how this feature works in select.sgml.  I
wasn't sure if that's the right place, but couldn't find a better one.
I also wasn't able to find any place discussing the command tag, other
than libpq's documentation.  Is there one somewhere?

While working on the docs, I noticed one problem with the patch itself:
it doesn't handle multi-statement DO INSTEAD rules correctly.  I'm going
to submit a fix for that later.

Any suggestions, whatsoever, are welcome.


Regards,
Marko Tiikkaja

Attachment

Writeable CTEs patch

From
Marko Tiikkaja
Date:
Hi,

On 2010-02-04 18:04 UTC+2, I wrote:
> While working on the docs, I noticed one problem with the patch itself:
> it doesn't handle multi-statement DO INSTEAD rules correctly.  I'm going
> to submit a fix for that later.

Here's an updated patch.  Only changes from the previous patch are
fixing the above issue and a regression test for it.


Regards,
Marko Tiikkaja

Attachment

Re: Writeable CTEs patch

From
Takahiro Itagaki
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote:

> Here's an updated patch.  Only changes from the previous patch are
> fixing the above issue and a regression test for it.

A brief report on the patch:

* The patch has the following error cases, and also has one regression test for each case.
  - DML WITH is not allowed in a cursor declaration
  - DML WITH is not allowed in a view definition
  - DML WITH without RETURNING is only allowed inside an unreferenced CTE
  - DML WITH is only allowed at the top level
  - Recursive DML WITH statements are not supported
      ^-- might be better if "DML WITH cannot have the self-reference" or so?
  - Conditional DO INSTEAD rules are not supported in DML WITH statements
  - DO ALSO rules are not supported in DML WITH statements
  - Multi-statement DO INSTEAD rules are not supported in DML WITH statements
  - DO INSTEAD NOTHING rules are not supported in DML WITH statements

* In the regression tests, almost all of them don't have ORDER BY clause.
  They just work, but we might need ORDER BY to get robust output.
  What did we do in other regression tests?
 

* The following paragraph in the docs reads oddly to me, but it should be checked by native English speakers.

*** a/doc/src/sgml/ref/create_rule.sgml
--- b/doc/src/sgml/ref/create_rule.sgml
***************
*** 222,227 **** CREATE [ OR REPLACE ] RULE <replaceable class="parameter">name</replaceable> AS
--- 222,234 ----
    </para>

    <para>
+    In an <literal>INSERT</literal>, <literal>UPDATE</literal> or
+    <literal>DELETE</literal> query within a <literal>WITH</literal> clause,
+    only unconditional, single-statement <literal>INSTEAD</literal> rules are
                      ^-- and? which comma is the sentence separator?
+    implemented.
     ^-- might be "available" rather than "implemented"?
+   </para>


Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Writeable CTEs patch

From
Marko Tiikkaja
Date:
On 2010-02-05 07:14 UTC+2, Takahiro Itagaki wrote:
> 
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote:
> 
>> Here's an updated patch.  Only changes from the previous patch are
>> fixing the above issue and a regression test for it.
> 
> * In the regression tests, almost all of them don't have ORDER BY clause.
>   They just work, but we might need ORDER BY to get robust output.
>   What did we do in other regression tests?

Looking at with.sql, it seems to use ORDER BY when it accesses data from
a table.  But obviously we can't do this if we want to test
INSERT/UPDATE/DELETE .. RETURNING at the top level, and returning.sql
seems to be relying on the fact that they come out in the same order
every time.


Regards,
Marko Tiikkaja


Re: Writeable CTEs patch

From
Tom Lane
Date:
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> * In the regression tests, almost all of them don't have ORDER BY clause.
>   They just work, but we might need ORDER BY to get robust output.
>   What did we do in other regression tests?

We add ORDER BY only when experience shows it's necessary.  The
reasoning is explained in regress.sgml:

You might wonder why we don't order all the regression test queries explicitly
to get rid of this issue once and for all.  The reason is that that would
make the regression tests less useful, not more, since they'd tend
to exercise query plan types that produce ordered results to the
exclusion of those that don't.
        regards, tom lane


Re: Writeable CTEs patch

From
Robert Haas
Date:
On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-04 18:04 UTC+2, I wrote:
>> While working on the docs, I noticed one problem with the patch itself:
>> it doesn't handle multi-statement DO INSTEAD rules correctly.  I'm going
>> to submit a fix for that later.
>
> Here's an updated patch.  Only changes from the previous patch are
> fixing the above issue and a regression test for it.

The comments on the parts I asked about before are much better in this
version.  A few other things that I notice:

- I'm not sure that canSetTag is the right name for the additional
argument to ExecInsert/ExecUpdate/ExecDelete.  OTOH, I'm not sure it's
the wrong name either.  But should we use something like
isTopLevelQuery?

- It appears that we pull out all of the DML statements first and run
them in order, but I'm not sure that's the right thing to do.
Consider:

WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ...

I would assume we would do x, CCI, do y, do z, CCI, do main query, but
I don't think that's what this implements.  The user might be
surprised to find out that y sees the effects of z.

- I think that the comment in analyzeCTE that says /* Check that we
got something reasonable */ could be fleshed out a bit.  You could
still reference transformRangeSubselect, for example, but then explain
why the checks here are different (viz, CTEs can contain DML).

- The comment for RegisterSnapshotCopy identifies the function name as
RegisterSnapshot; I think this is a copy-and-pasteo.

- It seems like the gram.y changes for common_table_expr might benefit
from some factoring; that is, create a production (or find a suitable
existing one) for "statements of the sort that can appear within
CTEs", and then use that in common_table_expr.  Or maybe this doesn't
work; I haven't tried it.

- I still don't much like the idea of using DML WITH in error
messages.  One idea I had (which might suck, but I'm just throwing it
out there) is to change hasDmlWith to an integer bitmap with a bit for
each of insert, update, and delete.  But it may be better still to
just rephrase the error messages.  Could we just write, e.g.
"non-SELECT statements are not allowed within a cursor declaration"?
Or we could say "INSERT, UPDATE, and DELETE statements are not allowed
within a cursor declaration", but I'm thinking we may want to allow
things like COPY and EXPLAIN inside CTEs in the future, too, and
they'll presumably be treated similarly to DML.
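
A minimal sketch of the bitmap idea above, written in Python rather than
the patch's C purely for illustration.  The names DmlWith and cursor_error
are hypothetical and do not come from the patch; the message wording
follows the "INSERT, UPDATE, and DELETE statements are not allowed within
a cursor declaration" suggestion.

```python
from enum import IntFlag


class DmlWith(IntFlag):
    """Hypothetical replacement for a boolean hasDmlWith: one bit per statement kind."""
    NONE = 0
    INSERT = 1
    UPDATE = 2
    DELETE = 4


def cursor_error(flags):
    """Build an error message naming the DML kinds found in the WITH list.

    Returns None when no DML bit is set, i.e. nothing to complain about.
    """
    if not flags:
        return None
    kinds = [kind.name
             for kind in (DmlWith.INSERT, DmlWith.UPDATE, DmlWith.DELETE)
             if kind in flags]
    return (", ".join(kinds)
            + " statements are not allowed within a cursor declaration")


# A WITH list containing both an INSERT and a DELETE:
print(cursor_error(DmlWith.INSERT | DmlWith.DELETE))
# prints: INSERT, DELETE statements are not allowed within a cursor declaration
```

With more than one bit set, the sketch can only list every kind of
statement present in the WITH list, not the one at fault.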

For the record, Tom or whoever should feel free to swoop in here at any
time, or add to any of this.  I'm just making suggestions until the
big guns show up.

...Robert


Re: Writeable CTEs patch

From
Marko Tiikkaja
Date:
On 2010-02-08 18:42 +0200, Robert Haas wrote:
> On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja
>> Here's an updated patch.  Only changes from the previous patch are
>> fixing the above issue and a regression test for it.
> 
> - I'm not sure that canSetTag is the right name for the additional
> argument to ExecInsert/ExecUpdate/ExecDelete.  OTOH, I'm not sure it's
> the wrong name either.  But should we use something like
> isTopLevelQuery?

No objection to changing that.

> - It appears that we pull out all of the DML statements first and run
> them in order, but I'm not sure that's the right thing to do.
> Consider:
> 
> WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ...
> 
> I would assume we would do x, CCI, do y, do z, CCI, do main query, but
> I don't think that's what this implements.  The user might be
> surprised to find out that y sees the effects of z.

Hmm.  Right.  That sounds like the right thing to do.  Another option
(which I seem to recall we've discussed before) is to not allow any
SELECT statements between DML WITHs, but I think this is what we should
go for.
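
The difference between the two orderings can be sketched with a toy
model.  Everything here is an assumption made for illustration: a single
hypothetical table, rows tagged with the command id that inserted them, a
read that sees only rows whose command id is lower than the current one,
and CCI modelled as incrementing that counter.  It is not the executor's
code, only the visibility rule the example relies on.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy table: each row remembers the command id that inserted it."""
    rows: list = field(default_factory=list)   # list of (value, command_id)
    command_id: int = 0

    def insert(self, value):
        self.rows.append((value, self.command_id))

    def select(self):
        # A read sees only rows inserted by earlier commands.
        return [v for v, cid in self.rows if cid < self.command_id]

    def cci(self):
        # Stand-in for CCI: advance the command counter.
        self.command_id += 1


def dml_first():
    """Run all DML (x, z) first, then y, then the main query."""
    t = ToyTable()
    t.insert("row from x")
    t.insert("row from z")
    t.cci()
    y = t.select()
    main = t.select()
    return y, main


def in_order():
    """do x, CCI, do y, do z, CCI, do main query."""
    t = ToyTable()
    t.insert("row from x")
    t.cci()
    y = t.select()
    t.insert("row from z")
    t.cci()
    main = t.select()
    return y, main


print(dml_first()[0])  # ['row from x', 'row from z']  (y sees z)
print(in_order()[0])   # ['row from x']                (y does not see z)
```

In both orderings the main query sees the rows from x and z; only what y
sees changes.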

> - I think that the comment in analyzeCTE that says /* Check that we
> got something reasonable */ could be fleshed out a bit.  You could
> still reference transformRangeSubselect, for example, but then explain
> why the checks here are different (viz, CTEs can contain DML).

Ok, I'll look into that.

> - The comment for RegisterSnapshotCopy identifies the function name as
> RegisterSnapshot; I think this is a copy-and-pasteo.

You're right.  Will fix.

> - It seems like the gram.y changes for common_table_expr might benefit
> from some factoring; that is, create a production (or find a suitable
> existing one) for "statements of the sort that can appear within
> CTEs", and then use that in common_table_expr.  Or maybe this doesn't
> work; I haven't tried it.

My bison-fu is not exactly strong, but I can look at the feasibility of
that.

> - I still don't much like the idea of using DML WITH in error
> messages.  One idea I had (which might suck, but I'm just throwing it
> out there) is to change hasDmlWith to an integer bitmap with a bit for
> each of insert, update, and delete.  But it may be better still to
> just rephrase the error messages.

I don't see how that would work.  We'd still potentially have many
different types of DML operations to deal with and that wouldn't help at
all at distinguishing which operation actually caused the error.  Or did
I misunderstand?

> Could we just write, e.g.
> "non-SELECT statements are not allowed within a cursor declaration?"
> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed
> within a cursor declaration", but I'm thinking we may want to allow
> things like COPY and EXPLAIN inside CTEs in the future, too, and
> they'll presumably be treated similarly to DML.

"INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit
clumsy IMO.  But I don't really have anything better to offer, either.


Regards,
Marko Tiikkaja


Re: Writeable CTEs patch

From
Robert Haas
Date:
On Mon, Feb 8, 2010 at 1:01 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
>> Could we just write, e.g.
>> "non-SELECT statements are not allowed within a cursor declaration?"
>> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed
>> within a cursor declaration", but I'm thinking we may want to allow
>> things like COPY and EXPLAIN inside CTEs in the future, too, and
>> they'll presumably be treated similarly to DML.
>
> "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit
> clumsy IMO.  But I don't really have anything better to offer, either.

Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in
most of the relevant contexts the list might get longer if in the
future we allow things like EXPLAIN and COPY within CTEs.  I think
"Non-SELECT statement" is reasonably clear, though; people might not
know which things are statements, but the message implies that SELECT
is one such thing, and not the one that's the problem, which should
get them pointed in the right direction.

...Robert


Re: Writeable CTEs patch

From
Alvaro Herrera
Date:
Robert Haas escribió:

> Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in
> most of the relevant contexts the list might get longer if in the
> future we allow things like EXPLAIN and COPY within CTEs.  I think
> "Non-SELECT statement" is reasonably clear, though; people might not
> know which things are statements, but the message implies that SELECT
> is one such thing, and not the one that's the problem, which should
> get them pointed in the right direction.

"DML statements other than SELECT" perhaps?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Writeable CTEs patch

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Mon, Feb 8, 2010 at 1:01 PM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
> >> Could we just write, e.g.
> >> "non-SELECT statements are not allowed within a cursor declaration?"
> >> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed
> >> within a cursor declaration", but I'm thinking we may want to allow
> >> things like COPY and EXPLAIN inside CTEs in the future, too, and
> >> they'll presumably be treated similarly to DML.
> >
> > "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit
> > clumsy IMO.  But I don't really have anything better to offer, either.
> 
> Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in
> most of the relevant contexts the list might get longer if in the
> future we allow things like EXPLAIN and COPY within CTEs.  I think
> "Non-SELECT statement" is reasonably clear, though; people might not
> know which things are statements, but the message implies that SELECT
> is one such thing, and not the one that's the problem, which should
> get them pointed in the right direction.

Hmm, how about VALUES?  Isn't that a statement in its own right, that
would similarly be unaffected?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Writeable CTEs patch

From
Robert Haas
Date:
On Mon, Feb 8, 2010 at 3:30 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Robert Haas escribió:
>> On Mon, Feb 8, 2010 at 1:01 PM, Marko Tiikkaja
>> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> >> Could we just write, e.g.
>> >> "non-SELECT statements are not allowed within a cursor declaration?"
>> >> Or we could say "INSERT, UPDATE, and DELETE statements are not allowed
>> >> within a cursor declaration", but I'm thinking we may want to allow
>> >> things like COPY and EXPLAIN inside CTEs in the future, too, and
>> >> they'll presumably be treated similarly to DML.
>> >
>> > "INSERT, UPDATE and DELETE" is quite long and "non-SELECT" is a bit
>> > clumsy IMO.  But I don't really have anything better to offer, either.
>>
>> Yeah, I don't feel good about "INSERT, UPDATE, and DELETE" because in
>> most of the relevant contexts the list might get longer if in the
>> future we allow things like EXPLAIN and COPY within CTEs.  I think
>> "Non-SELECT statement" is reasonably clear, though; people might not
>> know which things are statements, but the message implies that SELECT
>> is one such thing, and not the one that's the problem, which should
>> get them pointed in the right direction.
>
> Hmm, how about VALUES?  Isn't that a statement on its own right, that
> would similarly unaffected?

Ouch.  You're right, that's a problem.  :-(

TABLE is a similar case.

...Robert


Re: Writeable CTEs patch

From
Marko Tiikkaja
Date:
On 2010-02-08 18:42 +0200, Robert Haas wrote:
> On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> Here's an updated patch.  Only changes from the previous patch are
>> fixing the above issue and a regression test for it.
> 
> - I'm not sure that canSetTag is the right name for the additional
> argument to ExecInsert/ExecUpdate/ExecDelete.  OTOH, I'm not sure it's
> the wrong name either.  But should we use something like
> isTopLevelQuery?

I'm going to have to take back my previous statement; this doesn't make
a lot of sense in the case of DO ALSO rules (or multiple statements in a
DO INSTEAD RULE).  Those will have canSetTag=false, but they will be at
the top level.

> - It appears that we pull out all of the DML statements first and run
> them in order, but I'm not sure that's the right thing to do.
> Consider:
> 
> WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ...
> 
> I would assume we would do x, CCI, do y, do z, CCI, do main query, but
> I don't think that's what this implements.  The user might be
> surprised to find out that y sees the effects of z.

I went ahead and implemented this, but there seems to be one small
problem: RECURSIVE.  If there is a recursive query between those, it
might loop forever even if the top-level SELECT only wanted to see a few
rows from it.  The docs already discourage writing recursive CTEs like
that, but still this is a small caveat.
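
A toy illustration of the caveat, using a Python generator as a stand-in
for a recursive CTE with no termination condition.  The function names
are hypothetical, and the assumption is that a CTE sitting between two
DML statements has to be run to completion before the next statement
starts, whereas a top-level LIMIT alone could stop pulling rows early.

```python
from itertools import islice


def counting_cte():
    """Stand-in for a recursive CTE that never stops producing rows."""
    n = 1
    while True:
        yield n
        n += 1


def lazy_limit(limit):
    """Top-level query pulls only the rows it needs, then stops."""
    return list(islice(counting_cte(), limit))


def run_to_completion(max_rows):
    """Materialize the whole CTE before moving on.

    max_rows is an artificial cap so the sketch terminates; without it
    this loop would never end.
    """
    rows = []
    for row in counting_cte():
        if len(rows) >= max_rows:
            raise RuntimeError("gave up: the CTE never finishes")
        rows.append(row)
    return rows


print(lazy_limit(3))  # [1, 2, 3]
```

Calling run_to_completion with any cap raises RuntimeError, which is the
toy equivalent of looping forever.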

> - It seems like the gram.y changes for common_table_expr might benefit
> from some factoring; that is, create a production (or find a suitable
> existing one) for "statements of the sort that can appear within
> CTEs", and then use that in common_table_expr.  Or maybe this doesn't
> work; I haven't tried it.

This seems to work.  I used PreparableStmt, but I'm not sure how good
an idea that really is.  Maybe I should create a new one?


Regards,
Marko Tiikkaja


Re: Writeable CTEs patch

From
Robert Haas
Date:
On Tue, Feb 9, 2010 at 3:13 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-08 18:42 +0200, Robert Haas wrote:
>> On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja
>> <marko.tiikkaja@cs.helsinki.fi> wrote:
>>> Here's an updated patch.  Only changes from the previous patch are
>>> fixing the above issue and a regression test for it.
>>
>> - I'm not sure that canSetTag is the right name for the additional
>> argument to ExecInsert/ExecUpdate/ExecDelete.  OTOH, I'm not sure it's
>> the wrong name either.  But should we use something like
>> isTopLevelQuery?
>
> I'm going to have to take back my previous statement; this doesn't make
> a lot of sense in the case of DO ALSO rules (or multiple statements in a
> DO INSTEAD RULE).  Those will have canSetTag=false, but they will be at
> the top level.

Ah.  OK.

>> - It appears that we pull out all of the DML statements first and run
>> them in order, but I'm not sure that's the right thing to do.
>> Consider:
>>
>> WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ...
>>
>> I would assume we would do x, CCI, do y, do z, CCI, do main query, but
>> I don't think that's what this implements.  The user might be
>> surprised to find out that y sees the effects of z.
>
> I went ahead and implemented this, but there seems to be one small
> problem: RECURSIVE.  If there is a recursive query between those, it
> might loop forever even if the top-level SELECT only wanted to see a few
> rows from it.  The docs already discourage writing recursive ctes like
> that, but still this is a small caveat.

Doesn't seem like a big problem to me.

>> - It seems like the gram.y changes for common_table_expr might benefit
>> from some factoring; that is, create a production (or find a suitable
>> existing one) for "statements of the sort that can appear within
>> CTEs", and then use that in common_table_expr.  Or maybe this doesn't
>> work; I haven't tried it.
>
> This seems to work.  I used PreparableStmt, but I'm not sure how good
> idea that really is.  Maybe I should create a new one?

If it covers the same territory, I wouldn't duplicate it just for fun.
Someone might need to split it out in the future, but that's not a
reason to do it now.

...Robert


Re: Writeable CTEs patch

From
Marko Tiikkaja
Date:
On 2010-02-08 18:42 +0200, Robert Haas wrote:
> On Thu, Feb 4, 2010 at 11:57 AM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> On 2010-02-04 18:04 UTC+2, I wrote:
>>> While working on the docs, I noticed one problem with the patch itself:
>>> it doesn't handle multi-statement DO INSTEAD rules correctly.  I'm going
>>> to submit a fix for that later.
>>
>> Here's an updated patch.  Only changes from the previous patch are
>> fixing the above issue and a regression test for it.
>
> - It appears that we pull out all of the DML statements first and run
> them in order, but I'm not sure that's the right thing to do.
> Consider:
>
> WITH x AS (INSERT ...), y AS (SELECT ...), z AS (INSERT ...) SELECT ...
>
> I would assume we would do x, CCI, do y, do z, CCI, do main query, but
> I don't think that's what this implements.  The user might be
> surprised to find out that y sees the effects of z.

I've updated the patch according to what I said here:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00722.php

I haven't done any extensive testing, but it seems to work in the most
common cases.


Regards,
Marko Tiikkaja

Attachment
