Thread: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

[bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Mikhail Titov

Date:

13 August 2020, 04:30:50

Hello!

According to the docs[1], one may use DEFAULT keyword while inserting
into generated columns (stored and identity). However, currently it
works only for a single VALUES sublist with DEFAULT for a generated column
but not for the case when VALUES RTE is used. This is not being tested
and it is broken.

I am attaching two patches. One for tests and another one with the
proposed changes based on ideas from Andrew on IRC. So if all good there
goes the credit where credit is due. If patch is no good, then it is
likely my misunderstanding how to put words into code :-)

This is my only second patch to PostgreSQL (the first one was rejected)
so don't be too harsh :-) It may not be perfect but I am open for a
feedback and this is just to get the ball rolling and to let the
community know about this issue.

Before you ask why would I want to insert DEFAULTs ... well, there are
ORMs[2] that still need to be patched and current situation contradicts
documentation[1].

Footnotes:
[1]  https://www.postgresql.org/docs/devel/ddl-generated-columns.html

[2]  https://github.com/rails/rails/pull/39368#issuecomment-670351379

--
Mikhail

Attachment

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Pavel Stehule

Date:

13 August 2020, 05:11:08

čt 13. 8. 2020 v 6:31 odesílatel Mikhail Titov <mlt@gmx.us> napsal:

Hello!

According to the docs[1], one may use DEFAULT keyword while inserting
into generated columns (stored and identity). However, currently it
works only for a single VALUES sublist with DEFAULT for a generated column
but not for the case when VALUES RTE is used. This is not being tested
and it is broken.

I am attaching two patches. One for tests and another one with the
proposed changes based on ideas from Andrew on IRC. So if all good there
goes the credit where credit is due. If patch is no good, then it is
likely my misunderstanding how to put words into code :-)

This is my only second patch to PostgreSQL (the first one was rejected)
so don't be too harsh :-) It may not be perfect but I am open for a
feedback and this is just to get the ball rolling and to let the
community know about this issue.

Before you ask why would I want to insert DEFAULTs ... well, there are
ORMs[2] that still need to be patched and current situation contradicts
documentation[1].

please, assign your patch to commitfest application

https://commitfest.postgresql.org/29/

Regards

Pavel

Footnotes:
[1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html

[2] https://github.com/rails/rails/pull/39368#issuecomment-670351379

--
Mikhail

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Mikhail Titov

Date:

14 August 2020, 19:57:38

Previously submitted patch got somehow trailing spaces mangled on the
way out. This is an attempt to use application/octet-stream MIME instead
of text/x-patch to preserve those for regression tests.

On Thu, Aug 13, 2020 at 12:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> please, assign your patch to commitfest application

Here is the backlink https://commitfest.postgresql.org/29/2681/

--
Mikhail

Attachment

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Tom Lane

Date:

06 September 2020, 21:42:40

Mikhail Titov <mlt@gmx.us> writes:
> Previously submitted patch got somehow trailing spaces mangled on the
> way out. This is an attempt to use application/octet-stream MIME instead
> of text/x-patch to preserve those for regression tests.

I took a quick look through this.  I agree with the general idea of
detecting cases where all of the entries in a VALUES column are DEFAULT,
but the implementation needs work.

The cfbot reports that it doesn't compile [1]:

parse_relation.c: In function ‘expandNSItemVars’:
parse_relation.c:2992:34: error: ‘T_Node’ undeclared (first use in this function)
     std = list_nth_node(Node, row, colindex);
                                  ^

I suspect this indicates that you did not use --enable-cassert in your own
testing, which is usually a bad idea; that enables a lot of checks that
you really want to have active for development purposes.

Hacking expandNSItemVars() for this purpose is an extremely bad idea.
The API spec for that is
 *      Produce a list of Vars, and optionally a list of column names,
 *      for the non-dropped columns of the nsitem.
This patch breaks that specification, and in turn breaks callers that
expect it to be adhered to.  I see at least one caller that will suffer
assertion failures because of that, which reinforces my suspicion that
you did not test with assertions on.

I think you'd be better off to make transformInsertStmt(), specifically
its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust
the tlist itself.  Doing it there might be a good bit less inefficient
for very long VALUES lists, too, which is a case that we do worry about.
Since that's already iterating through the sub-lists, you could track
whether all rows seen so far contain SetToDefault in each column position,
and avoid extra scans of the sublists.  (A BitmapSet might be a convenient
representation of that, though you could also use a bool array I suppose.)

I do not care for what you did in rewriteValuesRTE() either: just removing
a sanity check isn't OK, unless you've done something to make the sanity
check unnecessary which you surely have not.  Perhaps you could extend
the initial scan of the tlist (lines 1294-1310) to notice SetToDefault
nodes as well as Var nodes and keep track of which columns have those.
Then you could cross-check that one or the other case applies whenever
you see a SetToDefault in the VALUES lists.

BTW, another thing that needs checking is whether a rule containing
an INSERT like this will reverse-list sanely.  The whole idea of
replacing some of the Vars might not work so far as ruleutils.c is
concerned.  In that case I think we might have to implement this
by having transformInsertStmt restructure the VALUES lists to
physically remove the all-DEFAULT column, and adjust the target column
list accordingly --- that is, make a parse-time transformation of
    INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT);
into
    INSERT INTO gtest0(a) VALUES (1), (2);
That'd have the advantage that you'd not have to hack up the
rewriter at all.

Also, in the case where the user has failed to ensure that all the
column entries are DEFAULT, I suppose that we'll still get the same
error as now:

    regression=# INSERT INTO gtest0 VALUES (1, DEFAULT), (2, 42);
    ERROR:  cannot insert into column "b"
    DETAIL:  Column "b" is a generated column.

This seems fairly confusing and unhelpful.  Perhaps it's not this
patch's job to improve it, but it'd be nice if we could do better.
One easy change would be to make the error message more specific:

    ERROR:  cannot insert a non-DEFAULT value into column "b"

(I think this wording is accurate, but I might be wrong.)  It'd be
even better if we could emit an error cursor pointing at (one of)
the entries that are not DEFAULT, since in a command with a long
VALUES list it might not be that obvious where you screwed up.

FWIW, I would not bother splitting a patch like this into two parts.
That increases your effort level, and it increases the reviewer's
effort to apply it too, and this patch isn't big enough to justify it.

            regards, tom lane

[1] https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/724543451

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Dean Rasheed

Date:

20 November 2020, 14:30:25

On Sun, 6 Sept 2020 at 22:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I think you'd be better off to make transformInsertStmt(), specifically
> its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust
> the tlist itself.  Doing it there might be a good bit less inefficient
> for very long VALUES lists, too, which is a case that we do worry about.
> Since that's already iterating through the sub-lists, you could track
> whether all rows seen so far contain SetToDefault in each column position,
> and avoid extra scans of the sublists.  (A BitmapSet might be a convenient
> representation of that, though you could also use a bool array I suppose.)
>
> I do not care for what you did in rewriteValuesRTE() either: just removing
> a sanity check isn't OK, unless you've done something to make the sanity
> check unnecessary which you surely have not.  Perhaps you could extend
> the initial scan of the tlist (lines 1294-1310) to notice SetToDefault
> nodes as well as Var nodes and keep track of which columns have those.
> Then you could cross-check that one or the other case applies whenever
> you see a SetToDefault in the VALUES lists.

That's not quite right because by the time rewriteValuesRTE() sees the
tlist, it will contain already-rewritten generated column expressions,
not SetToDefault nodes. If we're going to keep that sanity check (and
I think that we should), I think that the way to do it is to have
rewriteTargetListIU() record which columns it has expanded defaults
for, and pass that information to rewriteValuesRTE(). Those columns of
the VALUES RTE are no longer used in the query, so the sanity check
can be amended to ignore them while continuing to check the other
columns.

Incidentally, there is another way of causing that sanity check to
fail -- an INSERT ... OVERRIDING USER VALUE query with some DEFAULTS
in the VALUES RTE (but not necessarily all DEFAULTs) will trigger it.
So even if the parser completely removed any all-DEFAULT columns from
the VALUES RTE, some work in the rewriter would still be necessary.

> BTW, another thing that needs checking is whether a rule containing
> an INSERT like this will reverse-list sanely.  The whole idea of
> replacing some of the Vars might not work so far as ruleutils.c is
> concerned.  In that case I think we might have to implement this
> by having transformInsertStmt restructure the VALUES lists to
> physically remove the all-DEFAULT column, and adjust the target column
> list accordingly --- that is, make a parse-time transformation of
>         INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT);
> into
>         INSERT INTO gtest0(a) VALUES (1), (2);
> That'd have the advantage that you'd not have to hack up the
> rewriter at all.

I think it's actually easier to just do it all in the rewriter -- at
the point where we see that we're about to insert potentially illegal
values from a VALUES RTE into a generated column, scan it to see if
all the values in that column are DEFAULTs, and if so trigger the
existing code to replace the Var in the tlist with the generated
column expression. That way we only do extra work in the case for
which we're currently throwing an error, rather than for every query.
Also, I think that scanning the VALUES lists in this way is likely to
be cheaper than rebuilding them to remove a column.

Attached is a patch doing it that way, along with additional
regression tests that trigger both the original error and the
sanity-check error triggered by INSERT ... OVERRIDING USER VALUES. I
also added a few additional comments where I found the existing code a
little non-obvious.

I haven't touched the existing error messages. I think that's a
subject for a separate patch.

Regards,
Dean

Attachment

fix-generated-cols-with-defaults-in-values-rte.patch

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Tom Lane

Date:

22 November 2020, 20:58:05

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I think it's actually easier to just do it all in the rewriter -- at
> the point where we see that we're about to insert potentially illegal
> values from a VALUES RTE into a generated column, scan it to see if
> all the values in that column are DEFAULTs, and if so trigger the
> existing code to replace the Var in the tlist with the generated
> column expression. That way we only do extra work in the case for
> which we're currently throwing an error, rather than for every query.

That makes sense, and it leads to a nicely small patch.  I reviewed
this and pushed it.  I found only one nitpicky bug: in
findDefaultOnlyColumns, the test must be bms_is_empty(default_only_cols)
not just default_only_cols == NULL, or it will fail to fall out early
as intended when the first row contains some DEFAULTs but later rows
don't.  I did tweak some of the commentary, too.

> I haven't touched the existing error messages. I think that's a
> subject for a separate patch.

Fair.  After looking around a bit, I think that getting an error
cursor as I'd speculated about is more trouble than it's worth.
For starters, we'd have to pass down the query string into this
code, and there might be some ticklish issues about whether a given
chunk of parsetree came from that string or from some rule or view.
However, I think that just adjusting the error string would be
helpful, as attached.

(I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR
and not ERRCODE_GENERATED_ALWAYS.  Didn't change it here, though.)

            regards, tom lane

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 839583f834..00a48686c4 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -861,7 +861,7 @@ rewriteTargetListIU(List *targetList,
                     if (!apply_default)
                         ereport(ERROR,
                                 (errcode(ERRCODE_GENERATED_ALWAYS),
-                                 errmsg("cannot insert into column \"%s\"",
+                                 errmsg("cannot insert a non-DEFAULT value into column \"%s\"",
                                         NameStr(att_tup->attname)),
                                  errdetail("Column \"%s\" is an identity column defined as GENERATED ALWAYS.",
                                            NameStr(att_tup->attname)),
@@ -900,7 +900,7 @@ rewriteTargetListIU(List *targetList,
                 if (!apply_default)
                     ereport(ERROR,
                             (errcode(ERRCODE_SYNTAX_ERROR),
-                             errmsg("cannot insert into column \"%s\"",
+                             errmsg("cannot insert a non-DEFAULT value into column \"%s\"",
                                     NameStr(att_tup->attname)),
                              errdetail("Column \"%s\" is a generated column.",
                                        NameStr(att_tup->attname))));
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 559fafa09e..ca721d38bf 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -93,16 +93,16 @@ LINE 1: ...E gtest_err_8 (a int PRIMARY KEY, b int GENERATED BY DEFAULT...
 INSERT INTO gtest1 VALUES (1);
 INSERT INTO gtest1 VALUES (2, DEFAULT);  -- ok
 INSERT INTO gtest1 VALUES (3, 33);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1 VALUES (3, 33), (4, 44);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1 VALUES (3, DEFAULT), (4, 44);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1 VALUES (3, 33), (4, DEFAULT);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1 VALUES (3, DEFAULT), (4, DEFAULT);  -- ok
 SELECT * FROM gtest1 ORDER BY a;
@@ -193,25 +193,25 @@ SELECT * FROM gtest1v;
 (1 row)

 INSERT INTO gtest1v VALUES (4, 8);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1v VALUES (5, DEFAULT);  -- ok
 INSERT INTO gtest1v VALUES (6, 66), (7, 77);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1v VALUES (6, DEFAULT), (7, 77);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1v VALUES (6, 66), (7, DEFAULT);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1v VALUES (6, DEFAULT), (7, DEFAULT);  -- ok
 ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
 INSERT INTO gtest1v VALUES (8, DEFAULT);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT);  -- error
-ERROR:  cannot insert into column "b"
+ERROR:  cannot insert a non-DEFAULT value into column "b"
 DETAIL:  Column "b" is a generated column.
 SELECT * FROM gtest1v;
  a | b
diff --git a/src/test/regress/expected/identity.out b/src/test/regress/expected/identity.out
index 7e36b36ecd..fbca0333a2 100644
--- a/src/test/regress/expected/identity.out
+++ b/src/test/regress/expected/identity.out
@@ -107,20 +107,20 @@ SELECT * FROM itest4;
 -- VALUES RTEs
 CREATE TABLE itest5 (a int generated always as identity, b text);
 INSERT INTO itest5 VALUES (1, 'a');  -- error
-ERROR:  cannot insert into column "a"
+ERROR:  cannot insert a non-DEFAULT value into column "a"
 DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
 HINT:  Use OVERRIDING SYSTEM VALUE to override.
 INSERT INTO itest5 VALUES (DEFAULT, 'a');  -- ok
 INSERT INTO itest5 VALUES (2, 'b'), (3, 'c');  -- error
-ERROR:  cannot insert into column "a"
+ERROR:  cannot insert a non-DEFAULT value into column "a"
 DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
 HINT:  Use OVERRIDING SYSTEM VALUE to override.
 INSERT INTO itest5 VALUES (DEFAULT, 'b'), (3, 'c');  -- error
-ERROR:  cannot insert into column "a"
+ERROR:  cannot insert a non-DEFAULT value into column "a"
 DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
 HINT:  Use OVERRIDING SYSTEM VALUE to override.
 INSERT INTO itest5 VALUES (2, 'b'), (DEFAULT, 'c');  -- error
-ERROR:  cannot insert into column "a"
+ERROR:  cannot insert a non-DEFAULT value into column "a"
 DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
 HINT:  Use OVERRIDING SYSTEM VALUE to override.
 INSERT INTO itest5 VALUES (DEFAULT, 'b'), (DEFAULT, 'c');  -- ok
@@ -197,7 +197,7 @@ SELECT * FROM itest1;
 -- GENERATED ALWAYS
 -- This is an error:
 INSERT INTO itest2 VALUES (10, 'xyz');
-ERROR:  cannot insert into column "a"
+ERROR:  cannot insert a non-DEFAULT value into column "a"
 DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
 HINT:  Use OVERRIDING SYSTEM VALUE to override.
 -- This inserts the row as presented:
@@ -313,7 +313,7 @@ SELECT * FROM itestv10;
 (4 rows)

 INSERT INTO itestv11 VALUES (10, 'xyz');
-ERROR:  cannot insert into column "a"
+ERROR:  cannot insert a non-DEFAULT value into column "a"
 DETAIL:  Column "a" is an identity column defined as GENERATED ALWAYS.
 HINT:  Use OVERRIDING SYSTEM VALUE to override.
 INSERT INTO itestv11 OVERRIDING SYSTEM VALUE VALUES (11, 'xyz');
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 6a977006ef..24905332b1 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1475,10 +1475,10 @@ INSERT INTO rw_view1 (id) VALUES (2);
 INSERT INTO base_tbl (id, idplus1) VALUES (3, DEFAULT);
 INSERT INTO rw_view1 (id, idplus1) VALUES (4, DEFAULT);
 INSERT INTO base_tbl (id, idplus1) VALUES (5, 6);  -- error
-ERROR:  cannot insert into column "idplus1"
+ERROR:  cannot insert a non-DEFAULT value into column "idplus1"
 DETAIL:  Column "idplus1" is a generated column.
 INSERT INTO rw_view1 (id, idplus1) VALUES (6, 7);  -- error
-ERROR:  cannot insert into column "idplus1"
+ERROR:  cannot insert a non-DEFAULT value into column "idplus1"
 DETAIL:  Column "idplus1" is a generated column.
 SELECT * FROM base_tbl;
  id | idplus1

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Dean Rasheed

Date:

23 November 2020, 14:45:18

On Sun, 22 Nov 2020 at 20:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I found only one nitpicky bug: in
> findDefaultOnlyColumns, the test must be bms_is_empty(default_only_cols)
> not just default_only_cols == NULL, or it will fail to fall out early
> as intended when the first row contains some DEFAULTs but later rows
> don't.

Ah, good point. Thanks for fixing that.

> > I haven't touched the existing error messages. I think that's a
> > subject for a separate patch.
>
> Fair.  After looking around a bit, I think that getting an error
> cursor as I'd speculated about is more trouble than it's worth.
> For starters, we'd have to pass down the query string into this
> code, and there might be some ticklish issues about whether a given
> chunk of parsetree came from that string or from some rule or view.
> However, I think that just adjusting the error string would be
> helpful, as attached.

+1

> (I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR
> and not ERRCODE_GENERATED_ALWAYS.  Didn't change it here, though.)

I can't see any reason for it to be different, and
ERRCODE_GENERATED_ALWAYS seems like the right code to use for both
cases.

Regards,
Dean

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From

Tom Lane

Date:

23 November 2020, 16:16:21

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Sun, 22 Nov 2020 at 20:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, I think that just adjusting the error string would be
>> helpful, as attached.

> +1

>> (I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR
>> and not ERRCODE_GENERATED_ALWAYS.  Didn't change it here, though.)

> I can't see any reason for it to be different, and
> ERRCODE_GENERATED_ALWAYS seems like the right code to use for both
> cases.

Sounds good to me; pushed that way.

            regards, tom lane