Thread: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
Hello! According to the docs[1], one may use DEFAULT keyword while inserting into generated columns (stored and identity). However, currently it works only for a single VALUES sublist with DEFAULT for a generated column but not for the case when VALUES RTE is used. This is not being tested and it is broken. I am attaching two patches. One for tests and another one with the proposed changes based on ideas from Andrew on IRC. So if all good there goes the credit where credit is due. If patch is no good, then it is likely my misunderstanding how to put words into code :-) This is my only second patch to PostgreSQL (the first one was rejected) so don't be too harsh :-) It may not be perfect but I am open for a feedback and this is just to get the ball rolling and to let the community know about this issue. Before you ask why would I want to insert DEFAULTs ... well, there are ORMs[2] that still need to be patched and current situation contradicts documentation[1]. Footnotes: [1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html [2] https://github.com/rails/rails/pull/39368#issuecomment-670351379 -- Mikhail
Attachment
Hi
čt 13. 8. 2020 v 6:31 odesílatel Mikhail Titov <mlt@gmx.us> napsal:
Hello!
According to the docs[1], one may use DEFAULT keyword while inserting
into generated columns (stored and identity). However, currently it
works only for a single VALUES sublist with DEFAULT for a generated column
but not for the case when VALUES RTE is used. This is not being tested
and it is broken.
I am attaching two patches. One for tests and another one with the
proposed changes based on ideas from Andrew on IRC. So if all good there
goes the credit where credit is due. If patch is no good, then it is
likely my misunderstanding how to put words into code :-)
This is my only second patch to PostgreSQL (the first one was rejected)
so don't be too harsh :-) It may not be perfect but I am open for a
feedback and this is just to get the ball rolling and to let the
community know about this issue.
Before you ask why would I want to insert DEFAULTs ... well, there are
ORMs[2] that still need to be patched and current situation contradicts
documentation[1].
please, assign your patch to commitfest application
Regards
Pavel
Footnotes:
[1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html
[2] https://github.com/rails/rails/pull/39368#issuecomment-670351379
--
Mikhail
Previously submitted patch got somehow trailing spaces mangled on the way out. This is an attempt to use application/octet-stream MIME instead of text/x-patch to preserve those for regression tests. On Thu, Aug 13, 2020 at 12:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > please, assign your patch to commitfest application Here is the backlink https://commitfest.postgresql.org/29/2681/ -- Mikhail
Attachment
Mikhail Titov <mlt@gmx.us> writes: > Previously submitted patch got somehow trailing spaces mangled on the > way out. This is an attempt to use application/octet-stream MIME instead > of text/x-patch to preserve those for regression tests. I took a quick look through this. I agree with the general idea of detecting cases where all of the entries in a VALUES column are DEFAULT, but the implementation needs work. The cfbot reports that it doesn't compile [1]: parse_relation.c: In function ‘expandNSItemVars’: parse_relation.c:2992:34: error: ‘T_Node’ undeclared (first use in this function) std = list_nth_node(Node, row, colindex); ^ I suspect this indicates that you did not use --enable-cassert in your own testing, which is usually a bad idea; that enables a lot of checks that you really want to have active for development purposes. Hacking expandNSItemVars() for this purpose is an extremely bad idea. The API spec for that is * Produce a list of Vars, and optionally a list of column names, * for the non-dropped columns of the nsitem. This patch breaks that specification, and in turn breaks callers that expect it to be adhered to. I see at least one caller that will suffer assertion failures because of that, which reinforces my suspicion that you did not test with assertions on. I think you'd be better off to make transformInsertStmt(), specifically its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust the tlist itself. Doing it there might be a good bit less inefficient for very long VALUES lists, too, which is a case that we do worry about. Since that's already iterating through the sub-lists, you could track whether all rows seen so far contain SetToDefault in each column position, and avoid extra scans of the sublists. (A BitmapSet might be a convenient representation of that, though you could also use a bool array I suppose.) I do not care for what you did in rewriteValuesRTE() either: just removing a sanity check isn't OK, unless you've done something to make the sanity check unnecessary which you surely have not. Perhaps you could extend the initial scan of the tlist (lines 1294-1310) to notice SetToDefault nodes as well as Var nodes and keep track of which columns have those. Then you could cross-check that one or the other case applies whenever you see a SetToDefault in the VALUES lists. BTW, another thing that needs checking is whether a rule containing an INSERT like this will reverse-list sanely. The whole idea of replacing some of the Vars might not work so far as ruleutils.c is concerned. In that case I think we might have to implement this by having transformInsertStmt restructure the VALUES lists to physically remove the all-DEFAULT column, and adjust the target column list accordingly --- that is, make a parse-time transformation of INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); into INSERT INTO gtest0(a) VALUES (1), (2); That'd have the advantage that you'd not have to hack up the rewriter at all. Also, in the case where the user has failed to ensure that all the column entries are DEFAULT, I suppose that we'll still get the same error as now: regression=# INSERT INTO gtest0 VALUES (1, DEFAULT), (2, 42); ERROR: cannot insert into column "b" DETAIL: Column "b" is a generated column. This seems fairly confusing and unhelpful. Perhaps it's not this patch's job to improve it, but it'd be nice if we could do better. One easy change would be to make the error message more specific: ERROR: cannot insert a non-DEFAULT value into column "b" (I think this wording is accurate, but I might be wrong.) It'd be even better if we could emit an error cursor pointing at (one of) the entries that are not DEFAULT, since in a command with a long VALUES list it might not be that obvious where you screwed up. FWIW, I would not bother splitting a patch like this into two parts. That increases your effort level, and it increases the reviewer's effort to apply it too, and this patch isn't big enough to justify it. regards, tom lane [1] https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/724543451
On Sun, 6 Sept 2020 at 22:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I think you'd be better off to make transformInsertStmt(), specifically > its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust > the tlist itself. Doing it there might be a good bit less inefficient > for very long VALUES lists, too, which is a case that we do worry about. > Since that's already iterating through the sub-lists, you could track > whether all rows seen so far contain SetToDefault in each column position, > and avoid extra scans of the sublists. (A BitmapSet might be a convenient > representation of that, though you could also use a bool array I suppose.) > > I do not care for what you did in rewriteValuesRTE() either: just removing > a sanity check isn't OK, unless you've done something to make the sanity > check unnecessary which you surely have not. Perhaps you could extend > the initial scan of the tlist (lines 1294-1310) to notice SetToDefault > nodes as well as Var nodes and keep track of which columns have those. > Then you could cross-check that one or the other case applies whenever > you see a SetToDefault in the VALUES lists. That's not quite right because by the time rewriteValuesRTE() sees the tlist, it will contain already-rewritten generated column expressions, not SetToDefault nodes. If we're going to keep that sanity check (and I think that we should), I think that the way to do it is to have rewriteTargetListIU() record which columns it has expanded defaults for, and pass that information to rewriteValuesRTE(). Those columns of the VALUES RTE are no longer used in the query, so the sanity check can be amended to ignore them while continuing to check the other columns. Incidentally, there is another way of causing that sanity check to fail -- an INSERT ... OVERRIDING USER VALUE query with some DEFAULTS in the VALUES RTE (but not necessarily all DEFAULTs) will trigger it. So even if the parser completely removed any all-DEFAULT columns from the VALUES RTE, some work in the rewriter would still be necessary. > BTW, another thing that needs checking is whether a rule containing > an INSERT like this will reverse-list sanely. The whole idea of > replacing some of the Vars might not work so far as ruleutils.c is > concerned. In that case I think we might have to implement this > by having transformInsertStmt restructure the VALUES lists to > physically remove the all-DEFAULT column, and adjust the target column > list accordingly --- that is, make a parse-time transformation of > INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); > into > INSERT INTO gtest0(a) VALUES (1), (2); > That'd have the advantage that you'd not have to hack up the > rewriter at all. I think it's actually easier to just do it all in the rewriter -- at the point where we see that we're about to insert potentially illegal values from a VALUES RTE into a generated column, scan it to see if all the values in that column are DEFAULTs, and if so trigger the existing code to replace the Var in the tlist with the generated column expression. That way we only do extra work in the case for which we're currently throwing an error, rather than for every query. Also, I think that scanning the VALUES lists in this way is likely to be cheaper than rebuilding them to remove a column. Attached is a patch doing it that way, along with additional regression tests that trigger both the original error and the sanity-check error triggered by INSERT ... OVERRIDING USER VALUES. I also added a few additional comments where I found the existing code a little non-obvious. I haven't touched the existing error messages. I think that's a subject for a separate patch. Regards, Dean
Attachment
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > I think it's actually easier to just do it all in the rewriter -- at > the point where we see that we're about to insert potentially illegal > values from a VALUES RTE into a generated column, scan it to see if > all the values in that column are DEFAULTs, and if so trigger the > existing code to replace the Var in the tlist with the generated > column expression. That way we only do extra work in the case for > which we're currently throwing an error, rather than for every query. That makes sense, and it leads to a nicely small patch. I reviewed this and pushed it. I found only one nitpicky bug: in findDefaultOnlyColumns, the test must be bms_is_empty(default_only_cols) not just default_only_cols == NULL, or it will fail to fall out early as intended when the first row contains some DEFAULTs but later rows don't. I did tweak some of the commentary, too. > I haven't touched the existing error messages. I think that's a > subject for a separate patch. Fair. After looking around a bit, I think that getting an error cursor as I'd speculated about is more trouble than it's worth. For starters, we'd have to pass down the query string into this code, and there might be some ticklish issues about whether a given chunk of parsetree came from that string or from some rule or view. However, I think that just adjusting the error string would be helpful, as attached. (I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR and not ERRCODE_GENERATED_ALWAYS. Didn't change it here, though.) regards, tom lane diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c index 839583f834..00a48686c4 100644 --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -861,7 +861,7 @@ rewriteTargetListIU(List *targetList, if (!apply_default) ereport(ERROR, (errcode(ERRCODE_GENERATED_ALWAYS), - errmsg("cannot insert into column \"%s\"", + errmsg("cannot insert a non-DEFAULT value into column \"%s\"", NameStr(att_tup->attname)), errdetail("Column \"%s\" is an identity column defined as GENERATED ALWAYS.", NameStr(att_tup->attname)), @@ -900,7 +900,7 @@ rewriteTargetListIU(List *targetList, if (!apply_default) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("cannot insert into column \"%s\"", + errmsg("cannot insert a non-DEFAULT value into column \"%s\"", NameStr(att_tup->attname)), errdetail("Column \"%s\" is a generated column.", NameStr(att_tup->attname)))); diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out index 559fafa09e..ca721d38bf 100644 --- a/src/test/regress/expected/generated.out +++ b/src/test/regress/expected/generated.out @@ -93,16 +93,16 @@ LINE 1: ...E gtest_err_8 (a int PRIMARY KEY, b int GENERATED BY DEFAULT... INSERT INTO gtest1 VALUES (1); INSERT INTO gtest1 VALUES (2, DEFAULT); -- ok INSERT INTO gtest1 VALUES (3, 33); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1 VALUES (3, 33), (4, 44); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1 VALUES (3, DEFAULT), (4, 44); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1 VALUES (3, 33), (4, DEFAULT); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1 VALUES (3, DEFAULT), (4, DEFAULT); -- ok SELECT * FROM gtest1 ORDER BY a; @@ -193,25 +193,25 @@ SELECT * FROM gtest1v; (1 row) INSERT INTO gtest1v VALUES (4, 8); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1v VALUES (5, DEFAULT); -- ok INSERT INTO gtest1v VALUES (6, 66), (7, 77); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1v VALUES (6, DEFAULT), (7, 77); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1v VALUES (6, 66), (7, DEFAULT); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1v VALUES (6, DEFAULT), (7, DEFAULT); -- ok ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100; INSERT INTO gtest1v VALUES (8, DEFAULT); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error -ERROR: cannot insert into column "b" +ERROR: cannot insert a non-DEFAULT value into column "b" DETAIL: Column "b" is a generated column. SELECT * FROM gtest1v; a | b diff --git a/src/test/regress/expected/identity.out b/src/test/regress/expected/identity.out index 7e36b36ecd..fbca0333a2 100644 --- a/src/test/regress/expected/identity.out +++ b/src/test/regress/expected/identity.out @@ -107,20 +107,20 @@ SELECT * FROM itest4; -- VALUES RTEs CREATE TABLE itest5 (a int generated always as identity, b text); INSERT INTO itest5 VALUES (1, 'a'); -- error -ERROR: cannot insert into column "a" +ERROR: cannot insert a non-DEFAULT value into column "a" DETAIL: Column "a" is an identity column defined as GENERATED ALWAYS. HINT: Use OVERRIDING SYSTEM VALUE to override. INSERT INTO itest5 VALUES (DEFAULT, 'a'); -- ok INSERT INTO itest5 VALUES (2, 'b'), (3, 'c'); -- error -ERROR: cannot insert into column "a" +ERROR: cannot insert a non-DEFAULT value into column "a" DETAIL: Column "a" is an identity column defined as GENERATED ALWAYS. HINT: Use OVERRIDING SYSTEM VALUE to override. INSERT INTO itest5 VALUES (DEFAULT, 'b'), (3, 'c'); -- error -ERROR: cannot insert into column "a" +ERROR: cannot insert a non-DEFAULT value into column "a" DETAIL: Column "a" is an identity column defined as GENERATED ALWAYS. HINT: Use OVERRIDING SYSTEM VALUE to override. INSERT INTO itest5 VALUES (2, 'b'), (DEFAULT, 'c'); -- error -ERROR: cannot insert into column "a" +ERROR: cannot insert a non-DEFAULT value into column "a" DETAIL: Column "a" is an identity column defined as GENERATED ALWAYS. HINT: Use OVERRIDING SYSTEM VALUE to override. INSERT INTO itest5 VALUES (DEFAULT, 'b'), (DEFAULT, 'c'); -- ok @@ -197,7 +197,7 @@ SELECT * FROM itest1; -- GENERATED ALWAYS -- This is an error: INSERT INTO itest2 VALUES (10, 'xyz'); -ERROR: cannot insert into column "a" +ERROR: cannot insert a non-DEFAULT value into column "a" DETAIL: Column "a" is an identity column defined as GENERATED ALWAYS. HINT: Use OVERRIDING SYSTEM VALUE to override. -- This inserts the row as presented: @@ -313,7 +313,7 @@ SELECT * FROM itestv10; (4 rows) INSERT INTO itestv11 VALUES (10, 'xyz'); -ERROR: cannot insert into column "a" +ERROR: cannot insert a non-DEFAULT value into column "a" DETAIL: Column "a" is an identity column defined as GENERATED ALWAYS. HINT: Use OVERRIDING SYSTEM VALUE to override. INSERT INTO itestv11 OVERRIDING SYSTEM VALUE VALUES (11, 'xyz'); diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out index 6a977006ef..24905332b1 100644 --- a/src/test/regress/expected/updatable_views.out +++ b/src/test/regress/expected/updatable_views.out @@ -1475,10 +1475,10 @@ INSERT INTO rw_view1 (id) VALUES (2); INSERT INTO base_tbl (id, idplus1) VALUES (3, DEFAULT); INSERT INTO rw_view1 (id, idplus1) VALUES (4, DEFAULT); INSERT INTO base_tbl (id, idplus1) VALUES (5, 6); -- error -ERROR: cannot insert into column "idplus1" +ERROR: cannot insert a non-DEFAULT value into column "idplus1" DETAIL: Column "idplus1" is a generated column. INSERT INTO rw_view1 (id, idplus1) VALUES (6, 7); -- error -ERROR: cannot insert into column "idplus1" +ERROR: cannot insert a non-DEFAULT value into column "idplus1" DETAIL: Column "idplus1" is a generated column. SELECT * FROM base_tbl; id | idplus1
On Sun, 22 Nov 2020 at 20:58, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I found only one nitpicky bug: in > findDefaultOnlyColumns, the test must be bms_is_empty(default_only_cols) > not just default_only_cols == NULL, or it will fail to fall out early > as intended when the first row contains some DEFAULTs but later rows > don't. Ah, good point. Thanks for fixing that. > > I haven't touched the existing error messages. I think that's a > > subject for a separate patch. > > Fair. After looking around a bit, I think that getting an error > cursor as I'd speculated about is more trouble than it's worth. > For starters, we'd have to pass down the query string into this > code, and there might be some ticklish issues about whether a given > chunk of parsetree came from that string or from some rule or view. > However, I think that just adjusting the error string would be > helpful, as attached. +1 > (I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR > and not ERRCODE_GENERATED_ALWAYS. Didn't change it here, though.) I can't see any reason for it to be different, and ERRCODE_GENERATED_ALWAYS seems like the right code to use for both cases. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On Sun, 22 Nov 2020 at 20:58, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> However, I think that just adjusting the error string would be >> helpful, as attached. > +1 >> (I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR >> and not ERRCODE_GENERATED_ALWAYS. Didn't change it here, though.) > I can't see any reason for it to be different, and > ERRCODE_GENERATED_ALWAYS seems like the right code to use for both > cases. Sounds good to me; pushed that way. regards, tom lane