Thread: Generated column is not updated (Postgres 13)
Hello,
I would like to report the following:
- a generated column is never updated if you pass the whole record to a stored procedure (an immutable function);
- however, it works fine if you pass individual columns to the function;
- both options work fine on Postgres 12.
I consider it a backward compatibility issue.
Please correct me if I am wrong, but I was not able to find anything related to this in the release notes for 13.x.
Please correct me if I am wrong, but I was not able to find anything related to this in the release notes for 13.x.
The background of this issue.
During the upgrade from 12.5 to 13.3 with pg_upgrade I had to drop a generated column and some other stuff. After the upgrade was successfully done, I needed to restore everything back. When I was adding the generated column all of a sudden I got a "Segmentation fault". I retried a few times with slightly different variants of the code, but each time I would get the same result - server crashed. This has been in production for many months now and never caused any issue.
2021-05-18 19:30:46 GMT [235415-9] LOG: server process (PID 235429) was terminated by signal 11: Segmentation fault
2021-05-18 19:30:46 GMT [235415-10] DETAIL: Failed process was running: ALTER TABLE ordering.requests_3pp ADD unique_hash bytea GENERATED ALWAYS AS (fn_ordering.calc_3pp_req_hash(requests_3pp.*)) STORED NOT NULL;
2021-05-18 19:30:46 GMT [235415-11] LOG: terminating any other active server processes
2021-05-18 19:30:46 GMT [235422-1] WARNING: terminating connection because of crash of another server process
2021-05-18 19:30:46 GMT [235422-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-05-18 19:30:46 GMT [235422-3] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-05-18 19:30:46 GMT [235415-12] LOG: archiver process (PID 235423) exited with exit code 2
2021-05-18 19:30:46 GMT [235473-1] postgres@scas_acceptance FATAL: the database system is in recovery mode
2021-05-18 19:30:46 GMT [235415-13] LOG: all server processes terminated; reinitializing
2021-05-18 19:30:46 GMT [235474-1] LOG: database system was interrupted; last known up at 2021-05-18 19:29:47 GMT
2021-05-18 19:30:46 GMT [235474-2] LOG: database system was not properly shut down; automatic recovery in progress
2021-05-18 19:30:46 GMT [235474-3] LOG: redo starts at C5B/E70000A0
2021-05-18 19:30:46 GMT [235474-4] LOG: redo done at C5B/E706FF60
2021-05-18 19:30:46 GMT [235415-14] LOG: database system is ready to accept connections
2021-05-18 19:30:46 GMT [235415-10] DETAIL: Failed process was running: ALTER TABLE ordering.requests_3pp ADD unique_hash bytea GENERATED ALWAYS AS (fn_ordering.calc_3pp_req_hash(requests_3pp.*)) STORED NOT NULL;
2021-05-18 19:30:46 GMT [235415-11] LOG: terminating any other active server processes
2021-05-18 19:30:46 GMT [235422-1] WARNING: terminating connection because of crash of another server process
2021-05-18 19:30:46 GMT [235422-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-05-18 19:30:46 GMT [235422-3] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-05-18 19:30:46 GMT [235415-12] LOG: archiver process (PID 235423) exited with exit code 2
2021-05-18 19:30:46 GMT [235473-1] postgres@scas_acceptance FATAL: the database system is in recovery mode
2021-05-18 19:30:46 GMT [235415-13] LOG: all server processes terminated; reinitializing
2021-05-18 19:30:46 GMT [235474-1] LOG: database system was interrupted; last known up at 2021-05-18 19:29:47 GMT
2021-05-18 19:30:46 GMT [235474-2] LOG: database system was not properly shut down; automatic recovery in progress
2021-05-18 19:30:46 GMT [235474-3] LOG: redo starts at C5B/E70000A0
2021-05-18 19:30:46 GMT [235474-4] LOG: redo done at C5B/E706FF60
2021-05-18 19:30:46 GMT [235415-14] LOG: database system is ready to accept connections
I started digging, each time simplifying something, and eventually I managed to make it working. I created a simple procedure which returns "id::text::bytea", then I added a new column and then I recreated the procedure with the initial business logic. And that's the reason I am reporting it, because it turned out that with Postgres 13 a generated column is never updated if you pass the whole record to the stored procedure. Like I said, it works fine in production on Postgres 12.5. As a workaround I will use a trigger, calling the same function until it's fixed.
I've run my tests against 13.2 and 13.3 (see below). But actually I think the best and easiest way of reproducing it would be using the official Docker images from Docker Hub. You can be absolutely sure it's executed in the exact same environment as I have, and that I don't use any specific configuration options etc. However, it's at your choice. This is how you can create the containers and start "psql" inside them:
Postgres 13.3:
$ docker pull postgres:13.3$ docker run -d --name pg13 -e POSTGRES_PASSWORD=postgres postgres:13.3
$ docker exec -it -u postgres pg13 psql
Postgres 12.7:
$ docker pull postgres:12.7
$ docker run -d --name pg12 -e POSTGRES_PASSWORD=postgres postgres:12.7
$ docker exec -it -u postgres pg12 psql
And then you can run the following SQL test case:
-----------------------------------------------------------
create table t(a text, b int);insert into t values ('A', 1), ('B', 2), ('C', 3);
create or replace function calc_gen_plain(text, int) returns text
language sql immutable as
$$ select $1||'-'||$2; $$;
create or replace function calc_gen_rec_sql(t) returns text
language sql immutable as
$$ select $1.a||'-'||$1.b; $$;
create or replace function calc_gen_rec_plpgsql(t) returns text
language plpgsql immutable as
$$ begin
raise notice '[plpgsql] a=%, b=%', $1.a, $1.b;
return $1.a||'-'||$1.b;
end; $$;
alter table t add gen_val text not null generated always as (calc_gen_plain(a, b)) stored;
alter table t add gen_rec1 text not null generated always as (calc_gen_rec_sql(t)) stored;
alter table t add gen_rec2 text not null generated always as (calc_gen_rec_plpgsql(t)) stored;
select t.* from t;
update t set a = chr(ascii(a) + 3), b = b + 3;
select t.* from t;
-----------------------------------------------------------
As a result, after the UPDATE command I expect all generated columns to contain the same value (within each row), and that's what I actually get with Postgres 12:
a | b | gen_val | gen_rec1 | gen_rec2
---+---+---------+----------+----------
D | 4 | D-4 | D-4 | D-4
E | 5 | E-5 | E-5 | E-5
F | 6 | F-6 | F-6 | F-6
---+---+---------+----------+----------
D | 4 | D-4 | D-4 | D-4
E | 5 | E-5 | E-5 | E-5
F | 6 | F-6 | F-6 | F-6
But with Postgres 13 the two last columns are not updated.
Moreover, the "raise notice" statement in the calc_gen_rec_plpgsql() function is not executed for the "UPDATE" command, so I think the function is simply not called at all.
Moreover, the "raise notice" statement in the calc_gen_rec_plpgsql() function is not executed for the "UPDATE" command, so I think the function is simply not called at all.
a | b | gen_val | gen_rec1 | gen_rec2
---+---+---------+----------+----------
D | 4 | D-4 | A-1 | A-1
E | 5 | E-5 | B-2 | B-2
F | 6 | F-6 | C-3 | C-3
---+---+---------+----------+----------
D | 4 | D-4 | A-1 | A-1
E | 5 | E-5 | B-2 | B-2
F | 6 | F-6 | C-3 | C-3
Please look into the attachments for detailed output.
Besides, below you will find some info about my environments where I've got the initial "Segmentation fault" issue.
Thank you!
Postgres 13.2 installed from standard binary packages.
$ apt list --installed | grep postgres
postgresql-13/now 13.2-1.pgdg18.04+1 amd64 [installed,upgradable to: 13.3-1.pgdg18.04+1]
postgresql-13-cron/now 1.3.0-2.pgdg18.04+1 amd64 [installed,upgradable to: 1.3.1-1.pgdg18.04+1]
postgresql-13-mysql-fdw/bionic-pgdg,now 2.5.5-2.pgdg18.04+1 amd64 [installed]
postgresql-13-repack/bionic-pgdg,now 1.4.6-1.pgdg18.04+1 amd64 [installed]
postgresql-client-13/now 13.2-1.pgdg18.04+1 amd64 [installed,upgradable to: 13.3-1.pgdg18.04+1]
postgresql-client-common/now 225.pgdg18.04+1 all [installed,upgradable to: 226.pgdg18.04+1]
postgresql-common/now 225.pgdg18.04+1 all [installed,upgradable to: 226.pgdg18.04+1]
postgresql-13/now 13.2-1.pgdg18.04+1 amd64 [installed,upgradable to: 13.3-1.pgdg18.04+1]
postgresql-13-cron/now 1.3.0-2.pgdg18.04+1 amd64 [installed,upgradable to: 1.3.1-1.pgdg18.04+1]
postgresql-13-mysql-fdw/bionic-pgdg,now 2.5.5-2.pgdg18.04+1 amd64 [installed]
postgresql-13-repack/bionic-pgdg,now 1.4.6-1.pgdg18.04+1 amd64 [installed]
postgresql-client-13/now 13.2-1.pgdg18.04+1 amd64 [installed,upgradable to: 13.3-1.pgdg18.04+1]
postgresql-client-common/now 225.pgdg18.04+1 all [installed,upgradable to: 226.pgdg18.04+1]
postgresql-common/now 225.pgdg18.04+1 all [installed,upgradable to: 226.pgdg18.04+1]
$ uname -a
Linux elxajw3dxz1 5.4.0-73-generic #82~18.04.1-Ubuntu SMP Fri Apr 16 15:10:02 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux elxajw3dxz1 5.4.0-73-generic #82~18.04.1-Ubuntu SMP Fri Apr 16 15:10:02 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Release: 18.04
Codename: bionic
Postgres 13.3 installed from binary packages "13.3-2PGDG.rhel8"
posted on 18 May (https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-8-x86_64/):
postgresql13-13.3-2PGDG.rhel8.x86_64.rpm
postgresql13-contrib-13.3-2PGDG.rhel8.x86_64.rpm
postgresql13-libs-13.3-2PGDG.rhel8.x86_64.rpm
postgresql13-server-13.3-2PGDG.rhel8.x86_64.rpm
Linux seliiudb01107 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Wed Aug 26 15:29:02 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.2 (Ootpa)
Release: 8.2
Codename: Ootpa
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.2 (Ootpa)
Release: 8.2
Codename: Ootpa
Regards,
Vitaly Ustinov
Attachment
Vitaly Ustinov <vitaly@ustinov.ca> writes: > I would like to report the following: > - a generated column is never updated if you pass the whole record to a > stored procedure (an immutable function); TBH, I think that this is insane and needs to be forbidden. What value are you expecting that the function would see in the column of the whole-row var that corresponds to the generated column? It surely cannot be passed the value that it hasn't computed yet. You could perhaps argue that it'd be okay to pass NULL. The problem with that, though, is that it'd violate the NOT NULL constraint that exists for the generated column. Quite aside from any confusion that ensues at the user level, I'm afraid that that could result in C code crashes --- there are, for example, places in tuple deforming that assume that NOT NULL constraints are truthful. Anyway, I tried your test case on a debug build and observed assertion failures, which I traced to ATRewriteTable not being careful enough to zero out the whole tuple each time through. Conceivably we could fix that as per the attached quick hack. However, I think we ought to disallow the case instead. I observe that we already disallow generated columns depending on each other: regression=# create table foo (f1 int); CREATE TABLE regression=# alter table foo add f2 int generated always as (f1+1) stored; ALTER TABLE regression=# alter table foo add f3 int generated always as (f2+1) stored; ERROR: cannot use generated column "f2" in column generation expression DETAIL: A generated column cannot reference another generated column. But a whole-row var violates this concept completely: it makes the generated column depend, not only on every other column, but on itself too. Also, even if you don't mind null-for-not-yet-computed-value, that would expose the computation order of the generated columns. regards, tom lane diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index ebc62034d2..4178cde3b2 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -5699,16 +5699,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) table_slot_callbacks(oldrel)); newslot = MakeSingleTupleTableSlot(newTupDesc, table_slot_callbacks(newrel)); - - /* - * Set all columns in the new slot to NULL initially, to ensure - * columns added as part of the rewrite are initialized to NULL. - * That is necessary as tab->newvals will not contain an - * expression for columns with a NULL default, e.g. when adding a - * column without a default together with a column with a default - * requiring an actual rewrite. - */ - ExecStoreAllNullTuple(newslot); } else { @@ -5747,9 +5737,21 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) if (tab->rewrite > 0) { + /* + * Set all columns in the new slot to NULL initially, to + * ensure columns added as part of the rewrite are initialized + * to NULL. That is necessary as tab->newvals will not + * contain an expression for columns with a NULL default, e.g. + * when adding a column without a default together with a + * column with a default requiring an actual rewrite. + * Moreover, we must do it each time through, so that the slot + * contents are sane in case any generated expression contains + * a whole-row Var (which will read this same slot). + */ + ExecStoreAllNullTuple(newslot); + /* Extract data from old tuple */ slot_getallattrs(oldslot); - ExecClearTuple(newslot); /* copy attributes */ memcpy(newslot->tts_values, oldslot->tts_values, @@ -5782,12 +5784,12 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) &newslot->tts_isnull[ex->attnum - 1]); } - ExecStoreVirtualTuple(newslot); - /* * Now, evaluate any expressions whose inputs come from the * new tuple. We assume these columns won't reference each - * other, so that there's no ordering dependency. + * other, so that there's no ordering dependency. However, + * because we allow whole-row references in GENERATED + * expressions, that's not strictly true ... */ econtext->ecxt_scantuple = newslot;
On Wed, May 19, 2021 at 3:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Vitaly Ustinov <vitaly@ustinov.ca> writes:
> I would like to report the following:
> - a generated column is never updated if you pass the whole record to a
> stored procedure (an immutable function);
You could perhaps argue that it'd be okay to pass NULL. The problem
with that, though, is that it'd violate the NOT NULL constraint that
exists for the generated column. Quite aside from any confusion that
ensues at the user level, I'm afraid that that could result in C code
crashes --- there are, for example, places in tuple deforming that
assume that NOT NULL constraints are truthful.
Knowing why version 12 worked would be helpful in judging whether to live with past decisions, fix poorly made previous decisions, or just move forward with the new behavior in v14 and leave the past alone. I agree that the better design choice would have us forbid this, though that seems problematic in its own right. The function itself is defined correctly and aside from the fact it is in a generated expression it is also called correctly. I suppose the error would be of the form "cannot form a valid whole-row var due to unspecified generated column data"?
FWIW I can definitely see where the OP is coming from with this - the expression at first blush, if one assumes PostgreSQL can handle the nuances, seems like a perfectly reasonable semantic way to express the programmer's desire. Combined with the fact it used to work makes me want to lean toward keeping it working even if it takes come code hackery.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > FWIW I can definitely see where the OP is coming from with this - the > expression at first blush, if one assumes PostgreSQL can handle the > nuances, seems like a perfectly reasonable semantic way to express the > programmer's desire. Combined with the fact it used to work makes me want > to lean toward keeping it working even if it takes come code hackery. I dunno, I think it "used to work" only for exceedingly small values of "work". For me, the given test case hits the same assertion failure in all versions back to v12, which is unsurprising because the code in ATRewriteTable is about the same. Also, considering only the table-rewrite code path is a mistake. The fundamental problem here is that the behavior is ill-defined and necessarily implementation-dependent; which makes it likely that other code paths behave inconsistently with ALTER TABLE ADD COLUMN. regards, tom lane
I wrote: > ... I think we ought > to disallow the case instead. I observe that we already disallow > generated columns depending on each other: ... > But a whole-row var violates this concept completely: it makes the > generated column depend, not only on every other column, but on itself > too. Also, even if you don't mind null-for-not-yet-computed-value, > that would expose the computation order of the generated columns. After actually looking at the code involved, I'm even more on the warpath. Not only is it failing to reject whole-row vars, but it's failing to reject system columns. That is (a) infeasible to support, given that we don't know the values of the system columns at the time we compute generated expressions, and (b) just plain ludicrous in expressions that are required to be immutable. I see that there is actually a regression test case that believes that "tableoid" should be allowed, but I think that is nonsense. In the first place, it's impossible to claim that tableoid is an immutable expression. Consider, say, "tableoid > 30000". Do you think such a column is likely to survive dump-and-reload unchanged? Also, while that example is artificial, I'm having a hard time coming up with realistic immutable use-cases for generation expressions involving tableoid. In the second place, there are a bunch of implementation dependencies that we'd have to fix if we want to consider that supported. I think it's mostly accidental that the case seems to work in the mainline INSERT code path. It's not hard to find cases where it does not work, for example regression=# create table foo (f1 int); CREATE TABLE regression=# insert into foo values (1); INSERT 0 1 regression=# alter table foo add column f2 oid GENERATED ALWAYS AS (tableoid) STORED; ALTER TABLE regression=# table foo; f1 | f2 ----+---- 1 | 0 (1 row) So I think we should just forbid tableoid along with other system columns, as attached. regards, tom lane diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 431e62e389..53ff26774a 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -3020,15 +3020,29 @@ check_nested_generated_walker(Node *node, void *context) AttrNumber attnum; relid = rt_fetch(var->varno, pstate->p_rtable)->relid; + if (!OidIsValid(relid)) + return false; /* XXX shouldn't we raise an error? */ + attnum = var->varattno; - if (OidIsValid(relid) && AttributeNumberIsValid(attnum) && get_attgenerated(relid, attnum)) + if (attnum > 0 && get_attgenerated(relid, attnum)) ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("cannot use generated column \"%s\" in column generation expression", get_attname(relid, attnum, false)), errdetail("A generated column cannot reference another generated column."), parser_errposition(pstate, var->location))); + if (attnum == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot use whole-row variable in column generation expression"), + parser_errposition(pstate, var->location))); + if (attnum < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot use system column \"%s\" in column generation expression", + get_attname(relid, attnum, false)), + parser_errposition(pstate, var->location))); return false; } diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out index 675773f0c1..896b568165 100644 --- a/src/test/regress/expected/generated.out +++ b/src/test/regress/expected/generated.out @@ -46,6 +46,16 @@ ERROR: cannot use generated column "b" in column generation expression LINE 1: ...AYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b * 3) STO... ^ DETAIL: A generated column cannot reference another generated column. +-- a whole-row var is a self-reference on steroids +CREATE TABLE gtest_err_2c (a int PRIMARY KEY, b int GENERATED ALWAYS AS (length(gtest_err_2c::text)) STORED); +ERROR: cannot use whole-row variable in column generation expression +LINE 1: ...nt PRIMARY KEY, b int GENERATED ALWAYS AS (length(gtest_err_... + ^ +-- no system columns, either, as those aren't known when generation occurs +CREATE TABLE gtest_err_2d (a int PRIMARY KEY, b oid GENERATED ALWAYS AS (tableoid) STORED); +ERROR: cannot use system column "tableoid" in column generation expression +LINE 1: ...2d (a int PRIMARY KEY, b oid GENERATED ALWAYS AS (tableoid) ... + ^ -- invalid reference CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED); ERROR: column "c" does not exist @@ -452,19 +462,6 @@ SELECT * FROM gtest4; DROP TABLE gtest4; DROP TYPE double_int; --- using tableoid is allowed -CREATE TABLE gtest_tableoid ( - a int PRIMARY KEY, - b bool GENERATED ALWAYS AS (tableoid <> 0) STORED -); -INSERT INTO gtest_tableoid VALUES (1), (2); -SELECT * FROM gtest_tableoid; - a | b ----+--- - 1 | t - 2 | t -(2 rows) - -- drop column behavior CREATE TABLE gtest10 (a int PRIMARY KEY, b int, c int GENERATED ALWAYS AS (b * 2) STORED); ALTER TABLE gtest10 DROP COLUMN b; diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql index 63251c443a..69c334742f 100644 --- a/src/test/regress/sql/generated.sql +++ b/src/test/regress/sql/generated.sql @@ -17,6 +17,11 @@ CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) S -- references to other generated columns, including self-references CREATE TABLE gtest_err_2a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (b * 2) STORED); CREATE TABLE gtest_err_2b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b * 3)STORED); +-- a whole-row var is a self-reference on steroids +CREATE TABLE gtest_err_2c (a int PRIMARY KEY, b int GENERATED ALWAYS AS (length(gtest_err_2c::text)) STORED); + +-- no system columns, either, as those aren't known when generation occurs +CREATE TABLE gtest_err_2d (a int PRIMARY KEY, b oid GENERATED ALWAYS AS (tableoid) STORED); -- invalid reference CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED); @@ -215,14 +220,6 @@ SELECT * FROM gtest4; DROP TABLE gtest4; DROP TYPE double_int; --- using tableoid is allowed -CREATE TABLE gtest_tableoid ( - a int PRIMARY KEY, - b bool GENERATED ALWAYS AS (tableoid <> 0) STORED -); -INSERT INTO gtest_tableoid VALUES (1), (2); -SELECT * FROM gtest_tableoid; - -- drop column behavior CREATE TABLE gtest10 (a int PRIMARY KEY, b int, c int GENERATED ALWAYS AS (b * 2) STORED); ALTER TABLE gtest10 DROP COLUMN b;
Hi, Thank you very much for the quick response and feedback. I completely understand your point, Tom. And I can go back to using triggers instead. After all, this whole "generated columns" feature is just syntax sugar. In my real case, the function accepts a row containing dozens of columns and returns a SHA-1 hash that must be unique, following pretty sophisticated business logic. Something like: if type = X and subtype = Y then combine these fields, else if ... and so on. That's why it's so convenient to pass the whole row. For the record, I think that passing NULL as a value for all generated columns would not be such a bad idea, because that's exactly what NULL represents - an unknown value. And I agree that it would be insane to rely on the order of calculation, if someone decided to read a value that is still being computed. It reminds me of the famous "mutating table" issue while using triggers. As to the "NOT NULL" and other sorts of constraints - it's also fine, because integrity constraints are applied later. Just to illustrate my idea: create table foo( id serial, val text, hash bytea not null unique ); insert into foo(val) values('A'); If I had a "before insert on foo for each row" trigger, what would be the initial "hash" value in it? It would be NULL. Can I temporarily assign "NEW.hash := NULL" in this trigger, until I have not yet reached the "return NEW" statement? Yes, I can. Can I temporarily assign a non-unique value to "NEW.hash"? Yes, I can. Anyway, I trust your discretion. Thanks! Regards, Vitaly Ustinov Regards, Vitaly Ustinov On Wed, May 19, 2021 at 9:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: > > ... I think we ought > > to disallow the case instead. I observe that we already disallow > > generated columns depending on each other: ... > > But a whole-row var violates this concept completely: it makes the > > generated column depend, not only on every other column, but on itself > > too. Also, even if you don't mind null-for-not-yet-computed-value, > > that would expose the computation order of the generated columns. > > After actually looking at the code involved, I'm even more on the > warpath. Not only is it failing to reject whole-row vars, but it's > failing to reject system columns. That is (a) infeasible to support, > given that we don't know the values of the system columns at the time > we compute generated expressions, and (b) just plain ludicrous in > expressions that are required to be immutable. > > I see that there is actually a regression test case that believes > that "tableoid" should be allowed, but I think that is nonsense. > > In the first place, it's impossible to claim that tableoid is an > immutable expression. Consider, say, "tableoid > 30000". Do you > think such a column is likely to survive dump-and-reload unchanged? > Also, while that example is artificial, I'm having a hard time > coming up with realistic immutable use-cases for generation > expressions involving tableoid. > > In the second place, there are a bunch of implementation dependencies > that we'd have to fix if we want to consider that supported. I think > it's mostly accidental that the case seems to work in the mainline > INSERT code path. It's not hard to find cases where it does not work, > for example > > regression=# create table foo (f1 int); > CREATE TABLE > regression=# insert into foo values (1); > INSERT 0 1 > regression=# alter table foo add column f2 oid GENERATED ALWAYS AS (tableoid) STORED; > ALTER TABLE > regression=# table foo; > f1 | f2 > ----+---- > 1 | 0 > (1 row) > > So I think we should just forbid tableoid along with other system > columns, as attached. > > regards, tom lane >
Tom Lane wrote: > TBH, I think that this is insane and needs to be forbidden. What value > are you expecting that the function would see in the column of the > whole-row var that corresponds to the generated column? It surely > cannot be passed the value that it hasn't computed yet. Perhaps this problem has already been addressed and resolved. https://www.postgresql.org/docs/13/trigger-definition.html > Stored generated columns are computed after BEFORE triggers and > before AFTER triggers. Therefore, the generated value can be inspected > in AFTER triggers. In BEFORE triggers, the OLD row contains the old > generated value, as one would expect, but the NEW row does not yet > contain the new generated value and should not be accessed. > In the C language interface, the content of the column is undefined at > this point; a higher-level programming language should prevent access > to a stored generated column in the NEW row in a BEFORE trigger. To bottom-line, this is the responsibility of the app developers. Disclaimer: we warned you. Tom Lane wrote: > However, I think we ought to disallow the case instead. I observe that > we already disallow generated columns depending on each other. Right, and a function getting a whole-row var can bypass this limitation. But let's take a look at the same documentation page again and check what it says about interdependent triggers: > If more than one trigger is defined for the same event on the same > relation, the triggers will be fired in alphabetical order by trigger name. > In the case of BEFORE and INSTEAD OF triggers, the possibly-modified > row returned by each trigger becomes the input to the next trigger. Similar to the described behavior, a solution could be to calculate the generated columns in alphabetical order. But I'd rather use the same formulation "should not be accessed" and shift the responsibility to the app developers. Regards, Vitaly Ustinov
After looking at this further, I see that there already is a check for system columns in GENERATED expressions; it's just in the parser (scanNSItemForColumn), not where I was looking for it. And it explicitly exempts tableoid. I continue to think that that's a bad idea and we're gonna regret it, but at least the issue in ATRewriteTable turns out to be trivial to fix. So I did that and upgraded the test scenario to be a bit more realistic, in 0001 below. 0002 then just disallows whole-row variables. (I added an errdetail trying to explain that restriction, too.) It's worth pointing out here that what seems to me to be a "more realistic" test scenario involves a regclass constant for the table's own OID. If that doesn't scare you, it should, because it implies all kinds of constraints on the order in which these expressions are processed during CREATE or ALTER TABLE. The test passes check-world, which includes dump/reload, but I am sorely afraid that there are ways in which this sort of thing would fail. Do we *really* want to buy into supporting it? regards, tom lane diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index ebc62034d2..85dcc33063 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -5761,6 +5761,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) foreach(lc, dropped_attrs) newslot->tts_isnull[lfirst_int(lc)] = true; + /* + * Constraints and GENERATED expressions might reference the + * tableoid column, so fill tts_tableOid with the desired + * value. (We must do this each time, because it gets + * overwritten with newrel's OID during storing.) + */ + newslot->tts_tableOid = RelationGetRelid(oldrel); + /* * Process supplied expressions to replace selected columns. * @@ -5804,11 +5812,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) &newslot->tts_isnull[ex->attnum - 1]); } - /* - * Constraints might reference the tableoid column, so - * initialize t_tableOid before evaluating them. - */ - newslot->tts_tableOid = RelationGetRelid(oldrel); insertslot = newslot; } else diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out index 675773f0c1..71542e95d5 100644 --- a/src/test/regress/expected/generated.out +++ b/src/test/regress/expected/generated.out @@ -64,6 +64,7 @@ ERROR: both identity and generation expression specified for column "b" of tabl LINE 1: ...t PRIMARY KEY, b int GENERATED ALWAYS AS identity GENERATED ... ^ -- reference to system column not allowed in generated column +-- (except tableoid, which we test below) CREATE TABLE gtest_err_6a (a int PRIMARY KEY, b bool GENERATED ALWAYS AS (xmin <> 37) STORED); ERROR: cannot use system column "xmin" in column generation expression LINE 1: ...a (a int PRIMARY KEY, b bool GENERATED ALWAYS AS (xmin <> 37... @@ -455,14 +456,16 @@ DROP TYPE double_int; -- using tableoid is allowed CREATE TABLE gtest_tableoid ( a int PRIMARY KEY, - b bool GENERATED ALWAYS AS (tableoid <> 0) STORED + b bool GENERATED ALWAYS AS (tableoid = 'gtest_tableoid'::regclass) STORED ); INSERT INTO gtest_tableoid VALUES (1), (2); +ALTER TABLE gtest_tableoid ADD COLUMN + c regclass GENERATED ALWAYS AS (tableoid) STORED; SELECT * FROM gtest_tableoid; - a | b ----+--- - 1 | t - 2 | t + a | b | c +---+---+---------------- + 1 | t | gtest_tableoid + 2 | t | gtest_tableoid (2 rows) -- drop column behavior diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql index 63251c443a..914197608b 100644 --- a/src/test/regress/sql/generated.sql +++ b/src/test/regress/sql/generated.sql @@ -29,6 +29,7 @@ CREATE TABLE gtest_err_5a (a int PRIMARY KEY, b int DEFAULT 5 GENERATED ALWAYS A CREATE TABLE gtest_err_5b (a int PRIMARY KEY, b int GENERATED ALWAYS AS identity GENERATED ALWAYS AS (a * 2) STORED); -- reference to system column not allowed in generated column +-- (except tableoid, which we test below) CREATE TABLE gtest_err_6a (a int PRIMARY KEY, b bool GENERATED ALWAYS AS (xmin <> 37) STORED); -- various prohibited constructs @@ -218,9 +219,11 @@ DROP TYPE double_int; -- using tableoid is allowed CREATE TABLE gtest_tableoid ( a int PRIMARY KEY, - b bool GENERATED ALWAYS AS (tableoid <> 0) STORED + b bool GENERATED ALWAYS AS (tableoid = 'gtest_tableoid'::regclass) STORED ); INSERT INTO gtest_tableoid VALUES (1), (2); +ALTER TABLE gtest_tableoid ADD COLUMN + c regclass GENERATED ALWAYS AS (tableoid) STORED; SELECT * FROM gtest_tableoid; -- drop column behavior diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 431e62e389..ba3e1ecf45 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -3020,15 +3020,26 @@ check_nested_generated_walker(Node *node, void *context) AttrNumber attnum; relid = rt_fetch(var->varno, pstate->p_rtable)->relid; + if (!OidIsValid(relid)) + return false; /* XXX shouldn't we raise an error? */ + attnum = var->varattno; - if (OidIsValid(relid) && AttributeNumberIsValid(attnum) && get_attgenerated(relid, attnum)) + if (attnum > 0 && get_attgenerated(relid, attnum)) ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("cannot use generated column \"%s\" in column generation expression", get_attname(relid, attnum, false)), errdetail("A generated column cannot reference another generated column."), parser_errposition(pstate, var->location))); + /* A whole-row Var is necessarily self-referential, so forbid it */ + if (attnum == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot use whole-row variable in column generation expression"), + errdetail("This would cause the generated column to depend on its own value."), + parser_errposition(pstate, var->location))); + /* System columns were already checked in the parser */ return false; } diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out index 71542e95d5..c2e5676196 100644 --- a/src/test/regress/expected/generated.out +++ b/src/test/regress/expected/generated.out @@ -46,6 +46,13 @@ ERROR: cannot use generated column "b" in column generation expression LINE 1: ...AYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b * 3) STO... ^ DETAIL: A generated column cannot reference another generated column. +-- a whole-row var is a self-reference on steroids, so disallow that too +CREATE TABLE gtest_err_2c (a int PRIMARY KEY, + b int GENERATED ALWAYS AS (num_nulls(gtest_err_2c)) STORED); +ERROR: cannot use whole-row variable in column generation expression +LINE 2: b int GENERATED ALWAYS AS (num_nulls(gtest_err_2c)) STOR... + ^ +DETAIL: This would cause the generated column to depend on its own value. -- invalid reference CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED); ERROR: column "c" does not exist diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql index 914197608b..b7eb072671 100644 --- a/src/test/regress/sql/generated.sql +++ b/src/test/regress/sql/generated.sql @@ -17,6 +17,9 @@ CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) S -- references to other generated columns, including self-references CREATE TABLE gtest_err_2a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (b * 2) STORED); CREATE TABLE gtest_err_2b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b * 3)STORED); +-- a whole-row var is a self-reference on steroids, so disallow that too +CREATE TABLE gtest_err_2c (a int PRIMARY KEY, + b int GENERATED ALWAYS AS (num_nulls(gtest_err_2c)) STORED); -- invalid reference CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED);
In a BEFORE trigger (PL/pgSQL) I can access not yet computed generated columns via the "NEW" whole-row var. I can read from it (NULL), and I can write to it, but the assigned value will be ignored. This behavior seems obvious and well thought through to me. It's not something app developers should do, because it wouldn't make sense, but at least it's not forbidden and the server process does not crash with a seg fault. Is it okay with you that the patch 0002 introduces some inconsistency with that? Regards, Vitaly
Vitaly Ustinov <vitaly@ustinov.ca> writes: > In a BEFORE trigger (PL/pgSQL) I can access not yet computed generated > columns via the "NEW" whole-row var. I can read from it (NULL), and I > can write to it, but the assigned value will be ignored. NEW is not really a whole-row var in the sense that we're discussing here. In any case, yes, nodeModifyTable runs computation of generated columns after firing BEFORE triggers. I suppose that's to prevent the triggers from interfering with those values. > Is it okay with you that the patch 0002 introduces some inconsistency with that? In what way would this introduce (new) inconsistency? I didn't move the computation of those values from where they were. Moreover, so far as I can see, ATRewriteTable doesn't fire BEFORE triggers at all. BTW, while looking around nodeModifyTable, I noted some other gaps in our support for "tableoid" in GENERATED expressions: the FDW code paths in ExecInsert and ExecUpdate fail to fill tts_tableOid till after running ExecComputeStoredGenerated. So they have the same bug that 0001 fixes in ATRewriteTable. It can be fixed the same way, ie move the code that fills tts_tableOid; but I remain very concerned about how many other gaps there are. regards, tom lane