Thread: Virtual generated columns
Here is a patch set to implement virtual generated columns. Some history first: The original development of generated columns was discussed in [0]. It started with virtual columns, then added stored columns. Before the release of PG12, it was decided that only stored columns were ready, so I cut out virtual columns, and stored generated columns shipped with PG12, which is where we are today. Virtual generated columns are occasionally requested still, and it's a bit of unfinished business for me, too, so I started to resurrect it. What I did here first was to basically reverse interdiff the patches where I cut out virtual generated columns above (this was between patches v8 and v9 in [0]) and clean that up and make it work again. One thing that I needed to decide was how to organize the tests for this. The original patch series had both stored and virtual tests in the same test file src/test/regress/sql/generated.sql. As that file has grown, I think it would have been a mess to weave another whole set of tests into that. So instead I figured I'd make two separate test files src/test/regress/sql/generated_stored.sql (renamed from current) src/test/regress/sql/generated_virtual.sql and kind of try to keep them aligned, similar to how the various collate* tests are handled. So I put that renaming in as preparatory patches. And there are also some other preparatory cleanup patches that I'm including. The main feature patch (0005 here) generally works but has a number of open corner cases that need to be thought about and/or fixed, many of which are marked in the code or the tests. I'll continue working on that. But I wanted to see if I can get some feedback on the test structure, so I don't have to keep changing it around later. [0]: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
Attachment
On Mon, Apr 29, 2024 at 4:24 AM Peter Eisentraut <peter@eisentraut.org> wrote:
Here is a patch set to implement virtual generated columns.
I'm very excited about this!
The main feature patch (0005 here) generally works but has a number of
open corner cases that need to be thought about and/or fixed, many of
which are marked in the code or the tests. I'll continue working on
that. But I wanted to see if I can get some feedback on the test
structure, so I don't have to keep changing it around later.
I'd be very interested to see virtual generated columns working, as one of my past customers had a need to reclassify data in a partitioned table, and the ability to detach a partition, alter the virtual generated columns, and re-attach would have been great. In case you care, it was basically an "expired" flag, but the rules for what data "expired" varied by country of customer and level of service.
+ * Stored generated columns cannot work: They are computed after
+ * BEFORE triggers, but partition routing is done before all
+ * triggers. Maybe virtual generated columns could be made to
+ * work, but then they would need to be handled as an expression
+ * below.
I'd say you nailed it with the test structure. The stored/virtual copy/split is the ideal way to approach this, which makes the diff very easy to understand.
+ * BEFORE triggers, but partition routing is done before all
+ * triggers. Maybe virtual generated columns could be made to
+ * work, but then they would need to be handled as an expression
+ * below.
I'd say you nailed it with the test structure. The stored/virtual copy/split is the ideal way to approach this, which makes the diff very easy to understand.
+1 for not handling domain types yet.
-- generation expression must be immutable-CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) STORED);+CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) VIRTUAL);
Does a VIRTUAL generated column have to be immutable? I can see where the STORED one has to be, but consider the following:
CREATE TABLE foo (created_at timestamptz DEFAULT CURRENT_TIMESTAMP,row_age interval GENERATED ALWAYS AS CURRENT_TIMESTAMP - created_at);
-- can't have generated column that is a child of normal column
CREATE TABLE gtest_normal (a int, b int);
-CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) STORED) INHERITS (gtest_normal); -- error
+CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) INHERITS (gtest_normal); -- error
This is the barrier to the partitioning reorganization scheme I described above. Is there any hard rule why a child table couldn't have a generated column matching the parent's regular column? I can see where it might prevent indexing that column on the parent table, but is there some other dealbreaker or is this just a "it doesn't work yet" situation?
One last thing to keep in mind is that there are two special case expressions in the spec:
GENERATED ALWAYS AS ROW STARTGENERATED ALWAYS AS ROW END
and we'll need to be able to fit those into the catalog. I'll start another thread for that unless you prefer I keep it here.
On 29.04.24 10:23, Peter Eisentraut wrote: > Here is a patch set to implement virtual generated columns. > The main feature patch (0005 here) generally works but has a number of > open corner cases that need to be thought about and/or fixed, many of > which are marked in the code or the tests. I'll continue working on > that. But I wanted to see if I can get some feedback on the test > structure, so I don't have to keep changing it around later. Here is an updated patch set. It needed some rebasing, especially around the reverting of the catalogued not-null constraints. I have also fixed up the various incomplete or "fixme" pieces of code mentioned above. I have in most cases added "not supported yet" error messages for now, with the idea that some of these things can be added in later, as incremental features. In particular, quoting from the commit message, the following are currently not supported (but could possibly be added as incremental features, some easier than others): - index on virtual column - expression index using a virtual column - hence also no unique constraints on virtual columns - not-null constraints on virtual columns - (check constraints are supported) - foreign key constraints on virtual columns - extended statistics on virtual columns - ALTER TABLE / SET EXPRESSION - ALTER TABLE / DROP EXPRESSION - virtual columns as trigger columns - virtual column cannot have domain type So, I think this basically works now, and the things that don't work should be appropriately prevented. So if someone wants to test this and tell me what in fact doesn't work correctly, that would be helpful.
Attachment
On 29.04.24 20:54, Corey Huinker wrote: > -- generation expression must be immutable > -CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision > GENERATED ALWAYS AS (random()) STORED); > +CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision > GENERATED ALWAYS AS (random()) VIRTUAL); > > Does a VIRTUAL generated column have to be immutable? I can see where > the STORED one has to be, but consider the following: > > CREATE TABLE foo ( > created_at timestamptz DEFAULT CURRENT_TIMESTAMP, > row_age interval GENERATED ALWAYS AS CURRENT_TIMESTAMP - created_at > ); I have been hesitant about this, but I'm now leaning toward that we could allow this. > -- can't have generated column that is a child of normal column > CREATE TABLE gtest_normal (a int, b int); > -CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS > (a * 2) STORED) INHERITS (gtest_normal); -- error > +CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS > (a * 2) VIRTUAL) INHERITS (gtest_normal); -- error > > This is the barrier to the partitioning reorganization scheme I > described above. Is there any hard rule why a child table couldn't have > a generated column matching the parent's regular column? I can see where > it might prevent indexing that column on the parent table, but is there > some other dealbreaker or is this just a "it doesn't work yet" situation? We had a quite a difficult time getting the inheritance business of stored generated columns working correctly. I'm sticking to the well-trodden path here. We can possibly expand this if someone wants to work out the details. > One last thing to keep in mind is that there are two special case > expressions in the spec: > > GENERATED ALWAYS AS ROW START > GENERATED ALWAYS AS ROW END > > and we'll need to be able to fit those into the catalog. I'll start > another thread for that unless you prefer I keep it here. I think this is a separate feature.
On Wed, 2024-05-22 at 19:22 +0200, Peter Eisentraut wrote: > On 29.04.24 10:23, Peter Eisentraut wrote: > > Here is a patch set to implement virtual generated columns. > > > The main feature patch (0005 here) generally works but has a number > > of > > open corner cases that need to be thought about and/or fixed, many > > of > > which are marked in the code or the tests. I'll continue working > > on > > that. But I wanted to see if I can get some feedback on the test > > structure, so I don't have to keep changing it around later. > > Here is an updated patch set. It needed some rebasing, especially > around the reverting of the catalogued not-null constraints. I have > also fixed up the various incomplete or "fixme" pieces of code > mentioned > above. I have in most cases added "not supported yet" error messages > for now, with the idea that some of these things can be added in > later, > as incremental features. > This is not (yet) full review. Patches applied cleanly on 76618097a6c027ec603a3dd143f61098e3fb9794 from 2024-06-14. I've run ./configure && make world-bin && make check && make check-world on 0001, then 0001+0002, then 0001+0002+0003, up to applying all 5 patches. All cases passed on Debian unstable on aarch64 (ARM64) on gcc (Debian 13.2.0-25) 13.2.0. v1-0001-Rename-regress-test-generated-to-generated_stored.patch: no objections here, makes sense as preparation for future changes v1-0002-Put-generated_stored-test-objects-in-a-schema.patch: also no objections. OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out, tablespace.out) are creating schemas and later dropping them - so here it might also make sense to drop schema at the end of testing. v1-0003-Remove-useless-initializations.patch: All other cases (I checked directory src/backend/utils/cache) calling MemoryContextAllocZero only initialize fields when they are non-zero, so removing partial initialization with false brings consistency to the code. v1-0004-Remove-useless-code.patch: Patch removes filling in of constraints from function BuildDescForRelation. This function is only called from file view.c and tablecmds.c (twice). In none of those cases result->constr is used, so proposed change makes sense. While I do not know code well, so might be wrong here, I would apply this patch. I haven't looked at the most important (and biggest) file yet, v1-0005-Virtual-generated-columns.patch; hope to look at it this week. At the same I believe 0001-0004 can be applied - even backported if it'll make maintenance of future changes easier. But that should be commiter's decision. Best regards -- Tomasz Rybak, Debian Developer <serpent@debian.org> GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C
Attachment
On Thu, May 23, 2024 at 1:23 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 29.04.24 10:23, Peter Eisentraut wrote: > > Here is a patch set to implement virtual generated columns. > > > The main feature patch (0005 here) generally works but has a number of > > open corner cases that need to be thought about and/or fixed, many of > > which are marked in the code or the tests. I'll continue working on > > that. But I wanted to see if I can get some feedback on the test > > structure, so I don't have to keep changing it around later. the test structure you made ( generated_stored.sql, generated_virtual.sq) looks ok to me. but do we need to reset the search_path at the end of generated_stored.sql, generated_virtual.sql? most of the test tables didn't use much storage, maybe not necessary to clean up (drop the test table) at the end of sql files. > > So, I think this basically works now, and the things that don't work > should be appropriately prevented. So if someone wants to test this and > tell me what in fact doesn't work correctly, that would be helpful. in https://www.postgresql.org/docs/current/catalog-pg-attrdef.html >>> The catalog pg_attrdef stores column default values. The main information about columns is stored in pg_attribute. Only columns for which a default value has been explicitly set will have an entry here. >> didn't mention generated columns related expressions. Do we need to add something here? maybe a separate issue? + /* + * TODO: This could be done, but it would need a different implementation: + * no rewriting, but still need to recheck any constraints. + */ + if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual generated columns"), + errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.", + colName, RelationGetRelationName(rel)))); minor typo, should be + errmsg("ALTER TABLE SET EXPRESSION is not supported for virtual generated columns"), insert/update/delete/merge returning have problems: CREATE TABLE t2 ( a int , b int GENERATED ALWAYS AS (a * 2), d int default 22); insert into t2(a) select g from generate_series(1,10) g; insert into t2 select 100 returning *, (select t2.b), t2.b = t2.a * 2; update t2 set a = 12 returning *, (select t2.b), t2.b = t2.a * 2; update t2 set a = 12 returning *, (select (select t2.b)), t2.b = t2.a * 2; delete from t2 where t2.b = t2.a * 2 returning *, 1,((select t2.b)); currently all these query, error message is "unexpected virtual generated column reference" we expect above these query work? issue with merge: CREATE TABLE t0 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL); insert into t0(a) select g from generate_series(1,10) g; MERGE INTO t0 t USING t0 AS s ON 2 * t.a = s.b WHEN MATCHED THEN DELETE returning *; the above query returns zero rows, but for stored generated columns it will return 10 rows. in transformMergeStmt(ParseState *pstate, MergeStmt *stmt) add `qry->hasGeneratedVirtual = pstate->p_hasGeneratedVirtual;` before `assign_query_collations(pstate, qry);` solve the problem.
On 28.06.24 02:00, jian he wrote: > inhttps://www.postgresql.org/docs/current/catalog-pg-attrdef.html > The catalog pg_attrdef stores column default values. The main > information about columns is stored in pg_attribute. Only columns for > which a default value has been explicitly set will have an entry here. > didn't mention generated columns related expressions. > Do we need to add something here? maybe a separate issue? Yes and yes. I have committed a separate change to update the documentation here.
On 17.06.24 21:31, Tomasz Rybak wrote: > v1-0001-Rename-regress-test-generated-to-generated_stored.patch: > no objections here, makes sense as preparation for future changes > > v1-0002-Put-generated_stored-test-objects-in-a-schema.patch: > also no objections. > OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out, > tablespace.out) are creating schemas and later dropping them - so here > it might also make sense to drop schema at the end of testing. The existing tests for generated columns don't drop what they create at the end, which can be useful for pg_upgrade testing for example. So unless there are specific reasons to change it, I would leave that as is. Other tests might have other reasons. For example, publications or row security might interfere with many other tests. > v1-0003-Remove-useless-initializations.patch: > All other cases (I checked directory src/backend/utils/cache) > calling MemoryContextAllocZero only initialize fields when they > are non-zero, so removing partial initialization with false brings > consistency to the code. > > v1-0004-Remove-useless-code.patch: > Patch removes filling in of constraints from function > BuildDescForRelation. This function is only called from file > view.c and tablecmds.c (twice). In none of those cases > result->constr is used, so proposed change makes sense. > While I do not know code well, so might be wrong here, > I would apply this patch. I have committed these two now.
On 28.06.24 02:00, jian he wrote: > the test structure you made ( generated_stored.sql, > generated_virtual.sq) looks ok to me. > but do we need to reset the search_path at the end of > generated_stored.sql, generated_virtual.sql? No, the session ends at the end of the test file, so we don't need to reset session state. > + /* > + * TODO: This could be done, but it would need a different implementation: > + * no rewriting, but still need to recheck any constraints. > + */ > + if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual > generated columns"), > + errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.", > + colName, RelationGetRelationName(rel)))); > > minor typo, should be > + errmsg("ALTER TABLE SET EXPRESSION is not supported for virtual > generated columns"), This style "ALTER TABLE / something else" is also used for other error messages related to ALTER TABLE subcommands, so I am using the same here. > insert/update/delete/merge returning have problems: > CREATE TABLE t2 ( > a int , > b int GENERATED ALWAYS AS (a * 2), > d int default 22); > insert into t2(a) select g from generate_series(1,10) g; > > insert into t2 select 100 returning *, (select t2.b), t2.b = t2.a * 2; > update t2 set a = 12 returning *, (select t2.b), t2.b = t2.a * 2; > update t2 set a = 12 returning *, (select (select t2.b)), t2.b = t2.a * 2; > delete from t2 where t2.b = t2.a * 2 returning *, 1,((select t2.b)); > > currently all these query, error message is "unexpected virtual > generated column reference" > we expect above these query work? Yes, this is a bug. I'm looking into it. > issue with merge: > CREATE TABLE t0 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL); > insert into t0(a) select g from generate_series(1,10) g; > MERGE INTO t0 t USING t0 AS s ON 2 * t.a = s.b WHEN MATCHED THEN > DELETE returning *; > > the above query returns zero rows, but for stored generated columns it > will return 10 rows. > > in transformMergeStmt(ParseState *pstate, MergeStmt *stmt) > add > `qry->hasGeneratedVirtual = pstate->p_hasGeneratedVirtual;` > before > `assign_query_collations(pstate, qry);` > solve the problem. Good catch. Will fix. Thanks for this review. I will work on fixing the issues above and come back with a new patch set.
statistic related bug. borrow examples from https://www.postgresql.org/docs/current/sql-createstatistics.html CREATE TABLE t3 (a timestamp PRIMARY KEY, b timestamp GENERATED ALWAYS AS (a) VIRTUAL); CREATE STATISTICS s3 (ndistinct) ON b FROM t3; INSERT INTO t3(a) SELECT i FROM generate_series('2020-01-01'::timestamp, '2020-12-31'::timestamp, '1 minute'::interval) s(i); ANALYZE t3; CREATE STATISTICS s3 (ndistinct) ON date_trunc('month', a), date_trunc('day', b) FROM t3; ANALYZE t3; ERROR: unexpected virtual generated column reference --this is allowed CREATE STATISTICS s5 ON (b + interval '1 day') FROM t3; --this is not allowed. seems inconsistent? CREATE STATISTICS s6 ON (b ) FROM t3; in CreateStatistics(CreateStatsStmt *stmt) we have if (selem->name) { if (attForm->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("statistics creation on virtual generated columns is not supported"))); } else if (IsA(selem->expr, Var)) /* column reference in parens */ { if (get_attgenerated(relid, var->varattno) == ATTRIBUTE_GENERATED_VIRTUAL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("statistics creation on virtual generated columns is not supported"))); } else /* expression */ { ... } you didn't make sure the last "else" branch is not related to virtual generated columns
another bug? drop table gtest12v; CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED ALWAYS AS (b * 2) VIRTUAL); insert into gtest12v (a,b) values (11, 22147483647); table gtest12v; insert ok, but select error: ERROR: integer out of range should insert fail? CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED ALWAYS AS (b * 2) VIRTUAL); CREATE SEQUENCE sequence_testx OWNED BY gtest12v.c; seems to work. But I am not sure if there are any corner cases that make it not work. just want to raise this issue.
drop table t3; CREATE TABLE t3( b bigint, c int GENERATED ALWAYS AS (b * 2) VIRTUAL); insert into t3 (b) values (22147483647); ANALYZE t3; for ANALYZE since column c has no actual storage, so it's not analyzable? we need to change the function examine_attribute accordingly? For the above example, for each insert row, we actually need to call int84 to validate c value. we probably need something similar to have ExecComputeStoredGenerated etc, but we don't need to store it.
On 22.07.24 12:53, jian he wrote: > another bug? > drop table gtest12v; > CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED > ALWAYS AS (b * 2) VIRTUAL); > insert into gtest12v (a,b) values (11, 22147483647); > table gtest12v; > > insert ok, but select error: > ERROR: integer out of range > > should insert fail? I think this is the correct behavior. There has been a previous discussion: https://www.postgresql.org/message-id/2e3d5147-16f8-af0f-00ab-4c72cafc896f%402ndquadrant.com > CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED > ALWAYS AS (b * 2) VIRTUAL); > CREATE SEQUENCE sequence_testx OWNED BY gtest12v.c; > > seems to work. But I am not sure if there are any corner cases that > make it not work. > just want to raise this issue. I don't think this matters. You can make a sequence owned by any column, even if that column doesn't have a default that invokes the sequence. So nonsensical setups are possible, but they are harmless.
Thank you for your extensive testing. Here is a new patch set that has fixed all the issues you have reported (MERGE, sublinks, statistics, ANALYZE).
Attachment
On Thu, 8 Aug 2024 at 07:23, Peter Eisentraut <peter@eisentraut.org> wrote: > > Thank you for your extensive testing. Here is a new patch set that has > fixed all the issues you have reported (MERGE, sublinks, statistics, > ANALYZE). I had a quick look at this and found one issue, which is that it doesn't properly deal with virtual generated columns in wholerow attributes: CREATE TABLE foo(a int, a2 int GENERATED ALWAYS AS (a*2) VIRTUAL); INSERT INTO foo VALUES (1); SELECT foo FROM foo; foo ------ (1,) (1 row) Looking at the rewriter changes, it occurred to me that it could perhaps be done more simply using ReplaceVarsFromTargetList() for each RTE with virtual generated columns. That function already has the required wholerow handling code, so there'd be less code duplication. I think it might be better to do this from within fireRIRrules(), just after RLS policies are applied, so it wouldn't need to worry about CTEs and sublink subqueries. That would also make the hasGeneratedVirtual flags unnecessary, since we'd already only be doing the extra work for tables with virtual generated columns. That would eliminate possible bugs caused by failing to set those flags. Regards, Dean
On Thu, Aug 8, 2024 at 2:23 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > Thank you for your extensive testing. Here is a new patch set that has > fixed all the issues you have reported (MERGE, sublinks, statistics, > ANALYZE). if (coldef->generated && restdef->generated && coldef->generated != restdef->generated) ereport(ERROR, (errcode(ERRCODE_INVALID_COLUMN_DEFINITION), errmsg("column \"%s\" inherits from generated column of different kind", restdef->colname))); the error message is not informal. maybe add errhint that "column \"%s\" should be same as parent table's generated column kind: %s", "virtual"|"stored" .../regress/expected/create_table_like.out | 23 +- .../regress/expected/generated_stored.out | 27 +- ...rated_stored.out => generated_virtual.out} | 835 +++++++++--------- src/test/regress/parallel_schedule | 3 + src/test/regress/sql/create_table_like.sql | 2 +- src/test/regress/sql/generated_stored.sql | 10 +- ...rated_stored.sql => generated_virtual.sql} | 301 ++++--- src/test/subscription/t/011_generated.pl | 38 +- 55 files changed, 1280 insertions(+), 711 deletions(-) copy src/test/regress/expected/{generated_stored.out generated_virtual.out} (69%) copy src/test/regress/sql/{generated_stored.sql => generated_virtual.sql} (72%) I don't understand the "copy =>" part, I guess related to copy content from stored to virtual. anyway. some minor issue: -- alter generation expression of parent and all its children altogether ALTER TABLE gtest_parent ALTER COLUMN f3 SET EXPRESSION AS (f2 * 2); \d gtest_parent \d gtest_child \d gtest_child2 \d gtest_child3 SELECT tableoid::regclass, * FROM gtest_parent ORDER BY 1, 2, 3; The first line ALTER TABLE will fail for src/test/regress/sql/generated_virtual.sql. so no need """ \d gtest_parent \d gtest_child \d gtest_child2 \d gtest_child3 SELECT tableoid::regclass, * FROM gtest_parent ORDER BY 1, 2, 3; """ Similarly the following tests for gtest29 may aslo need change -- ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION since we cannot do ALTER TABLE SET EXPRESSION for virtual generated columns. -- ALTER TABLE ... ALTER COLUMN CREATE TABLE gtest27 ( a int, b int, x int GENERATED ALWAYS AS ((a + b) * 2) VIRTUAL ); INSERT INTO gtest27 (a, b) VALUES (3, 7), (4, 11); ALTER TABLE gtest27 ALTER COLUMN a TYPE text; -- error ALTER TABLE gtest27 ALTER COLUMN x TYPE numeric; will ALTER TABLE gtest27 ALTER COLUMN a TYPE int4; be a no-op? do we need a document that virtual generated columns will use the expression's collation. see: drop table if exists t5; CREATE TABLE t5 ( a text collate "C", b text collate "C" GENERATED ALWAYS AS (a collate case_insensitive) , d int DEFAULT 22 ); INSERT INTO t5(a,d) values ('d1',28), ('D2',27), ('D1',26); select * from t5 order by b asc, d asc; + /* + * TODO: Prevent virtual generated columns from having a + * domain type. We would have to enforce domain constraints + * when columns underlying the generated column change. This + * could possibly be implemented, but it's not. + * + * XXX If column->typeName is not set, then this column + * definition is probably a partition definition and will + * presumably get its pre-vetted type from elsewhere. If that + * doesn't hold, maybe this check needs to be moved elsewhere. + */ + if (column->generated == ATTRIBUTE_GENERATED_VIRTUAL && column->typeName) + { + Type ctype; + + ctype = typenameType(cxt->pstate, column->typeName, NULL); + if (((Form_pg_type) GETSTRUCT(ctype))->typtype == TYPTYPE_DOMAIN) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("virtual generated column \"%s\" cannot have a domain type", + column->colname), + parser_errposition(cxt->pstate, + column->location))); + ReleaseSysCache(ctype); + } create domain mydomain as int4; create type mydomainrange as range(subtype=mydomain); CREATE TABLE t3( b bigint, c mydomain GENERATED ALWAYS AS ('11') VIRTUAL); CREATE TABLE t3( b bigint, c mydomainrange GENERATED ALWAYS AS ('[4,50)') VIRTUAL); domain will error out, domain over range is ok, is this fine? + When <literal>VIRTUAL</literal> is specified, the column will be + computed when it is read, and it will not occupy any storage. When + <literal>STORED</literal> is specified, the column will be computed on + write and will be stored on disk. <literal>VIRTUAL</literal> is the + default. drop table if exists t5; CREATE TABLE t5 ( a int, b text storage extended collate "C" GENERATED ALWAYS AS (a::text collate case_insensitive) , d int DEFAULT 22 ); select reltoastrelid <> 0 as has_toast_table from pg_class where oid = 't5'::regclass; if really no storage, should table t5 have an associated toast table or not? also check ALTER TABLE variant: alter table t5 alter b set storage extended; Do we need to do something in ATExecSetStatistics for cases like: ALTER TABLE t5 ALTER b SET STATISTICS 2000; (b is a generated virtual column). because of examine_attribute if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) return NULL; i guess, this won't have a big impact. There are some issues with changing virtual generated column type. like: drop table if exists another; create table another (f4 int, f2 text, f3 text, f1 int GENERATED ALWAYS AS (f4)); insert into another values(1, 'one', 'uno'), (2, 'two', 'due'),(3, 'three', 'tre'); alter table another alter f1 type text using f2 || ' and ' || f3 || ' more'; table another; or alter table another alter f1 type text using f2 || ' and ' || f3 || ' more', drop column f1; ERROR: column "f1" of relation "another" does not exist These two command outputs seem not right. the stored generated column which works as expected. in src/test/regress/sql/alter_table.sql -- We disallow changing table's row type if it's used for storage create table at_tab1 (a int, b text); create table at_tab2 (x int, y at_tab1); alter table at_tab1 alter column b type varchar; -- fails drop table at_tab2; I think the above restriction should apply to virtual generated columns too. given in ATPrepAlterColumnType, not storage we still call find_composite_type_dependencies if (!RELKIND_HAS_STORAGE(tab->relkind)) { /* * For relations without storage, do this check now. Regular tables * will check it later when the table is being rewritten. */ find_composite_type_dependencies(rel->rd_rel->reltype, rel, NULL); } so i think in ATPrepAlterColumnType, we should do: if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) { find_composite_type_dependencies(rel->rd_rel->reltype, rel, NULL); } else if (tab->relkind == RELKIND_RELATION || tab->relkind == RELKIND_PARTITIONED_TABLE) { } else if (transform) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not a table", RelationGetRelationName(rel)))); you may add following tests: ------------------------------------------------------------------------ create table at_tab1 (a int, b text GENERATED ALWAYS AS ('hello'), c text); create table at_tab2 (x int, y at_tab1); alter table at_tab1 alter column b type varchar; -- fails drop table at_tab1, at_tab2; -- Check it for a partitioned table, too create table at_tab1 (a int, b text GENERATED ALWAYS AS ('hello'), c text) partition by list(a);; create table at_tab2 (x int, y at_tab1); alter table at_tab1 alter column b type varchar; -- fails drop table at_tab1, at_tab2; ---------------------------------------------------------------------------------
On 08.08.24 20:22, Dean Rasheed wrote: > Looking at the rewriter changes, it occurred to me that it could > perhaps be done more simply using ReplaceVarsFromTargetList() for each > RTE with virtual generated columns. That function already has the > required wholerow handling code, so there'd be less code duplication. Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used here. It does have the wholerow logic that we need somehow, but other than that it seems to target something different? > I think it might be better to do this from within fireRIRrules(), just > after RLS policies are applied, so it wouldn't need to worry about > CTEs and sublink subqueries. That would also make the > hasGeneratedVirtual flags unnecessary, since we'd already only be > doing the extra work for tables with virtual generated columns. That > would eliminate possible bugs caused by failing to set those flags. Yes, ideally, we'd piggy-back this into fireRIRrules(). One thing I'm missing is that if you're descending into subqueries, there is no link to the upper levels' range tables, which we need to lookup the pg_attribute entries of column referencing Vars. That's why there is this whole custom walk with its own context data. Maybe there is a way to do this already that I missed?
On Wed, 21 Aug 2024 at 08:00, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 08.08.24 20:22, Dean Rasheed wrote: > > Looking at the rewriter changes, it occurred to me that it could > > perhaps be done more simply using ReplaceVarsFromTargetList() for each > > RTE with virtual generated columns. That function already has the > > required wholerow handling code, so there'd be less code duplication. > > Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used > here. It does have the wholerow logic that we need somehow, but other > than that it seems to target something different? > Well what I was thinking was that (in fireRIRrules()'s final loop over relations in the rtable), if the relation had any virtual generated columns, you'd build a targetlist containing a TLE for each one, containing the generated expression. Then you could just call ReplaceVarsFromTargetList() to replace any Vars in the query with the corresponding generated expressions. That takes care of descending into subqueries, adjusting varlevelsup, and expanding wholerow Vars that might refer to the generated expression. I also have half an eye on how this patch will interact with my patch to support RETURNING OLD/NEW values. If you use ReplaceVarsFromTargetList(), it should just do the right thing for RETURNING OLD/NEW generated expressions. > > I think it might be better to do this from within fireRIRrules(), just > > after RLS policies are applied, so it wouldn't need to worry about > > CTEs and sublink subqueries. That would also make the > > hasGeneratedVirtual flags unnecessary, since we'd already only be > > doing the extra work for tables with virtual generated columns. That > > would eliminate possible bugs caused by failing to set those flags. > > Yes, ideally, we'd piggy-back this into fireRIRrules(). One thing I'm > missing is that if you're descending into subqueries, there is no link > to the upper levels' range tables, which we need to lookup the > pg_attribute entries of column referencing Vars. That's why there is > this whole custom walk with its own context data. Maybe there is a way > to do this already that I missed? > That link to the upper levels' range tables wouldn't be needed because essentially using ReplaceVarsFromTargetList() flips the whole thing round: instead of traversing the tree looking for Var nodes that need to be replaced (possibly from upper query levels), you build a list of replacement expressions to be applied and apply them from the top, descending into subqueries as needed. Another argument for doing it that way round is to not add too many extra cycles to the processing of existing queries that don't reference generated expressions. ISTM that this patch is potentially adding quite a lot of additional overhead -- it looks like, for every Var in the tree, it's calling get_attgenerated(), which involves a syscache lookup to see if that column is a generated expression (which most won't be). Ideally, we should be trying to do the minimum amount of extra work in the common case where there are no generated expressions. Looking ahead, I can also imagine that one day we might want to support subqueries in generated expressions. That would require recursive processing of generated expressions in the generated expression's subquery, as well as applying RLS policies to the new relations pulled in, and checks to guard against infinite recursion. fireRIRrules() already has the infrastructure to support all of that, so that feels like a much more natural place to do this. Regards, Dean
drop table if exists gtest_err_1 cascade; CREATE TABLE gtest_err_1 ( a int PRIMARY KEY generated by default as identity, b int GENERATED ALWAYS AS (22), d int default 22); create view gtest_err_1_v as select * from gtest_err_1; SELECT events & 4 != 0 AS can_upd, events & 8 != 0 AS can_ins,events & 16 != 0 AS can_del FROM pg_catalog.pg_relation_is_updatable('gtest_err_1_v'::regclass, false) t(events); insert into gtest_err_1_v(a,b, d) values ( 11, default,33) returning *; should the above query, b return 22? even b is "b int default" will return 22. drop table if exists comment_test cascade; CREATE TABLE comment_test ( id int, positive_col int GENERATED ALWAYS AS (22) CHECK (positive_col > 0), positive_col1 int GENERATED ALWAYS AS (22) stored CHECK (positive_col > 0) , indexed_col int, CONSTRAINT comment_test_pk PRIMARY KEY (id)); CREATE INDEX comment_test_index ON comment_test(indexed_col); ALTER TABLE comment_test ALTER COLUMN positive_col1 SET DATA TYPE text; ALTER TABLE comment_test ALTER COLUMN positive_col SET DATA TYPE text; the last query should work just fine? drop table if exists def_test cascade; create table def_test ( c0 int4 GENERATED ALWAYS AS (22) stored, c1 int4 GENERATED ALWAYS AS (22), c2 text default 'initial_default' ); alter table def_test alter column c1 set default 10; ERROR: column "c1" of relation "def_test" is a generated column HINT: Use ALTER TABLE ... ALTER COLUMN ... SET EXPRESSION instead. alter table def_test alter column c1 drop default; ERROR: column "c1" of relation "def_test" is a generated column Is the first error message hint wrong? also the second error message (column x is a generated column) is not helpful. here, we should just say that cannot set/drop default for virtual generated column? drop table if exists bar1, bar2; create table bar1(a integer, b integer GENERATED ALWAYS AS (22)) partition by range (a); create table bar2(a integer); alter table bar2 add column b integer GENERATED ALWAYS AS (22) stored; alter table bar1 attach partition bar2 default; this works, which will make partitioned table and partition have different kinds of generated column, but this is not what we expected? another variant: CREATE TABLE list_parted ( a int NOT NULL, b char(2) COLLATE "C", c int GENERATED ALWAYS AS (22) ) PARTITION BY LIST (a); CREATE TABLE parent (LIKE list_parted); ALTER TABLE parent drop column c, add column c int GENERATED ALWAYS AS (22) stored; ALTER TABLE list_parted ATTACH PARTITION parent FOR VALUES IN (1); drop table if exists tp, tpp1, tpp2; CREATE TABLE tp (a int NOT NULL,b text GENERATED ALWAYS AS (22),c text) PARTITION BY LIST (a); CREATE TABLE tpp1(a int NOT NULL, b text GENERATED ALWAYS AS (c ||'1000' ), c text); ALTER TABLE tp ATTACH PARTITION tpp1 FOR VALUES IN (1); insert into tp(a,b,c) values (1,default, 'hello') returning a,b,c; insert into tpp1(a,b,c) values (1,default, 'hello') returning a,b,c; select tableoid::regclass, * from tpp1; select tableoid::regclass, * from tp; the above two queries return different results, slightly unintuitive, i guess. Do we need to mention it somewhere? CREATE TABLE atnotnull1 (); ALTER TABLE atnotnull1 ADD COLUMN c INT GENERATED ALWAYS AS (22), ADD PRIMARY KEY (c); ERROR: not-null constraints are not supported on virtual generated columns DETAIL: Column "c" of relation "atnotnull1" is a virtual generated column. I guess this error message is fine. The last issue in the previous thread [1], ATPrepAlterColumnType seems not addressed. [1] https://postgr.es/m/CACJufxEGPYtFe79hbsMeOBOivfNnPRsw7Gjvk67m1x2MQggyiQ@mail.gmail.com
On Wed, Aug 21, 2024 at 6:52 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Wed, 21 Aug 2024 at 08:00, Peter Eisentraut <peter@eisentraut.org> wrote: > > > > On 08.08.24 20:22, Dean Rasheed wrote: > > > Looking at the rewriter changes, it occurred to me that it could > > > perhaps be done more simply using ReplaceVarsFromTargetList() for each > > > RTE with virtual generated columns. That function already has the > > > required wholerow handling code, so there'd be less code duplication. > > > > Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used > > here. It does have the wholerow logic that we need somehow, but other > > than that it seems to target something different? > > > > Well what I was thinking was that (in fireRIRrules()'s final loop over > relations in the rtable), if the relation had any virtual generated > columns, you'd build a targetlist containing a TLE for each one, > containing the generated expression. Then you could just call > ReplaceVarsFromTargetList() to replace any Vars in the query with the > corresponding generated expressions. That takes care of descending > into subqueries, adjusting varlevelsup, and expanding wholerow Vars > that might refer to the generated expression. > > I also have half an eye on how this patch will interact with my patch > to support RETURNING OLD/NEW values. If you use > ReplaceVarsFromTargetList(), it should just do the right thing for > RETURNING OLD/NEW generated expressions. > > > > I think it might be better to do this from within fireRIRrules(), just > > > after RLS policies are applied, so it wouldn't need to worry about > > > CTEs and sublink subqueries. That would also make the > > > hasGeneratedVirtual flags unnecessary, since we'd already only be > > > doing the extra work for tables with virtual generated columns. That > > > would eliminate possible bugs caused by failing to set those flags. > > > > Yes, ideally, we'd piggy-back this into fireRIRrules(). One thing I'm > > missing is that if you're descending into subqueries, there is no link > > to the upper levels' range tables, which we need to lookup the > > pg_attribute entries of column referencing Vars. That's why there is > > this whole custom walk with its own context data. Maybe there is a way > > to do this already that I missed? > > > > That link to the upper levels' range tables wouldn't be needed because > essentially using ReplaceVarsFromTargetList() flips the whole thing > round: instead of traversing the tree looking for Var nodes that need > to be replaced (possibly from upper query levels), you build a list of > replacement expressions to be applied and apply them from the top, > descending into subqueries as needed. > CREATE TABLE gtest1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL); INSERT INTO gtest1 VALUES (1,default), (2, DEFAULT); select b from (SELECT b FROM gtest1) sub; here we only need to translate the second "b" to (a *2), not the first one. but these two "b" query tree representation almost the same (varno, varattno, varlevelsup) I am not sure how ReplaceVarsFromTargetList can disambiguate this? Currently v4-0001-Virtual-generated-columns.patch works. because v4 properly tags the main query hasGeneratedVirtual to false, and tag subquery's hasGeneratedVirtual to true.
On Wed, 4 Sept 2024 at 09:40, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 21.08.24 12:51, Dean Rasheed wrote: > >> > > Well what I was thinking was that (in fireRIRrules()'s final loop over > > relations in the rtable), if the relation had any virtual generated > > columns, you'd build a targetlist containing a TLE for each one, > > containing the generated expression. Then you could just call > > ReplaceVarsFromTargetList() to replace any Vars in the query with the > > corresponding generated expressions. > > Here is an implementation of this. It's much nicer! It also appears to > fix all the additional test cases that have been presented. (I haven't > integrated them into the patch set yet.) > > I left the 0001 patch alone for now and put the new rewriting > implementation into 0002. (Unfortunately, the diff is kind of useless > for visual inspection.) Let me know if this matches what you had in > mind, please. Also, is this the right place in fireRIRrules()? Yes, that's what I had in mind except that it has to be called from the second loop in fireRIRrules(), after any RLS policies have been added, because it's possible for a RLS policy expression to refer to virtual generated columns. It's OK to do it in the same loop that expands RLS policies, because such policies can only refer to columns of the same relation, so once the RLS policies have been expanded for a given relation, nothing else should get added to the query that can refer to columns of that relation, at that query level, so at that point it should be safe to expand virtual generated columns. Regards, Dean
On Wed, Sep 4, 2024 at 4:40 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > > Here is an implementation of this. It's much nicer! It also appears to > fix all the additional test cases that have been presented. (I haven't > integrated them into the patch set yet.) > > I left the 0001 patch alone for now and put the new rewriting > implementation into 0002. (Unfortunately, the diff is kind of useless > for visual inspection.) Let me know if this matches what you had in > mind, please. Also, is this the right place in fireRIRrules()? hi. some minor issues. in get_dependent_generated_columns we can /* skip if not generated column */ if (!TupleDescAttr(tupdesc, defval->adnum - 1)->attgenerated) continue; change to /* skip if not generated stored column */ if (!(TupleDescAttr(tupdesc, defval->adnum - 1)->attgenerated == ATTRIBUTE_GENERATED_STORED)) continue; in ExecInitStoredGenerated "if ((tupdesc->constr && tupdesc->constr->has_generated_stored)))" is true. then later we finish the loop (for (int i = 0; i < natts; i++) loop) we can "Assert(ri_NumGeneratedNeeded > 0)" so we can ensure once has_generated_stored flag is true, then we should have at least one stored generated attribute. similarly, in expand_generated_columns_internal we can aslo add "Assert(list_length(tlist) > 0);" above node = ReplaceVarsFromTargetList(node, rt_index, 0, rte, tlist, REPLACEVARS_CHANGE_VARNO, rt_index, NULL); @@ -2290,7 +2291,9 @@ ExecBuildSlotValueDescription(Oid reloid, if (table_perm || column_perm) { - if (slot->tts_isnull[i]) + if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) + val = "virtual"; + else if (slot->tts_isnull[i]) val = "null"; else { Oid foutoid; bool typisvarlena; getTypeOutputInfo(att->atttypid, &foutoid, &typisvarlena); val = OidOutputFunctionCall(foutoid, slot->tts_values[i]); } we can add Assert here, if i understand it correctly, like if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) { Assert(slot->tts_isnull[i]); val = "virtual"; }
On 04.09.24 12:33, Dean Rasheed wrote: >> I left the 0001 patch alone for now and put the new rewriting >> implementation into 0002. (Unfortunately, the diff is kind of useless >> for visual inspection.) Let me know if this matches what you had in >> mind, please. Also, is this the right place in fireRIRrules()? > Yes, that's what I had in mind except that it has to be called from > the second loop in fireRIRrules(), after any RLS policies have been > added, because it's possible for a RLS policy expression to refer to > virtual generated columns. It's OK to do it in the same loop that > expands RLS policies, because such policies can only refer to columns > of the same relation, so once the RLS policies have been expanded for > a given relation, nothing else should get added to the query that can > refer to columns of that relation, at that query level, so at that > point it should be safe to expand virtual generated columns. If I move the code like that, then the postgres_fdw test fails. So there is some additional interaction there that I need to study.
On 05.09.24 10:27, jian he wrote: > On Wed, Sep 4, 2024 at 4:40 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> >> Here is an implementation of this. It's much nicer! It also appears to >> fix all the additional test cases that have been presented. (I haven't >> integrated them into the patch set yet.) >> >> I left the 0001 patch alone for now and put the new rewriting >> implementation into 0002. (Unfortunately, the diff is kind of useless >> for visual inspection.) Let me know if this matches what you had in >> mind, please. Also, is this the right place in fireRIRrules()? > > hi. some minor issues. > > in get_dependent_generated_columns we can > > /* skip if not generated column */ > if (!TupleDescAttr(tupdesc, defval->adnum - 1)->attgenerated) > continue; > change to > /* skip if not generated stored column */ > if (!(TupleDescAttr(tupdesc, defval->adnum - > 1)->attgenerated == ATTRIBUTE_GENERATED_STORED)) > continue; I need to study more what to do with this function. I'm not completely sure whether this should apply only to stored generated columns. > in ExecInitStoredGenerated > "if ((tupdesc->constr && tupdesc->constr->has_generated_stored)))" > is true. > then later we finish the loop > (for (int i = 0; i < natts; i++) loop) > > we can "Assert(ri_NumGeneratedNeeded > 0)" > so we can ensure once has_generated_stored flag is true, > then we should have at least one stored generated attribute. This is technically correct, but this code isn't touched by this patch, so I don't think it belongs here. > similarly, in expand_generated_columns_internal > we can aslo add "Assert(list_length(tlist) > 0);" > above > node = ReplaceVarsFromTargetList(node, rt_index, 0, rte, tlist, > REPLACEVARS_CHANGE_VARNO, rt_index, NULL); Ok, I'll add that. > @@ -2290,7 +2291,9 @@ ExecBuildSlotValueDescription(Oid reloid, > if (table_perm || column_perm) > { > - if (slot->tts_isnull[i]) > + if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) > + val = "virtual"; > + else if (slot->tts_isnull[i]) > val = "null"; > else > { > Oid foutoid; > bool typisvarlena; > getTypeOutputInfo(att->atttypid, &foutoid, &typisvarlena); > val = OidOutputFunctionCall(foutoid, slot->tts_values[i]); > } > > we can add Assert here, if i understand it correctly, like > if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) > { > Assert(slot->tts_isnull[i]); > val = "virtual"; > } Also technically correct, but I don't see what benefit this would bring. The code guarded by that assert would not make use of the thing being asserted.
On 16.09.24 11:22, jian he wrote: > in v7. > > doc/src/sgml/ref/alter_table.sgml > <phrase>and <replaceable > class="parameter">column_constraint</replaceable> is:</phrase> > > section need representation of: > GENERATED ALWAYS AS ( <replaceable>generation_expr</replaceable> ) [VIRTUAL] I have addressed this in patch v8. > in RelationBuildTupleDesc(Relation relation) > we need to add "constr->has_generated_virtual" for the following code? > > if (constr->has_not_null || > constr->has_generated_stored || > ndef > 0 || > attrmiss || > relation->rd_rel->relchecks > 0) fixed in v8 > also seems there will be table_rewrite for adding virtual generated > columns, but we can avoid that. > The attached patch is the change and the tests. > > i've put the tests in src/test/regress/sql/fast_default.sql, > since it already has event triggers and trigger functions, we don't > want to duplicate it. Also added in v8. Thanks!
On 18.09.24 04:38, jian he wrote: > On Mon, Sep 16, 2024 at 5:22 PM jian he <jian.universality@gmail.com> wrote: >> >> in v7. >> > seems I am confused with the version number. > > here, I attached another minor change in tests. > > make > ERROR: invalid ON DELETE action for foreign key constraint containing > generated column > becomes > ERROR: foreign key constraints on virtual generated columns are not supported I think the existing behavior is fine. The first message is about something that is invalid anyway. The second message is just that something is not supported yet. If we end up implementing, then users will get the first message. > change contrib/pageinspect/sql/page.sql > expand information on t_infomask, t_bits information. added to v8 patch > change RelationBuildLocalRelation > make the transient TupleDesc->TupleConstr three bool flags more accurate. I don't think we need that. At the time this is used, the generation expressions are not added to the table yet. Note that stored generated columns are not dealt with here either. If there is a bug, then we can fix it, but if not, then I'd rather keep the code simpler.
On Tue, 5 Nov 2024 at 16:17, Peter Eisentraut <peter@eisentraut.org> wrote: > > New patch version. In expand_generated_columns_in_expr(): + /* + * XXX For the benefit of triggers, make two passes, so it covers + * PRS2_OLD_VARNO and PRS2_NEW_VARNO. + */ + node = expand_generated_columns_internal(node, rel, 1, rte); + node = expand_generated_columns_internal(node, rel, 2, rte); It seems a bit messy to be doing these two passes in expand_generated_columns_in_expr(), when it is only needed for triggers. I think it was better the way it was in the v7 patch, passing rt_index to expand_generated_columns_in_expr(), so that TriggerEnabled() did this: + tgqual = (Node *) expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc, PRS2_OLD_VARNO); + tgqual = (Node *) expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc, PRS2_NEW_VARNO); Regards, Dean
On Tue, Nov 5, 2024 at 9:48 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 30.09.24 04:09, Peter Eisentraut wrote: > > I'm attaching a consolidated patch here, so we have something up to date > > on the record. I haven't worked through all the other recent feedback > > from Jian He yet; I'll do that next. > > New patch version. I've gone through the whole thread again and looked > at all the feedback and various bug reports and test cases and made sure > they are all addressed in the latest patch version. (I'll send some > separate messages to respond to some individual messages, but I'm > keeping the latest patch here.) > I have tried to analyze this patch's interaction with logical replication. The patch allows virtual generated columns in row filters and column lists. But for the column list, it doesn't seem to be computing the correct value whereas for the row filter, it is working due to the following change: @@ -992,7 +993,7 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications, continue; foreach(lc, rfnodes[idx]) - filters = lappend(filters, stringToNode((char *) lfirst(lc))); + filters = lappend(filters, expand_generated_columns_in_expr(stringToNode((char *) lfirst(lc)), relation)); The possible idea to replicate virtual generated columns is to compute the corresponding expression before sending the data to the client. If we can allow it in the row filter than why not to publish it as well. To allow updates, we need to ensure that the replica identity should include all columns referenced by the generated expression. For example, if the generated column is defined as generated always as (c1 + c2), the replica identity must include both c1 and c2. Now, if we can't support the replication of virtual generated columns due to some reason then we can mention in docs for publish_generated_columns that it is used only to replicate STORED generated columns but if we can support it then the publish_generated_columns can accept string values like 'stored', 'virtual', 'all'. Thoughts? -- With Regards, Amit Kapila.
On Tue, 5 Nov 2024 at 21:48, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 30.09.24 04:09, Peter Eisentraut wrote: > > I'm attaching a consolidated patch here, so we have something up to date > > on the record. I haven't worked through all the other recent feedback > > from Jian He yet; I'll do that next. > > New patch version. I've gone through the whole thread again and looked > at all the feedback and various bug reports and test cases and made sure > they are all addressed in the latest patch version. (I'll send some > separate messages to respond to some individual messages, but I'm > keeping the latest patch here.) The patch needs to be rebased due to a recent commit 14e87ffa5c5. I have verified the behavior of logical replication of row filters on the virtual generated column, and everything appears to be functioning as expected. One suggestion would be to add a test case for the row filter on a virtual generated column. Regards, Vignesh
On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > New patch version. I've gone through the whole thread again and looked > at all the feedback and various bug reports and test cases and made sure > they are all addressed in the latest patch version. (I'll send some > separate messages to respond to some individual messages, but I'm > keeping the latest patch here.) just quickly note the not good error message before you rebase. src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS (2) ; ERROR: unrecognized constraint subtype: 4 src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS (2) stored; ERROR: unrecognized constraint subtype: 4 src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS (2) virtual; ERROR: unrecognized constraint subtype: 4 reading gram.y, typedef struct Constraint seems cannot distinguish, we are creating a domain or create table. I cannot found a way to error out in gram.y. so we have to error out at DefineDomain.
> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > > > New patch version. I've gone through the whole thread again and looked > > at all the feedback and various bug reports and test cases and made sure > > they are all addressed in the latest patch version. (I'll send some > > separate messages to respond to some individual messages, but I'm > > keeping the latest patch here.) > RelationBuildPartitionKey if (!isnull) { char *exprString; Node *expr; exprString = TextDatumGetCString(datum); expr = stringToNode(exprString); pfree(exprString); expr = expand_generated_columns_in_expr(expr, relation); } no need expand_generated_columns_in_expr? in ComputePartitionAttrs, we already forbidden generated columns to be part of the partition key. check_modified_virtual_generated, we can replace fastgetattr to heap_attisnull? like: // bool isnull; // fastgetattr(tuple, i + 1, tupdesc, &isnull); // if (!isnull) // ereport(ERROR, // (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), // errmsg("trigger modified virtual generated column value"))); if (!heap_attisnull(tuple, i+1, tupdesc)) ereport(ERROR, (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), errmsg("trigger modified virtual generated column value")));
On 07.11.24 11:02, Dean Rasheed wrote: > On Tue, 5 Nov 2024 at 16:17, Peter Eisentraut <peter@eisentraut.org> wrote: >> >> New patch version. > > In expand_generated_columns_in_expr(): > > + /* > + * XXX For the benefit of triggers, make two passes, so it covers > + * PRS2_OLD_VARNO and PRS2_NEW_VARNO. > + */ > + node = expand_generated_columns_internal(node, rel, 1, rte); > + node = expand_generated_columns_internal(node, rel, 2, rte); > > It seems a bit messy to be doing these two passes in > expand_generated_columns_in_expr(), when it is only needed for > triggers. I think it was better the way it was in the v7 patch, > passing rt_index to expand_generated_columns_in_expr(), so that > TriggerEnabled() did this: > > + tgqual = (Node *) > expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc, > PRS2_OLD_VARNO); > + tgqual = (Node *) > expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc, > PRS2_NEW_VARNO); Yeah, I put it back that way in v9.
On 11.11.24 12:37, jian he wrote: > On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> New patch version. I've gone through the whole thread again and looked >> at all the feedback and various bug reports and test cases and made sure >> they are all addressed in the latest patch version. (I'll send some >> separate messages to respond to some individual messages, but I'm >> keeping the latest patch here.) > > just quickly note the not good error message before you rebase. > > src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS (2) ; > ERROR: unrecognized constraint subtype: 4 > src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS > (2) stored; > ERROR: unrecognized constraint subtype: 4 > src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS > (2) virtual; > ERROR: unrecognized constraint subtype: 4 > > reading gram.y, typedef struct Constraint seems cannot distinguish, we > are creating a domain or create table. > I cannot found a way to error out in gram.y. > > so we have to error out at DefineDomain. This appears to be a very old problem independent of this patch. I'll take a look at fixing it.
On 12.11.24 09:49, jian he wrote: >> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote: > RelationBuildPartitionKey > if (!isnull) > { > char *exprString; > Node *expr; > exprString = TextDatumGetCString(datum); > expr = stringToNode(exprString); > pfree(exprString); > expr = expand_generated_columns_in_expr(expr, relation); > } > no need expand_generated_columns_in_expr? > in ComputePartitionAttrs, we already forbidden generated columns to be > part of the partition key. True. I have removed this extra code in v9. > check_modified_virtual_generated, we can replace fastgetattr to > heap_attisnull? like: > // bool isnull; > // fastgetattr(tuple, i + 1, tupdesc, &isnull); > // if (!isnull) > // ereport(ERROR, > // (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), > // errmsg("trigger modified virtual generated > column value"))); > if (!heap_attisnull(tuple, i+1, tupdesc)) > ereport(ERROR, > (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), > errmsg("trigger modified virtual generated > column value"))); I don't know. fastgetattr() is supposed to be "fast". ;-) It's all inline functions, so maybe that is actually correct. I don't have a strong opinion either way.
On 11.11.24 06:51, vignesh C wrote: > The patch needs to be rebased due to a recent commit 14e87ffa5c5. done in v9 > I > have verified the behavior of logical replication of row filters on > the virtual generated column, and everything appears to be functioning > as expected. One suggestion would be to add a test case for the row > filter on a virtual generated column. Yes, I just need a find a good place to put it into src/test/subscription/t/028_row_filter.pl. It's very long. ;-)
On 10.11.24 04:16, Amit Kapila wrote: > The possible idea to replicate virtual generated columns is to compute > the corresponding expression before sending the data to the client. If > we can allow it in the row filter than why not to publish it as well. Row filters have pretty strong restrictions for what kind of operations they can contain. Applying those restrictions to virtual generated columns would probably not make that feature very useful. (You want to use virtual columns for expressions that are too cumbersome to write out by hand every time.) Moreover, we would have to implement some elaborate cross-checks if a table gets added to a publication. How would that work? "Can't add table x to publication because it contains a virtual generated column with a non-simple expression"? With row filters, this is less of a problem, because the row filter a property of the publication.
On 2024-Nov-12, Peter Eisentraut wrote: > On 12.11.24 09:49, jian he wrote: > > > On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > check_modified_virtual_generated, we can replace fastgetattr to > > heap_attisnull? like: > > // bool isnull; > > // fastgetattr(tuple, i + 1, tupdesc, &isnull); > > // if (!isnull) > > // ereport(ERROR, > > // (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), > > // errmsg("trigger modified virtual generated > > column value"))); > > if (!heap_attisnull(tuple, i+1, tupdesc)) > > ereport(ERROR, > > (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), > > errmsg("trigger modified virtual generated > > column value"))); > > I don't know. fastgetattr() is supposed to be "fast". ;-) It's all inline > functions, so maybe that is actually correct. I don't have a strong opinion > either way. I think Jian is right: if you're only interested in the isnull bit, then heap_attisnull is more appropriate, because it doesn't have to decode ("deform") the tuple before giving you the answer; it knows the answer by checking just the nulls bitmap. With fastgetattr you still fetch the value from the data bytes, even though your function doesn't care about it. That's probably even measurable for wide tuples if the generated attrs are at the end, which sounds common. Personally I dislike using 0-based loops for attribute numbers, which are 1-based. For peace of mind, I'd write this as for (AttrNumber i = 1; i <= tupdesc->natts; i++) { if (TupleDescAttr(tupdesc, i - 1)->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) { bool isnull; fastgetattr(tuple, i, tupdesc, &isnull); // heap_attisnull here actually I'm kind of annoyed that TupleDescAttr() was made to refer to array indexes rather than attribute numbers, but by the time I realized it had happened, it was too late. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte" (Ijon Tichy en Viajes, Stanislaw Lem)
in transformColumnDefinition we can add parser_errposition for the error report. if (column->is_not_null && column->generated == ATTRIBUTE_GENERATED_VIRTUAL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("not-null constraints are not supported on virtual generated columns"), parser_errposition(cxt->pstate, constraint->location))); sometimes, it points to the word "generated", sometimes "not". I guess this should be fine. example: create table t13 (a int, b bool generated always as ((true )) VIRTUAL not null); create table t13 (a int, b bool not null generated always as ((true )) VIRTUAL); These 3 functions will call StoreRelNotNull to store the not-null constraint. StoreConstraints AddRelationNotNullConstraints AddRelationNewConstraints we can disallow not-null on virtual generated columns via these 3 functions. I guess we don't want to add more complexity to AddRelationNotNullConstraints. we can do it in StoreRelNotNull. like: @@ -2185,8 +2196,19 @@ StoreRelNotNull(Relation rel, const char *nnname, AttrNumber attnum, { Oid constrOid; + TupleDesc tupdesc; + Form_pg_attribute att; Assert(attnum > InvalidAttrNumber); + tupdesc = RelationGetDescr(rel); + att = TupleDescAttr(tupdesc, attnum - 1); + + if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("not-null constraints are not supported on virtual generated columns"), + errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.", + NameStr(att->attname), RelationGetRelationName(rel)))); related tests: create table t12(b int, a int generated always as (11) virtual, constraint nn not null a); create table t12(b int, constraint nn not null a, a int generated always as (11) virtual); drop table if exists t14; create table t14(b int, a int generated always as (11) virtual); alter table t14 add constraint nn not null a; alter table t14 add constraint nn not null a no inherit;
On Wed, Nov 13, 2024 at 11:30 AM jian he <jian.universality@gmail.com> wrote: > > These 3 functions will call StoreRelNotNull to store the not-null constraint. > StoreConstraints > AddRelationNotNullConstraints > AddRelationNewConstraints > > we can disallow not-null on virtual generated columns via these 3 functions. > I guess we don't want to add more complexity to AddRelationNotNullConstraints. > we can do it in StoreRelNotNull. inspired by not-null and check check_modified_virtual_generated again. in plpgsql_exec_trigger, we can: /* * In BEFORE trigger, stored generated columns are not computed yet, * so make them null in the NEW row. (Only needed in UPDATE branch; * in the INSERT case, they are already null, but in UPDATE, the field * still contains the old value.) Alternatively, we could construct a * whole new row structure without the generated columns, but this way * seems more efficient and potentially less confusing. */ if (tupdesc->constr && tupdesc->constr->has_generated_stored && TRIGGER_FIRED_BEFORE(trigdata->tg_event)) { for (int i = 0; i < tupdesc->natts; i++) { if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED || TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) expanded_record_set_field_internal(rec_new->erh, i + 1, (Datum) 0, true, /* isnull */ false, false); } } then we don't need check_modified_virtual_generated at all. this will align with the stored generated column behave for BEFORE UPDATE/INSERT FOR EACH ROW trigger. that is you are free to assign the virtual generated column any value, but at the plpgsql_exec_trigger, we will rewrite it to null. also i understand correctly. later if we want to implement virtual generated column with not-null then check_modified_virtual_generated needs to be removed?
On Tue, Nov 12, 2024 at 9:47 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 10.11.24 04:16, Amit Kapila wrote: > > The possible idea to replicate virtual generated columns is to compute > > the corresponding expression before sending the data to the client. If > > we can allow it in the row filter than why not to publish it as well. > > Row filters have pretty strong restrictions for what kind of operations > they can contain. Applying those restrictions to virtual generated > columns would probably not make that feature very useful. (You want to > use virtual columns for expressions that are too cumbersome to write out > by hand every time.) > From this paragraph, it sounds like you are saying we can't support virtual columns in row filters. But the patch already works (not checked all possible cases). For example, postgres=# CREATE TABLE gtest1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL); CREATE TABLE postgres=# create publication pub2 for table gtest1 WHERE (b > 5); CREATE PUBLICATION After this, Insert also adheres to this row filter. I haven't tested it in any further detail but its basic usage in row filters works. > Moreover, we would have to implement some elaborate cross-checks if a > table gets added to a publication. How would that work? "Can't add > table x to publication because it contains a virtual generated column > with a non-simple expression"? With row filters, this is less of a > problem, because the row filter a property of the publication. > Because virtual generated columns work in row filters, so I thought it could follow the rules for column lists as well. If the virtual column doesn't adhere to the rules of the row filter then it shouldn't even work there. My response was based on the theory that the expression for virtual columns could be computed during logical decoding. So, let's first clarify that before discussing this point further. -- With Regards, Amit Kapila.