Thread: Virtual generated columns

Virtual generated columns

From
Peter Eisentraut
Date:
Here is a patch set to implement virtual generated columns.

Some history first:  The original development of generated columns was 
discussed in [0].  It started with virtual columns, then added stored 
columns.  Before the release of PG12, it was decided that only stored 
columns were ready, so I cut out virtual columns, and stored generated 
columns shipped with PG12, which is where we are today.

Virtual generated columns are occasionally requested still, and it's a 
bit of unfinished business for me, too, so I started to resurrect it. 
What I did here first was to basically reverse interdiff the patches 
where I cut out virtual generated columns above (this was between 
patches v8 and v9 in [0]) and clean that up and make it work again.

One thing that I needed to decide was how to organize the tests for 
this.  The original patch series had both stored and virtual tests in 
the same test file src/test/regress/sql/generated.sql.  As that file has 
grown, I think it would have been a mess to weave another whole set of 
tests into that.  So instead I figured I'd make two separate test files

     src/test/regress/sql/generated_stored.sql (renamed from current)
     src/test/regress/sql/generated_virtual.sql

and kind of try to keep them aligned, similar to how the various 
collate* tests are handled.  So I put that renaming in as preparatory 
patches.  And there are also some other preparatory cleanup patches that 
I'm including.

The main feature patch (0005 here) generally works but has a number of 
open corner cases that need to be thought about and/or fixed, many of 
which are marked in the code or the tests.  I'll continue working on 
that.  But I wanted to see if I can get some feedback on the test 
structure, so I don't have to keep changing it around later.


[0]: 
https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
Attachment

Re: Virtual generated columns

From
Corey Huinker
Date:
On Mon, Apr 29, 2024 at 4:24 AM Peter Eisentraut <peter@eisentraut.org> wrote:
Here is a patch set to implement virtual generated columns.

I'm very excited about this!



The main feature patch (0005 here) generally works but has a number of
open corner cases that need to be thought about and/or fixed, many of
which are marked in the code or the tests.  I'll continue working on
that.  But I wanted to see if I can get some feedback on the test
structure, so I don't have to keep changing it around later.

I'd be very interested to see virtual generated columns working, as one of my past customers had a need to reclassify data in a partitioned table, and the ability to detach a partition, alter the virtual generated columns, and re-attach would have been great. In case you care, it was basically an "expired" flag, but the rules for what data "expired" varied by country of customer and level of service.

+ * Stored generated columns cannot work: They are computed after
+ * BEFORE triggers, but partition routing is done before all
+ * triggers.  Maybe virtual generated columns could be made to
+ * work, but then they would need to be handled as an expression
+ * below.

I'd say you nailed it with the test structure. The stored/virtual copy/split is the ideal way to approach this, which makes the diff very easy to understand.

+1 for not handling domain types yet.

 -- generation expression must be immutable
-CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) STORED);
+CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) VIRTUAL);

Does a VIRTUAL generated column have to be immutable? I can see where the STORED one has to be, but consider the following:

CREATE TABLE foo (
created_at timestamptz DEFAULT CURRENT_TIMESTAMP,
row_age interval GENERATED ALWAYS AS CURRENT_TIMESTAMP - created_at
);

 -- can't have generated column that is a child of normal column
 CREATE TABLE gtest_normal (a int, b int);
-CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) STORED) INHERITS (gtest_normal);  -- error
+CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) INHERITS (gtest_normal);  -- error

This is the barrier to the partitioning reorganization scheme I described above. Is there any hard rule why a child table couldn't have a generated column matching the parent's regular column? I can see where it might prevent indexing that column on the parent table, but is there some other dealbreaker or is this just a "it doesn't work yet" situation?

One last thing to keep in mind is that there are two special case expressions in the spec:

GENERATED ALWAYS AS ROW START
GENERATED ALWAYS AS ROW END

and we'll need to be able to fit those into the catalog. I'll start another thread for that unless you prefer I keep it here.

Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 29.04.24 10:23, Peter Eisentraut wrote:
> Here is a patch set to implement virtual generated columns.

> The main feature patch (0005 here) generally works but has a number of 
> open corner cases that need to be thought about and/or fixed, many of 
> which are marked in the code or the tests.  I'll continue working on 
> that.  But I wanted to see if I can get some feedback on the test 
> structure, so I don't have to keep changing it around later.

Here is an updated patch set.  It needed some rebasing, especially 
around the reverting of the catalogued not-null constraints.  I have 
also fixed up the various incomplete or "fixme" pieces of code mentioned 
above.  I have in most cases added "not supported yet" error messages 
for now, with the idea that some of these things can be added in later, 
as incremental features.

In particular, quoting from the commit message, the following are 
currently not supported (but could possibly be added as incremental 
features, some easier than others):

- index on virtual column
- expression index using a virtual column
- hence also no unique constraints on virtual columns
- not-null constraints on virtual columns
- (check constraints are supported)
- foreign key constraints on virtual columns
- extended statistics on virtual columns
- ALTER TABLE / SET EXPRESSION
- ALTER TABLE / DROP EXPRESSION
- virtual columns as trigger columns
- virtual column cannot have domain type

So, I think this basically works now, and the things that don't work 
should be appropriately prevented.  So if someone wants to test this and 
tell me what in fact doesn't work correctly, that would be helpful.

Attachment

Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 29.04.24 20:54, Corey Huinker wrote:
>       -- generation expression must be immutable
>     -CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision
>     GENERATED ALWAYS AS (random()) STORED);
>     +CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision
>     GENERATED ALWAYS AS (random()) VIRTUAL);
> 
> Does a VIRTUAL generated column have to be immutable? I can see where 
> the STORED one has to be, but consider the following:
> 
>     CREATE TABLE foo (
>     created_at timestamptz DEFAULT CURRENT_TIMESTAMP,
>     row_age interval GENERATED ALWAYS AS CURRENT_TIMESTAMP - created_at
>     );

I have been hesitant about this, but I'm now leaning toward that we 
could allow this.

>       -- can't have generated column that is a child of normal column
>       CREATE TABLE gtest_normal (a int, b int);
>     -CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS
>     (a * 2) STORED) INHERITS (gtest_normal);  -- error
>     +CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS
>     (a * 2) VIRTUAL) INHERITS (gtest_normal);  -- error
> 
> This is the barrier to the partitioning reorganization scheme I 
> described above. Is there any hard rule why a child table couldn't have 
> a generated column matching the parent's regular column? I can see where 
> it might prevent indexing that column on the parent table, but is there 
> some other dealbreaker or is this just a "it doesn't work yet" situation?

We had a quite a difficult time getting the inheritance business of 
stored generated columns working correctly.  I'm sticking to the 
well-trodden path here.  We can possibly expand this if someone wants to 
work out the details.

> One last thing to keep in mind is that there are two special case 
> expressions in the spec:
> 
>     GENERATED ALWAYS AS ROW START
>     GENERATED ALWAYS AS ROW END
> 
> and we'll need to be able to fit those into the catalog. I'll start 
> another thread for that unless you prefer I keep it here.

I think this is a separate feature.




Re: Virtual generated columns

From
Tomasz Rybak
Date:
On Wed, 2024-05-22 at 19:22 +0200, Peter Eisentraut wrote:
> On 29.04.24 10:23, Peter Eisentraut wrote:
> > Here is a patch set to implement virtual generated columns.
>
> > The main feature patch (0005 here) generally works but has a number
> > of
> > open corner cases that need to be thought about and/or fixed, many
> > of
> > which are marked in the code or the tests.  I'll continue working
> > on
> > that.  But I wanted to see if I can get some feedback on the test
> > structure, so I don't have to keep changing it around later.
>
> Here is an updated patch set.  It needed some rebasing, especially
> around the reverting of the catalogued not-null constraints.  I have
> also fixed up the various incomplete or "fixme" pieces of code
> mentioned
> above.  I have in most cases added "not supported yet" error messages
> for now, with the idea that some of these things can be added in
> later,
> as incremental features.
>

This is not (yet) full review.

Patches applied cleanly on 76618097a6c027ec603a3dd143f61098e3fb9794
from 2024-06-14.
I've run
./configure && make world-bin && make check && make check-world
on 0001, then 0001+0002, then 0001+0002+0003, up to applying
all 5 patches. All cases passed on Debian unstable on aarch64 (ARM64)
on gcc (Debian 13.2.0-25) 13.2.0.

v1-0001-Rename-regress-test-generated-to-generated_stored.patch:
no objections here, makes sense as preparation for future changes

v1-0002-Put-generated_stored-test-objects-in-a-schema.patch:
also no objections.
OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out,
tablespace.out) are creating schemas and later dropping them - so here
it might also make sense to drop schema at the end of testing.

v1-0003-Remove-useless-initializations.patch:
All other cases (I checked directory src/backend/utils/cache)
calling MemoryContextAllocZero only initialize fields when they
are non-zero, so removing partial initialization with false brings
consistency to the code.

v1-0004-Remove-useless-code.patch:
Patch removes filling in of constraints from function
BuildDescForRelation. This function is only called from file
view.c and tablecmds.c (twice). In none of those cases
result->constr is used, so proposed change makes sense.
While I do not know code well, so might be wrong here,
I would apply this patch.

I haven't looked at the most important (and biggest) file yet,
v1-0005-Virtual-generated-columns.patch; hope to look at it
this week.
At the same I believe 0001-0004 can be applied - even backported
if it'll make maintenance of future changes easier. But that should
be commiter's decision.


Best regards

--
Tomasz Rybak, Debian Developer <serpent@debian.org>
GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C

Attachment

Re: Virtual generated columns

From
jian he
Date:
On Thu, May 23, 2024 at 1:23 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 29.04.24 10:23, Peter Eisentraut wrote:
> > Here is a patch set to implement virtual generated columns.
>
> > The main feature patch (0005 here) generally works but has a number of
> > open corner cases that need to be thought about and/or fixed, many of
> > which are marked in the code or the tests.  I'll continue working on
> > that.  But I wanted to see if I can get some feedback on the test
> > structure, so I don't have to keep changing it around later.

the test structure you made ( generated_stored.sql,
generated_virtual.sq) looks ok to me.
but do we need to reset the search_path at the end of
generated_stored.sql, generated_virtual.sql?

most of the test tables didn't use much storage,
maybe not necessary to clean up (drop the test table) at the end of sql files.

>
> So, I think this basically works now, and the things that don't work
> should be appropriately prevented.  So if someone wants to test this and
> tell me what in fact doesn't work correctly, that would be helpful.


in https://www.postgresql.org/docs/current/catalog-pg-attrdef.html
>>>
The catalog pg_attrdef stores column default values. The main
information about columns is stored in pg_attribute. Only columns for
which a default value has been explicitly set will have an entry here.
>>
didn't mention generated columns related expressions.
Do we need to add something here? maybe a separate issue?


+ /*
+ * TODO: This could be done, but it would need a different implementation:
+ * no rewriting, but still need to recheck any constraints.
+ */
+ if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual
generated columns"),
+ errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.",
+   colName, RelationGetRelationName(rel))));

minor typo, should be
+ errmsg("ALTER TABLE SET EXPRESSION is not supported for virtual
generated columns"),

insert/update/delete/merge returning have problems:
CREATE TABLE t2 (
a int ,
b int GENERATED ALWAYS AS (a * 2),
d int default 22);
insert into t2(a) select g from generate_series(1,10) g;

insert into t2 select 100 returning *, (select t2.b), t2.b = t2.a * 2;
update t2 set a = 12 returning *, (select t2.b), t2.b = t2.a * 2;
update t2 set a = 12 returning *, (select (select t2.b)), t2.b = t2.a * 2;
delete from t2 where t2.b = t2.a * 2 returning *, 1,((select t2.b));

currently all these query, error message is "unexpected virtual
generated column reference"
we expect above these query work?


issue with merge:
CREATE TABLE t0 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
insert into t0(a) select g from generate_series(1,10) g;
MERGE INTO t0 t USING t0 AS s ON 2 * t.a = s.b WHEN MATCHED THEN
DELETE returning *;

the above query returns zero rows, but for stored generated columns it
will return 10 rows.

in  transformMergeStmt(ParseState *pstate, MergeStmt *stmt)
add
`qry->hasGeneratedVirtual = pstate->p_hasGeneratedVirtual;`
before
`assign_query_collations(pstate, qry);`
solve the problem.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 28.06.24 02:00, jian he wrote:
> inhttps://www.postgresql.org/docs/current/catalog-pg-attrdef.html
> The catalog pg_attrdef stores column default values. The main
> information about columns is stored in pg_attribute. Only columns for
> which a default value has been explicitly set will have an entry here.
> didn't mention generated columns related expressions.
> Do we need to add something here? maybe a separate issue?

Yes and yes.  I have committed a separate change to update the 
documentation here.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 17.06.24 21:31, Tomasz Rybak wrote:
> v1-0001-Rename-regress-test-generated-to-generated_stored.patch:
> no objections here, makes sense as preparation for future changes
> 
> v1-0002-Put-generated_stored-test-objects-in-a-schema.patch:
> also no objections.
> OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out,
> tablespace.out) are creating schemas and later dropping them - so here
> it might also make sense to drop schema at the end of testing.

The existing tests for generated columns don't drop what they create at 
the end, which can be useful for pg_upgrade testing for example.  So 
unless there are specific reasons to change it, I would leave that as is.

Other tests might have other reasons.  For example, publications or row 
security might interfere with many other tests.

> v1-0003-Remove-useless-initializations.patch:
> All other cases (I checked directory src/backend/utils/cache)
> calling MemoryContextAllocZero only initialize fields when they
> are non-zero, so removing partial initialization with false brings
> consistency to the code.
> 
> v1-0004-Remove-useless-code.patch:
> Patch removes filling in of constraints from function
> BuildDescForRelation. This function is only called from file
> view.c and tablecmds.c (twice). In none of those cases
> result->constr is used, so proposed change makes sense.
> While I do not know code well, so might be wrong here,
> I would apply this patch.

I have committed these two now.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 28.06.24 02:00, jian he wrote:
> the test structure you made ( generated_stored.sql,
> generated_virtual.sq) looks ok to me.
> but do we need to reset the search_path at the end of
> generated_stored.sql, generated_virtual.sql?

No, the session ends at the end of the test file, so we don't need to 
reset session state.

> + /*
> + * TODO: This could be done, but it would need a different implementation:
> + * no rewriting, but still need to recheck any constraints.
> + */
> + if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual
> generated columns"),
> + errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.",
> +   colName, RelationGetRelationName(rel))));
> 
> minor typo, should be
> + errmsg("ALTER TABLE SET EXPRESSION is not supported for virtual
> generated columns"),

This style "ALTER TABLE / something else" is also used for other error 
messages related to ALTER TABLE subcommands, so I am using the same here.

> insert/update/delete/merge returning have problems:
> CREATE TABLE t2 (
> a int ,
> b int GENERATED ALWAYS AS (a * 2),
> d int default 22);
> insert into t2(a) select g from generate_series(1,10) g;
> 
> insert into t2 select 100 returning *, (select t2.b), t2.b = t2.a * 2;
> update t2 set a = 12 returning *, (select t2.b), t2.b = t2.a * 2;
> update t2 set a = 12 returning *, (select (select t2.b)), t2.b = t2.a * 2;
> delete from t2 where t2.b = t2.a * 2 returning *, 1,((select t2.b));
> 
> currently all these query, error message is "unexpected virtual
> generated column reference"
> we expect above these query work?

Yes, this is a bug.  I'm looking into it.

> issue with merge:
> CREATE TABLE t0 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
> insert into t0(a) select g from generate_series(1,10) g;
> MERGE INTO t0 t USING t0 AS s ON 2 * t.a = s.b WHEN MATCHED THEN
> DELETE returning *;
> 
> the above query returns zero rows, but for stored generated columns it
> will return 10 rows.
> 
> in  transformMergeStmt(ParseState *pstate, MergeStmt *stmt)
> add
> `qry->hasGeneratedVirtual = pstate->p_hasGeneratedVirtual;`
> before
> `assign_query_collations(pstate, qry);`
> solve the problem.

Good catch.  Will fix.

Thanks for this review.  I will work on fixing the issues above and come 
back with a new patch set.




Re: Virtual generated columns

From
jian he
Date:
statistic related bug.
borrow examples from
https://www.postgresql.org/docs/current/sql-createstatistics.html

CREATE TABLE t3 (a   timestamp PRIMARY KEY, b timestamp GENERATED
ALWAYS AS (a) VIRTUAL);
CREATE STATISTICS s3 (ndistinct) ON b FROM t3;
INSERT INTO t3(a) SELECT i FROM generate_series('2020-01-01'::timestamp,
                                             '2020-12-31'::timestamp,
                                             '1 minute'::interval) s(i);
ANALYZE t3;
CREATE STATISTICS s3 (ndistinct) ON date_trunc('month', a),
date_trunc('day', b) FROM t3;
ANALYZE t3;
ERROR:  unexpected virtual generated column reference



--this is allowed
CREATE STATISTICS s5 ON (b + interval '1 day') FROM t3;
--this is not allowed. seems inconsistent?
CREATE STATISTICS s6 ON (b ) FROM t3;


in CreateStatistics(CreateStatsStmt *stmt)
we have

if (selem->name)
{
            if (attForm->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("statistics creation on virtual
generated columns is not supported")));
}
else if (IsA(selem->expr, Var)) /* column reference in parens */
{
            if (get_attgenerated(relid, var->varattno) ==
ATTRIBUTE_GENERATED_VIRTUAL)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("statistics creation on virtual
generated columns is not supported")));
}
else                    /* expression */
{
...
}

you didn't make sure the last "else" branch is not related to virtual
generated columns



Re: Virtual generated columns

From
jian he
Date:
another bug?
drop table gtest12v;
CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED
ALWAYS AS (b * 2) VIRTUAL);
insert into gtest12v (a,b) values (11,  22147483647);
table gtest12v;

insert ok, but select error:
ERROR:  integer out of range

should insert fail?



CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED
ALWAYS AS (b * 2) VIRTUAL);
CREATE SEQUENCE sequence_testx OWNED BY gtest12v.c;

seems to work. But I am not sure if there are any corner cases that
make it not work.
just want to raise this issue.



Re: Virtual generated columns

From
jian he
Date:
drop table t3;
CREATE TABLE t3( b bigint, c int GENERATED ALWAYS AS (b * 2) VIRTUAL);
insert into t3 (b) values (22147483647);
ANALYZE t3;

for ANALYZE
since column c has no actual  storage, so it's not analyzable?
we need to change the function examine_attribute accordingly?


For the above example, for each insert row, we actually need to call
int84 to validate c value.
we probably need something similar to have ExecComputeStoredGenerated etc,
but we don't need to store it.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 22.07.24 12:53, jian he wrote:
> another bug?
> drop table gtest12v;
> CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED
> ALWAYS AS (b * 2) VIRTUAL);
> insert into gtest12v (a,b) values (11,  22147483647);
> table gtest12v;
> 
> insert ok, but select error:
> ERROR:  integer out of range
> 
> should insert fail?

I think this is the correct behavior.

There has been a previous discussion: 
https://www.postgresql.org/message-id/2e3d5147-16f8-af0f-00ab-4c72cafc896f%402ndquadrant.com

> CREATE TABLE gtest12v (a int PRIMARY KEY, b bigint, c int GENERATED
> ALWAYS AS (b * 2) VIRTUAL);
> CREATE SEQUENCE sequence_testx OWNED BY gtest12v.c;
> 
> seems to work. But I am not sure if there are any corner cases that
> make it not work.
> just want to raise this issue.

I don't think this matters.  You can make a sequence owned by any 
column, even if that column doesn't have a default that invokes the 
sequence.  So nonsensical setups are possible, but they are harmless.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
Thank you for your extensive testing.  Here is a new patch set that has 
fixed all the issues you have reported (MERGE, sublinks, statistics, 
ANALYZE).

Attachment

Re: Virtual generated columns

From
Dean Rasheed
Date:
On Thu, 8 Aug 2024 at 07:23, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Thank you for your extensive testing.  Here is a new patch set that has
> fixed all the issues you have reported (MERGE, sublinks, statistics,
> ANALYZE).

I had a quick look at this and found one issue, which is that it
doesn't properly deal with virtual generated columns in wholerow
attributes:

CREATE TABLE foo(a int, a2 int GENERATED ALWAYS AS (a*2) VIRTUAL);
INSERT INTO foo VALUES (1);
SELECT foo FROM foo;

 foo
------
 (1,)
(1 row)

Looking at the rewriter changes, it occurred to me that it could
perhaps be done more simply using ReplaceVarsFromTargetList() for each
RTE with virtual generated columns. That function already has the
required wholerow handling code, so there'd be less code duplication.
I think it might be better to do this from within fireRIRrules(), just
after RLS policies are applied, so it wouldn't need to worry about
CTEs and sublink subqueries. That would also make the
hasGeneratedVirtual flags unnecessary, since we'd already only be
doing the extra work for tables with virtual generated columns. That
would eliminate possible bugs caused by failing to set those flags.

Regards,
Dean



Re: Virtual generated columns

From
jian he
Date:
On Thu, Aug 8, 2024 at 2:23 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Thank you for your extensive testing.  Here is a new patch set that has
> fixed all the issues you have reported (MERGE, sublinks, statistics,
> ANALYZE).

                    if (coldef->generated && restdef->generated &&
coldef->generated != restdef->generated)
                        ereport(ERROR,
                                (errcode(ERRCODE_INVALID_COLUMN_DEFINITION),
                                 errmsg("column \"%s\" inherits from
generated column of different kind",
                                        restdef->colname)));
the error message is not informal. maybe add errhint that
"column \"%s\" should be same as parent table's generated column kind:
%s", "virtual"|"stored"


 .../regress/expected/create_table_like.out    |  23 +-
 .../regress/expected/generated_stored.out     |  27 +-
 ...rated_stored.out => generated_virtual.out} | 835 +++++++++---------
 src/test/regress/parallel_schedule            |   3 +
 src/test/regress/sql/create_table_like.sql    |   2 +-
 src/test/regress/sql/generated_stored.sql     |  10 +-
 ...rated_stored.sql => generated_virtual.sql} | 301 ++++---
 src/test/subscription/t/011_generated.pl      |  38 +-
 55 files changed, 1280 insertions(+), 711 deletions(-)
 copy src/test/regress/expected/{generated_stored.out
generated_virtual.out} (69%)
 copy src/test/regress/sql/{generated_stored.sql => generated_virtual.sql} (72%)

I don't understand the "copy =>" part, I guess related to copy content
from stored to virtual.
anyway. some minor issue:

-- alter generation expression of parent and all its children altogether
ALTER TABLE gtest_parent ALTER COLUMN f3 SET EXPRESSION AS (f2 * 2);
\d gtest_parent
\d gtest_child
\d gtest_child2
\d gtest_child3
SELECT tableoid::regclass, * FROM gtest_parent ORDER BY 1, 2, 3;

The first line ALTER TABLE will fail for
src/test/regress/sql/generated_virtual.sql.
so no need
"""
\d gtest_parent
\d gtest_child
\d gtest_child2
\d gtest_child3
SELECT tableoid::regclass, * FROM gtest_parent ORDER BY 1, 2, 3;
"""

Similarly the following tests for gtest29 may aslo need change
-- ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION

since we cannot do ALTER TABLE SET EXPRESSION for virtual generated columns.


-- ALTER TABLE ... ALTER COLUMN
CREATE TABLE gtest27 (
    a int,
    b int,
    x int GENERATED ALWAYS AS ((a + b) * 2) VIRTUAL
);
INSERT INTO gtest27 (a, b) VALUES (3, 7), (4, 11);
ALTER TABLE gtest27 ALTER COLUMN a TYPE text;  -- error
ALTER TABLE gtest27 ALTER COLUMN x TYPE numeric;

will
ALTER TABLE gtest27 ALTER COLUMN a TYPE int4;
be a no-op?


do we need a document that virtual generated columns will use the
expression's collation.
see:
drop table if exists t5;
CREATE TABLE t5 (
    a text collate "C",
    b text collate "C" GENERATED ALWAYS AS (a collate case_insensitive) ,
    d int DEFAULT 22
);
INSERT INTO t5(a,d) values ('d1',28), ('D2',27), ('D1',26);
select * from t5 order by b asc, d asc;



+ /*
+ * TODO: Prevent virtual generated columns from having a
+ * domain type.  We would have to enforce domain constraints
+ * when columns underlying the generated column change.  This
+ * could possibly be implemented, but it's not.
+ *
+ * XXX If column->typeName is not set, then this column
+ * definition is probably a partition definition and will
+ * presumably get its pre-vetted type from elsewhere.  If that
+ * doesn't hold, maybe this check needs to be moved elsewhere.
+ */
+ if (column->generated == ATTRIBUTE_GENERATED_VIRTUAL && column->typeName)
+ {
+ Type ctype;
+
+ ctype = typenameType(cxt->pstate, column->typeName, NULL);
+ if (((Form_pg_type) GETSTRUCT(ctype))->typtype == TYPTYPE_DOMAIN)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("virtual generated column \"%s\" cannot have a domain type",
+ column->colname),
+ parser_errposition(cxt->pstate,
+ column->location)));
+ ReleaseSysCache(ctype);
+ }

create domain mydomain as int4;
create type mydomainrange as range(subtype=mydomain);
CREATE TABLE t3( b bigint, c mydomain GENERATED ALWAYS AS ('11') VIRTUAL);
CREATE TABLE t3( b bigint, c mydomainrange GENERATED ALWAYS AS
('[4,50)') VIRTUAL);
domain will error out, domain over range is ok, is this fine?



+      When <literal>VIRTUAL</literal> is specified, the column will be
+      computed when it is read, and it will not occupy any storage.  When
+      <literal>STORED</literal> is specified, the column will be computed on
+      write and will be stored on disk.  <literal>VIRTUAL</literal> is the
+      default.
drop table if exists t5;
CREATE TABLE t5 (
    a int,
    b text storage extended collate "C"  GENERATED ALWAYS AS (a::text
collate case_insensitive) ,
    d int DEFAULT 22
);
select reltoastrelid <> 0 as has_toast_table from pg_class where oid =
't5'::regclass;

if really no storage, should table t5 have an associated toast table or not?
also check ALTER TABLE variant:
alter table t5 alter b set storage extended;



Do we need to do something in ATExecSetStatistics for cases like:
ALTER TABLE t5 ALTER b SET STATISTICS 2000;
(b is a generated virtual column).
because of
examine_attribute
    if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
        return NULL;
i guess, this won't have a big impact.



There are some issues with changing virtual generated column type.
like:
drop table if exists another;
create table another (f4 int, f2 text, f3 text, f1 int GENERATED
ALWAYS AS (f4));
insert into another values(1, 'one', 'uno'), (2, 'two', 'due'),(3,
'three', 'tre');
alter table another
  alter f1 type text using f2 || ' and ' || f3 || ' more';
table another;

or
alter table another
  alter f1 type text using f2 || ' and ' || f3 || ' more',
  drop column f1;
ERROR:  column "f1" of relation "another" does not exist

These two command outputs seem not right.
the stored generated column which works as expected.


in src/test/regress/sql/alter_table.sql
-- We disallow changing table's row type if it's used for storage
create table at_tab1 (a int, b text);
create table at_tab2 (x int, y at_tab1);
alter table at_tab1 alter column b type varchar; -- fails
drop table at_tab2;

I think the above restriction should  apply to virtual generated columns too.
given in ATPrepAlterColumnType, not storage we still call
find_composite_type_dependencies

    if (!RELKIND_HAS_STORAGE(tab->relkind))
    {
        /*
         * For relations without storage, do this check now.  Regular tables
         * will check it later when the table is being rewritten.
         */
        find_composite_type_dependencies(rel->rd_rel->reltype, rel, NULL);
    }

so i think in ATPrepAlterColumnType, we should do:

    if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
    {
        find_composite_type_dependencies(rel->rd_rel->reltype, rel, NULL);
    }
    else if (tab->relkind == RELKIND_RELATION ||
             tab->relkind == RELKIND_PARTITIONED_TABLE)
   {
   }
    else if (transform)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("\"%s\" is not a table",
                        RelationGetRelationName(rel))));

you may add following tests:
------------------------------------------------------------------------
create table at_tab1 (a int, b text GENERATED ALWAYS AS ('hello'), c text);
create table at_tab2 (x int, y at_tab1);
alter table at_tab1 alter column b type varchar; -- fails
drop table at_tab1, at_tab2;

-- Check it for a partitioned table, too
create table at_tab1 (a int, b text GENERATED ALWAYS AS ('hello'), c
text) partition by list(a);;
create table at_tab2 (x int, y at_tab1);
alter table at_tab1 alter column b type varchar; -- fails
drop table at_tab1, at_tab2;
---------------------------------------------------------------------------------



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 08.08.24 20:22, Dean Rasheed wrote:
> Looking at the rewriter changes, it occurred to me that it could
> perhaps be done more simply using ReplaceVarsFromTargetList() for each
> RTE with virtual generated columns. That function already has the
> required wholerow handling code, so there'd be less code duplication.

Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used 
here.  It does have the wholerow logic that we need somehow, but other 
than that it seems to target something different?

> I think it might be better to do this from within fireRIRrules(), just
> after RLS policies are applied, so it wouldn't need to worry about
> CTEs and sublink subqueries. That would also make the
> hasGeneratedVirtual flags unnecessary, since we'd already only be
> doing the extra work for tables with virtual generated columns. That
> would eliminate possible bugs caused by failing to set those flags.

Yes, ideally, we'd piggy-back this into fireRIRrules().  One thing I'm 
missing is that if you're descending into subqueries, there is no link 
to the upper levels' range tables, which we need to lookup the 
pg_attribute entries of column referencing Vars.  That's why there is 
this whole custom walk with its own context data.  Maybe there is a way 
to do this already that I missed?




Re: Virtual generated columns

From
Dean Rasheed
Date:
On Wed, 21 Aug 2024 at 08:00, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 08.08.24 20:22, Dean Rasheed wrote:
> > Looking at the rewriter changes, it occurred to me that it could
> > perhaps be done more simply using ReplaceVarsFromTargetList() for each
> > RTE with virtual generated columns. That function already has the
> > required wholerow handling code, so there'd be less code duplication.
>
> Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used
> here.  It does have the wholerow logic that we need somehow, but other
> than that it seems to target something different?
>

Well what I was thinking was that (in fireRIRrules()'s final loop over
relations in the rtable), if the relation had any virtual generated
columns, you'd build a targetlist containing a TLE for each one,
containing the generated expression. Then you could just call
ReplaceVarsFromTargetList() to replace any Vars in the query with the
corresponding generated expressions. That takes care of descending
into subqueries, adjusting varlevelsup, and expanding wholerow Vars
that might refer to the generated expression.

I also have half an eye on how this patch will interact with my patch
to support RETURNING OLD/NEW values. If you use
ReplaceVarsFromTargetList(), it should just do the right thing for
RETURNING OLD/NEW generated expressions.

> > I think it might be better to do this from within fireRIRrules(), just
> > after RLS policies are applied, so it wouldn't need to worry about
> > CTEs and sublink subqueries. That would also make the
> > hasGeneratedVirtual flags unnecessary, since we'd already only be
> > doing the extra work for tables with virtual generated columns. That
> > would eliminate possible bugs caused by failing to set those flags.
>
> Yes, ideally, we'd piggy-back this into fireRIRrules().  One thing I'm
> missing is that if you're descending into subqueries, there is no link
> to the upper levels' range tables, which we need to lookup the
> pg_attribute entries of column referencing Vars.  That's why there is
> this whole custom walk with its own context data.  Maybe there is a way
> to do this already that I missed?
>

That link to the upper levels' range tables wouldn't be needed because
essentially using ReplaceVarsFromTargetList() flips the whole thing
round: instead of traversing the tree looking for Var nodes that need
to be replaced (possibly from upper query levels), you build a list of
replacement expressions to be applied and apply them from the top,
descending into subqueries as needed.

Another argument for doing it that way round is to not add too many
extra cycles to the processing of existing queries that don't
reference generated expressions. ISTM that this patch is potentially
adding quite a lot of additional overhead -- it looks like, for every
Var in the tree, it's calling get_attgenerated(), which involves a
syscache lookup to see if that column is a generated expression (which
most won't be). Ideally, we should be trying to do the minimum amount
of extra work in the common case where there are no generated
expressions.

Looking ahead, I can also imagine that one day we might want to
support subqueries in generated expressions. That would require
recursive processing of generated expressions in the generated
expression's subquery, as well as applying RLS policies to the new
relations pulled in, and checks to guard against infinite recursion.
fireRIRrules() already has the infrastructure to support all of that,
so that feels like a much more natural place to do this.

Regards,
Dean



Re: Virtual generated columns

From
jian he
Date:
drop table if exists gtest_err_1 cascade;
CREATE TABLE gtest_err_1 (
a int PRIMARY KEY generated by default as identity,
b int GENERATED ALWAYS AS (22),
d int default 22);
create view gtest_err_1_v as select * from gtest_err_1;
SELECT events & 4 != 0 AS can_upd, events & 8 != 0 AS can_ins,events &
16 != 0 AS can_del
FROM pg_catalog.pg_relation_is_updatable('gtest_err_1_v'::regclass,
false) t(events);

insert into gtest_err_1_v(a,b, d) values ( 11, default,33) returning *;
should the above query, b return 22?
even b is  "b int default" will return 22.


drop table if exists comment_test cascade;
CREATE TABLE comment_test (
  id int,
  positive_col int  GENERATED ALWAYS AS (22) CHECK (positive_col > 0),
  positive_col1 int  GENERATED ALWAYS AS (22) stored CHECK (positive_col > 0) ,
  indexed_col int,
  CONSTRAINT comment_test_pk PRIMARY KEY (id));
CREATE INDEX comment_test_index ON comment_test(indexed_col);
ALTER TABLE comment_test ALTER COLUMN positive_col1 SET DATA TYPE text;
ALTER TABLE comment_test ALTER COLUMN positive_col SET DATA TYPE text;
the last query should work just fine?


drop table if exists def_test cascade;
create table def_test (
    c0    int4 GENERATED ALWAYS AS (22) stored,
    c1    int4 GENERATED ALWAYS AS (22),
    c2    text default 'initial_default'
);
alter table def_test alter column c1 set default 10;
ERROR:  column "c1" of relation "def_test" is a generated column
HINT:  Use ALTER TABLE ... ALTER COLUMN ... SET EXPRESSION instead.
alter table def_test alter column c1 drop default;
ERROR:  column "c1" of relation "def_test" is a generated column

Is the first error message hint wrong?
also the second error message (column x is a generated column) is not helpful.
here, we should just say that cannot set/drop default for virtual
generated column?



drop table if exists bar1, bar2;
create table bar1(a integer, b integer GENERATED ALWAYS AS (22))
partition by range (a);
create table bar2(a integer);
alter table bar2 add column b integer GENERATED ALWAYS AS (22) stored;
alter table bar1 attach partition bar2 default;
this works, which will make partitioned table and partition have
different kinds of generated column,
but this is not what we expected?

another variant:
CREATE TABLE list_parted (
a int NOT NULL,
b char(2) COLLATE "C",
c int GENERATED ALWAYS AS (22)
) PARTITION BY LIST (a);
CREATE TABLE parent (LIKE list_parted);
ALTER TABLE parent drop column c, add column c int GENERATED ALWAYS AS
(22) stored;
ALTER TABLE list_parted ATTACH PARTITION parent FOR VALUES IN (1);




drop table if exists tp, tpp1, tpp2;
CREATE TABLE tp (a int NOT NULL,b text GENERATED ALWAYS AS (22),c
text) PARTITION BY LIST (a);
CREATE TABLE tpp1(a int NOT NULL, b text GENERATED ALWAYS AS (c
||'1000' ), c text);
ALTER TABLE tp ATTACH PARTITION tpp1 FOR VALUES IN (1);
insert into tp(a,b,c) values (1,default, 'hello') returning a,b,c;
insert into tpp1(a,b,c) values (1,default, 'hello') returning a,b,c;

select tableoid::regclass, * from tpp1;
select tableoid::regclass, * from tp;
the above two queries return different results, slightly unintuitive, i guess.
Do we need to mention it somewhere?



CREATE TABLE atnotnull1 ();
ALTER TABLE atnotnull1 ADD COLUMN c INT GENERATED ALWAYS AS (22), ADD
PRIMARY KEY (c);
ERROR:  not-null constraints are not supported on virtual generated columns
DETAIL:  Column "c" of relation "atnotnull1" is a virtual generated column.
I guess this error message is fine.

The last issue in the previous thread [1], ATPrepAlterColumnType
seems not addressed.

[1] https://postgr.es/m/CACJufxEGPYtFe79hbsMeOBOivfNnPRsw7Gjvk67m1x2MQggyiQ@mail.gmail.com



Re: Virtual generated columns

From
jian he
Date:
On Wed, Aug 21, 2024 at 6:52 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 21 Aug 2024 at 08:00, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 08.08.24 20:22, Dean Rasheed wrote:
> > > Looking at the rewriter changes, it occurred to me that it could
> > > perhaps be done more simply using ReplaceVarsFromTargetList() for each
> > > RTE with virtual generated columns. That function already has the
> > > required wholerow handling code, so there'd be less code duplication.
> >
> > Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used
> > here.  It does have the wholerow logic that we need somehow, but other
> > than that it seems to target something different?
> >
>


> Well what I was thinking was that (in fireRIRrules()'s final loop over
> relations in the rtable), if the relation had any virtual generated
> columns, you'd build a targetlist containing a TLE for each one,
> containing the generated expression. Then you could just call
> ReplaceVarsFromTargetList() to replace any Vars in the query with the
> corresponding generated expressions. That takes care of descending
> into subqueries, adjusting varlevelsup, and expanding wholerow Vars
> that might refer to the generated expression.
>
> I also have half an eye on how this patch will interact with my patch
> to support RETURNING OLD/NEW values. If you use
> ReplaceVarsFromTargetList(), it should just do the right thing for
> RETURNING OLD/NEW generated expressions.
>
> > > I think it might be better to do this from within fireRIRrules(), just
> > > after RLS policies are applied, so it wouldn't need to worry about
> > > CTEs and sublink subqueries. That would also make the
> > > hasGeneratedVirtual flags unnecessary, since we'd already only be
> > > doing the extra work for tables with virtual generated columns. That
> > > would eliminate possible bugs caused by failing to set those flags.
> >
> > Yes, ideally, we'd piggy-back this into fireRIRrules().  One thing I'm
> > missing is that if you're descending into subqueries, there is no link
> > to the upper levels' range tables, which we need to lookup the
> > pg_attribute entries of column referencing Vars.  That's why there is
> > this whole custom walk with its own context data.  Maybe there is a way
> > to do this already that I missed?
> >
>
> That link to the upper levels' range tables wouldn't be needed because
> essentially using ReplaceVarsFromTargetList() flips the whole thing
> round: instead of traversing the tree looking for Var nodes that need
> to be replaced (possibly from upper query levels), you build a list of
> replacement expressions to be applied and apply them from the top,
> descending into subqueries as needed.
>

CREATE TABLE gtest1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
INSERT INTO gtest1 VALUES (1,default), (2, DEFAULT);

select b from  (SELECT b FROM gtest1) sub;
here we only need to translate the second "b" to (a *2), not the first one.
but these two "b" query tree representation almost the same (varno,
varattno, varlevelsup)

I am not sure how ReplaceVarsFromTargetList can disambiguate this?
Currently v4-0001-Virtual-generated-columns.patch
works. because v4 properly tags the main query hasGeneratedVirtual to false,
and tag subquery's hasGeneratedVirtual to true.



Re: Virtual generated columns

From
Dean Rasheed
Date:
On Wed, 4 Sept 2024 at 09:40, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 21.08.24 12:51, Dean Rasheed wrote:
> >>
> > Well what I was thinking was that (in fireRIRrules()'s final loop over
> > relations in the rtable), if the relation had any virtual generated
> > columns, you'd build a targetlist containing a TLE for each one,
> > containing the generated expression. Then you could just call
> > ReplaceVarsFromTargetList() to replace any Vars in the query with the
> > corresponding generated expressions.
>
> Here is an implementation of this.  It's much nicer!  It also appears to
> fix all the additional test cases that have been presented.  (I haven't
> integrated them into the patch set yet.)
>
> I left the 0001 patch alone for now and put the new rewriting
> implementation into 0002.  (Unfortunately, the diff is kind of useless
> for visual inspection.)  Let me know if this matches what you had in
> mind, please.  Also, is this the right place in fireRIRrules()?

Yes, that's what I had in mind except that it has to be called from
the second loop in fireRIRrules(), after any RLS policies have been
added, because it's possible for a RLS policy expression to refer to
virtual generated columns. It's OK to do it in the same loop that
expands RLS policies, because such policies can only refer to columns
of the same relation, so once the RLS policies have been expanded for
a given relation, nothing else should get added to the query that can
refer to columns of that relation, at that query level, so at that
point it should be safe to expand virtual generated columns.

Regards,
Dean



Re: Virtual generated columns

From
jian he
Date:
On Wed, Sep 4, 2024 at 4:40 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
>
> Here is an implementation of this.  It's much nicer!  It also appears to
> fix all the additional test cases that have been presented.  (I haven't
> integrated them into the patch set yet.)
>
> I left the 0001 patch alone for now and put the new rewriting
> implementation into 0002.  (Unfortunately, the diff is kind of useless
> for visual inspection.)  Let me know if this matches what you had in
> mind, please.  Also, is this the right place in fireRIRrules()?

hi. some minor issues.

in get_dependent_generated_columns we can

            /* skip if not generated column */
            if (!TupleDescAttr(tupdesc, defval->adnum - 1)->attgenerated)
                continue;
change to
            /* skip if not generated stored column */
            if (!(TupleDescAttr(tupdesc, defval->adnum -
1)->attgenerated == ATTRIBUTE_GENERATED_STORED))
                continue;


in ExecInitStoredGenerated
"if ((tupdesc->constr && tupdesc->constr->has_generated_stored)))"
is true.
then later we finish the loop
(for (int i = 0; i < natts; i++) loop)

we can "Assert(ri_NumGeneratedNeeded > 0)"
so we can ensure once has_generated_stored flag is true,
then we should have at least one stored generated attribute.



similarly, in expand_generated_columns_internal
we can aslo add "Assert(list_length(tlist) > 0);"
above
node = ReplaceVarsFromTargetList(node, rt_index, 0, rte, tlist,
REPLACEVARS_CHANGE_VARNO, rt_index, NULL);



@@ -2290,7 +2291,9 @@ ExecBuildSlotValueDescription(Oid reloid,
if (table_perm || column_perm)
{
- if (slot->tts_isnull[i])
+ if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+ val = "virtual";
+ else if (slot->tts_isnull[i])
    val = "null";
else
{
Oid  foutoid;
bool typisvarlena;
getTypeOutputInfo(att->atttypid, &foutoid, &typisvarlena);
val = OidOutputFunctionCall(foutoid, slot->tts_values[i]);
}

we can add Assert here, if i understand it correctly, like
 if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
{
Assert(slot->tts_isnull[i]);
 val = "virtual";
}



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 04.09.24 12:33, Dean Rasheed wrote:
>> I left the 0001 patch alone for now and put the new rewriting
>> implementation into 0002.  (Unfortunately, the diff is kind of useless
>> for visual inspection.)  Let me know if this matches what you had in
>> mind, please.  Also, is this the right place in fireRIRrules()?
> Yes, that's what I had in mind except that it has to be called from
> the second loop in fireRIRrules(), after any RLS policies have been
> added, because it's possible for a RLS policy expression to refer to
> virtual generated columns. It's OK to do it in the same loop that
> expands RLS policies, because such policies can only refer to columns
> of the same relation, so once the RLS policies have been expanded for
> a given relation, nothing else should get added to the query that can
> refer to columns of that relation, at that query level, so at that
> point it should be safe to expand virtual generated columns.

If I move the code like that, then the postgres_fdw test fails.  So 
there is some additional interaction there that I need to study.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 05.09.24 10:27, jian he wrote:
> On Wed, Sep 4, 2024 at 4:40 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>>
>> Here is an implementation of this.  It's much nicer!  It also appears to
>> fix all the additional test cases that have been presented.  (I haven't
>> integrated them into the patch set yet.)
>>
>> I left the 0001 patch alone for now and put the new rewriting
>> implementation into 0002.  (Unfortunately, the diff is kind of useless
>> for visual inspection.)  Let me know if this matches what you had in
>> mind, please.  Also, is this the right place in fireRIRrules()?
> 
> hi. some minor issues.
> 
> in get_dependent_generated_columns we can
> 
>              /* skip if not generated column */
>              if (!TupleDescAttr(tupdesc, defval->adnum - 1)->attgenerated)
>                  continue;
> change to
>              /* skip if not generated stored column */
>              if (!(TupleDescAttr(tupdesc, defval->adnum -
> 1)->attgenerated == ATTRIBUTE_GENERATED_STORED))
>                  continue;

I need to study more what to do with this function.  I'm not completely 
sure whether this should apply only to stored generated columns.

> in ExecInitStoredGenerated
> "if ((tupdesc->constr && tupdesc->constr->has_generated_stored)))"
> is true.
> then later we finish the loop
> (for (int i = 0; i < natts; i++) loop)
> 
> we can "Assert(ri_NumGeneratedNeeded > 0)"
> so we can ensure once has_generated_stored flag is true,
> then we should have at least one stored generated attribute.

This is technically correct, but this code isn't touched by this patch, 
so I don't think it belongs here.

> similarly, in expand_generated_columns_internal
> we can aslo add "Assert(list_length(tlist) > 0);"
> above
> node = ReplaceVarsFromTargetList(node, rt_index, 0, rte, tlist,
> REPLACEVARS_CHANGE_VARNO, rt_index, NULL);

Ok, I'll add that.

> @@ -2290,7 +2291,9 @@ ExecBuildSlotValueDescription(Oid reloid,
> if (table_perm || column_perm)
> {
> - if (slot->tts_isnull[i])
> + if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
> + val = "virtual";
> + else if (slot->tts_isnull[i])
>      val = "null";
> else
> {
> Oid  foutoid;
> bool typisvarlena;
> getTypeOutputInfo(att->atttypid, &foutoid, &typisvarlena);
> val = OidOutputFunctionCall(foutoid, slot->tts_values[i]);
> }
> 
> we can add Assert here, if i understand it correctly, like
>   if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
> {
> Assert(slot->tts_isnull[i]);
>   val = "virtual";
> }

Also technically correct, but I don't see what benefit this would bring. 
  The code guarded by that assert would not make use of the thing being 
asserted.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 16.09.24 11:22, jian he wrote:
> in v7.
> 
> doc/src/sgml/ref/alter_table.sgml
> <phrase>and <replaceable
> class="parameter">column_constraint</replaceable> is:</phrase>
> 
> section need representation of:
> GENERATED ALWAYS AS ( <replaceable>generation_expr</replaceable> ) [VIRTUAL]

I have addressed this in patch v8.

> in RelationBuildTupleDesc(Relation relation)
> we need to add "constr->has_generated_virtual" for the following code?
> 
>      if (constr->has_not_null ||
>          constr->has_generated_stored ||
>          ndef > 0 ||
>          attrmiss ||
>          relation->rd_rel->relchecks > 0)

fixed in v8

> also seems there will be table_rewrite for adding virtual generated
> columns, but we can avoid that.
> The attached patch is the change and the tests.
> 
> i've put the tests in src/test/regress/sql/fast_default.sql,
> since it already has event triggers and trigger functions, we don't
> want to duplicate it.

Also added in v8.

Thanks!




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 18.09.24 04:38, jian he wrote:
> On Mon, Sep 16, 2024 at 5:22 PM jian he <jian.universality@gmail.com> wrote:
>>
>> in v7.
>>
> seems I am confused with the version number.
> 
> here, I attached another minor change in tests.
> 
> make
> ERROR:  invalid ON DELETE action for foreign key constraint containing
> generated column
> becomes
> ERROR:  foreign key constraints on virtual generated columns are not supported

I think the existing behavior is fine.  The first message is about 
something that is invalid anyway.  The second message is just that 
something is not supported yet.  If we end up implementing, then users 
will get the first message.

> change contrib/pageinspect/sql/page.sql
> expand information on t_infomask, t_bits information.

added to v8 patch

> change RelationBuildLocalRelation
> make the transient TupleDesc->TupleConstr three bool flags more accurate.

I don't think we need that.  At the time this is used, the generation 
expressions are not added to the table yet.  Note that stored generated 
columns are not dealt with here either.  If there is a bug, then we can 
fix it, but if not, then I'd rather keep the code simpler.




Re: Virtual generated columns

From
Dean Rasheed
Date:
On Tue, 5 Nov 2024 at 16:17, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> New patch version.

In expand_generated_columns_in_expr():

+        /*
+         * XXX For the benefit of triggers, make two passes, so it covers
+         * PRS2_OLD_VARNO and PRS2_NEW_VARNO.
+         */
+        node = expand_generated_columns_internal(node, rel, 1, rte);
+        node = expand_generated_columns_internal(node, rel, 2, rte);

It seems a bit messy to be doing these two passes in
expand_generated_columns_in_expr(), when it is only needed for
triggers. I think it was better the way it was in the v7 patch,
passing rt_index to expand_generated_columns_in_expr(), so that
TriggerEnabled() did this:

+            tgqual = (Node *)
expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc,
PRS2_OLD_VARNO);
+            tgqual = (Node *)
expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc,
PRS2_NEW_VARNO);

Regards,
Dean



Re: Virtual generated columns

From
Amit Kapila
Date:
On Tue, Nov 5, 2024 at 9:48 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 30.09.24 04:09, Peter Eisentraut wrote:
> > I'm attaching a consolidated patch here, so we have something up to date
> > on the record.  I haven't worked through all the other recent feedback
> > from Jian He yet; I'll do that next.
>
> New patch version.  I've gone through the whole thread again and looked
> at all the feedback and various bug reports and test cases and made sure
> they are all addressed in the latest patch version.  (I'll send some
> separate messages to respond to some individual messages, but I'm
> keeping the latest patch here.)
>

I have tried to analyze this patch's interaction with logical
replication. The patch allows virtual generated columns in row filters
and column lists. But for the column list, it doesn't seem to be
computing the correct value whereas for the row filter, it is working
due to the following change:

@@ -992,7 +993,7 @@ pgoutput_row_filter_init(PGOutputData *data, List
*publications,
  continue;

  foreach(lc, rfnodes[idx])
- filters = lappend(filters, stringToNode((char *) lfirst(lc)));
+ filters = lappend(filters,
expand_generated_columns_in_expr(stringToNode((char *) lfirst(lc)),
relation));

The possible idea to replicate virtual generated columns is to compute
the corresponding expression before sending the data to the client. If
we can allow it in the row filter than why not to publish it as well.
To allow updates, we need to ensure that the replica identity should
include all columns referenced by the generated expression. For
example, if the generated column is defined as generated always as (c1
+ c2), the replica identity must include both c1 and c2.

Now, if we can't support the replication of virtual generated columns
due to some reason then we can mention in docs for
publish_generated_columns that it is used only to replicate STORED
generated columns but if we can support it then the
publish_generated_columns can accept string values like 'stored',
'virtual', 'all'.

Thoughts?

--
With Regards,
Amit Kapila.



Re: Virtual generated columns

From
vignesh C
Date:
On Tue, 5 Nov 2024 at 21:48, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 30.09.24 04:09, Peter Eisentraut wrote:
> > I'm attaching a consolidated patch here, so we have something up to date
> > on the record.  I haven't worked through all the other recent feedback
> > from Jian He yet; I'll do that next.
>
> New patch version.  I've gone through the whole thread again and looked
> at all the feedback and various bug reports and test cases and made sure
> they are all addressed in the latest patch version.  (I'll send some
> separate messages to respond to some individual messages, but I'm
> keeping the latest patch here.)

The patch needs to be rebased due to a recent commit 14e87ffa5c5. I
have verified the behavior of logical replication of row filters on
the virtual generated column, and everything appears to be functioning
as expected. One suggestion would be to add a test case for the row
filter on a virtual generated column.

Regards,
Vignesh



Re: Virtual generated columns

From
jian he
Date:
On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> New patch version.  I've gone through the whole thread again and looked
> at all the feedback and various bug reports and test cases and made sure
> they are all addressed in the latest patch version.  (I'll send some
> separate messages to respond to some individual messages, but I'm
> keeping the latest patch here.)

just quickly note the not good error message before you rebase.

src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS (2) ;
ERROR:  unrecognized constraint subtype: 4
src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
(2) stored;
ERROR:  unrecognized constraint subtype: 4
src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
(2) virtual;
ERROR:  unrecognized constraint subtype: 4

reading gram.y, typedef struct Constraint seems cannot distinguish, we
are creating a domain or create table.
I cannot found a way to error out in gram.y.

so we have to error out at DefineDomain.



Re: Virtual generated columns

From
jian he
Date:
> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > New patch version.  I've gone through the whole thread again and looked
> > at all the feedback and various bug reports and test cases and made sure
> > they are all addressed in the latest patch version.  (I'll send some
> > separate messages to respond to some individual messages, but I'm
> > keeping the latest patch here.)
>

RelationBuildPartitionKey
if (!isnull)
{
        char       *exprString;
        Node       *expr;
        exprString = TextDatumGetCString(datum);
        expr = stringToNode(exprString);
        pfree(exprString);
        expr = expand_generated_columns_in_expr(expr, relation);
}
no need expand_generated_columns_in_expr?
in ComputePartitionAttrs, we already forbidden generated columns to be
part of the partition key.



check_modified_virtual_generated, we can replace fastgetattr to
heap_attisnull? like:
            // bool        isnull;
            // fastgetattr(tuple, i + 1, tupdesc, &isnull);
            // if (!isnull)
            //     ereport(ERROR,
            //             (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
            //              errmsg("trigger modified virtual generated
column value")));
            if (!heap_attisnull(tuple, i+1, tupdesc))
                ereport(ERROR,
                        (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
                         errmsg("trigger modified virtual generated
column value")));



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 07.11.24 11:02, Dean Rasheed wrote:
> On Tue, 5 Nov 2024 at 16:17, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> New patch version.
> 
> In expand_generated_columns_in_expr():
> 
> +        /*
> +         * XXX For the benefit of triggers, make two passes, so it covers
> +         * PRS2_OLD_VARNO and PRS2_NEW_VARNO.
> +         */
> +        node = expand_generated_columns_internal(node, rel, 1, rte);
> +        node = expand_generated_columns_internal(node, rel, 2, rte);
> 
> It seems a bit messy to be doing these two passes in
> expand_generated_columns_in_expr(), when it is only needed for
> triggers. I think it was better the way it was in the v7 patch,
> passing rt_index to expand_generated_columns_in_expr(), so that
> TriggerEnabled() did this:
> 
> +            tgqual = (Node *)
> expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc,
> PRS2_OLD_VARNO);
> +            tgqual = (Node *)
> expand_generated_columns_in_expr(tgqual, relinfo->ri_RelationDesc,
> PRS2_NEW_VARNO);

Yeah, I put it back that way in v9.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 11.11.24 12:37, jian he wrote:
> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> New patch version.  I've gone through the whole thread again and looked
>> at all the feedback and various bug reports and test cases and made sure
>> they are all addressed in the latest patch version.  (I'll send some
>> separate messages to respond to some individual messages, but I'm
>> keeping the latest patch here.)
> 
> just quickly note the not good error message before you rebase.
> 
> src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS (2) ;
> ERROR:  unrecognized constraint subtype: 4
> src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
> (2) stored;
> ERROR:  unrecognized constraint subtype: 4
> src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
> (2) virtual;
> ERROR:  unrecognized constraint subtype: 4
> 
> reading gram.y, typedef struct Constraint seems cannot distinguish, we
> are creating a domain or create table.
> I cannot found a way to error out in gram.y.
> 
> so we have to error out at DefineDomain.

This appears to be a very old problem independent of this patch.  I'll 
take a look at fixing it.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 12.11.24 09:49, jian he wrote:
>> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> RelationBuildPartitionKey
> if (!isnull)
> {
>          char       *exprString;
>          Node       *expr;
>          exprString = TextDatumGetCString(datum);
>          expr = stringToNode(exprString);
>          pfree(exprString);
>          expr = expand_generated_columns_in_expr(expr, relation);
> }
> no need expand_generated_columns_in_expr?
> in ComputePartitionAttrs, we already forbidden generated columns to be
> part of the partition key.

True.  I have removed this extra code in v9.

> check_modified_virtual_generated, we can replace fastgetattr to
> heap_attisnull? like:
>              // bool        isnull;
>              // fastgetattr(tuple, i + 1, tupdesc, &isnull);
>              // if (!isnull)
>              //     ereport(ERROR,
>              //             (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
>              //              errmsg("trigger modified virtual generated
> column value")));
>              if (!heap_attisnull(tuple, i+1, tupdesc))
>                  ereport(ERROR,
>                          (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
>                           errmsg("trigger modified virtual generated
> column value")));

I don't know.  fastgetattr() is supposed to be "fast". ;-)  It's all 
inline functions, so maybe that is actually correct.  I don't have a 
strong opinion either way.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 11.11.24 06:51, vignesh C wrote:
> The patch needs to be rebased due to a recent commit 14e87ffa5c5.

done in v9

> I
> have verified the behavior of logical replication of row filters on
> the virtual generated column, and everything appears to be functioning
> as expected. One suggestion would be to add a test case for the row
> filter on a virtual generated column.

Yes, I just need a find a good place to put it into 
src/test/subscription/t/028_row_filter.pl.  It's very long. ;-)




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 10.11.24 04:16, Amit Kapila wrote:
> The possible idea to replicate virtual generated columns is to compute
> the corresponding expression before sending the data to the client. If
> we can allow it in the row filter than why not to publish it as well.

Row filters have pretty strong restrictions for what kind of operations 
they can contain.  Applying those restrictions to virtual generated 
columns would probably not make that feature very useful.  (You want to 
use virtual columns for expressions that are too cumbersome to write out 
by hand every time.)

Moreover, we would have to implement some elaborate cross-checks if a 
table gets added to a publication.  How would that work?  "Can't add 
table x to publication because it contains a virtual generated column 
with a non-simple expression"?  With row filters, this is less of a 
problem, because the row filter a property of the publication.




Re: Virtual generated columns

From
Alvaro Herrera
Date:
On 2024-Nov-12, Peter Eisentraut wrote:

> On 12.11.24 09:49, jian he wrote:
> > > On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote:

> > check_modified_virtual_generated, we can replace fastgetattr to
> > heap_attisnull? like:
> >              // bool        isnull;
> >              // fastgetattr(tuple, i + 1, tupdesc, &isnull);
> >              // if (!isnull)
> >              //     ereport(ERROR,
> >              //             (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> >              //              errmsg("trigger modified virtual generated
> > column value")));
> >              if (!heap_attisnull(tuple, i+1, tupdesc))
> >                  ereport(ERROR,
> >                          (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> >                           errmsg("trigger modified virtual generated
> > column value")));
> 
> I don't know.  fastgetattr() is supposed to be "fast". ;-)  It's all inline
> functions, so maybe that is actually correct.  I don't have a strong opinion
> either way.

I think Jian is right: if you're only interested in the isnull bit, then
heap_attisnull is more appropriate, because it doesn't have to decode
("deform") the tuple before giving you the answer; it knows the answer
by checking just the nulls bitmap.  With fastgetattr you still fetch the
value from the data bytes, even though your function doesn't care about
it.  That's probably even measurable for wide tuples if the generated
attrs are at the end, which sounds common.


Personally I dislike using 0-based loops for attribute numbers, which
are 1-based.  For peace of mind, I'd write this as

   for (AttrNumber i = 1; i <= tupdesc->natts; i++)
   {
       if (TupleDescAttr(tupdesc, i - 1)->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
       {
           bool        isnull;

           fastgetattr(tuple, i, tupdesc, &isnull); // heap_attisnull here actually

I'm kind of annoyed that TupleDescAttr() was made to refer to array
indexes rather than attribute numbers, but by the time I realized it had
happened, it was too late.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)



Re: Virtual generated columns

From
jian he
Date:
in
transformColumnDefinition
we can add parser_errposition for the error report.
        if (column->is_not_null && column->generated ==
ATTRIBUTE_GENERATED_VIRTUAL)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("not-null constraints are not supported on
virtual generated columns"),
                     parser_errposition(cxt->pstate,
constraint->location)));
sometimes, it points to the word "generated", sometimes "not". I guess
this should be fine.
example:
create table t13 (a int, b bool generated always as ((true )) VIRTUAL not null);
create table t13 (a int, b bool not null generated always as ((true )) VIRTUAL);


These 3 functions will call StoreRelNotNull to store the not-null constraint.
StoreConstraints
AddRelationNotNullConstraints
AddRelationNewConstraints

we can disallow not-null on virtual generated columns via these 3 functions.
I guess we don't want to add more complexity to AddRelationNotNullConstraints.
we can do it in StoreRelNotNull.
like:
@@ -2185,8 +2196,19 @@ StoreRelNotNull(Relation rel, const char
*nnname, AttrNumber attnum,
 {
        Oid                     constrOid;
+       TupleDesc       tupdesc;
+       Form_pg_attribute att;
        Assert(attnum > InvalidAttrNumber);
+       tupdesc = RelationGetDescr(rel);
+       att             = TupleDescAttr(tupdesc, attnum - 1);
+
+       if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+               ereport(ERROR,
+                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                errmsg("not-null constraints are not
supported on virtual generated columns"),
+                                errdetail("Column \"%s\" of relation
\"%s\" is a virtual generated column.",
+
NameStr(att->attname), RelationGetRelationName(rel))));

related tests:
create table t12(b int, a int generated always as (11) virtual,
constraint nn not null a);
create table t12(b int, constraint nn not null a, a int generated
always as (11) virtual);

drop table if exists t14;
create table t14(b int, a int generated always as (11) virtual);
alter table t14 add constraint nn not null a;
alter table t14 add constraint nn not null a no inherit;



Re: Virtual generated columns

From
jian he
Date:
On Wed, Nov 13, 2024 at 11:30 AM jian he <jian.universality@gmail.com> wrote:
>
> These 3 functions will call StoreRelNotNull to store the not-null constraint.
> StoreConstraints
> AddRelationNotNullConstraints
> AddRelationNewConstraints
>
> we can disallow not-null on virtual generated columns via these 3 functions.
> I guess we don't want to add more complexity to AddRelationNotNullConstraints.
> we can do it in StoreRelNotNull.

inspired by not-null and check check_modified_virtual_generated again.

in plpgsql_exec_trigger, we can:
        /*
         * In BEFORE trigger, stored generated columns are not computed yet,
         * so make them null in the NEW row.  (Only needed in UPDATE branch;
         * in the INSERT case, they are already null, but in UPDATE, the field
         * still contains the old value.)  Alternatively, we could construct a
         * whole new row structure without the generated columns, but this way
         * seems more efficient and potentially less confusing.
         */
        if (tupdesc->constr && tupdesc->constr->has_generated_stored &&
            TRIGGER_FIRED_BEFORE(trigdata->tg_event))
        {
            for (int i = 0; i < tupdesc->natts; i++)
            {
                if (TupleDescAttr(tupdesc, i)->attgenerated ==
ATTRIBUTE_GENERATED_STORED ||
                    TupleDescAttr(tupdesc, i)->attgenerated ==
ATTRIBUTE_GENERATED_VIRTUAL)
                    expanded_record_set_field_internal(rec_new->erh,
                                                       i + 1,
                                                       (Datum) 0,
                                                       true,    /* isnull */
                                                       false, false);
            }
        }
then we don't need check_modified_virtual_generated at all.

this will align with the stored generated column behave for
BEFORE UPDATE/INSERT FOR EACH ROW trigger. that is
you are free to assign the virtual generated column any value,
but at the plpgsql_exec_trigger, we will rewrite it to null.


also i understand correctly.
later if we want to implement virtual generated column with not-null then
check_modified_virtual_generated needs to be removed?



Re: Virtual generated columns

From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 9:47 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 10.11.24 04:16, Amit Kapila wrote:
> > The possible idea to replicate virtual generated columns is to compute
> > the corresponding expression before sending the data to the client. If
> > we can allow it in the row filter than why not to publish it as well.
>
> Row filters have pretty strong restrictions for what kind of operations
> they can contain.  Applying those restrictions to virtual generated
> columns would probably not make that feature very useful.  (You want to
> use virtual columns for expressions that are too cumbersome to write out
> by hand every time.)
>

From this paragraph, it sounds like you are saying we can't support
virtual columns in row filters. But the patch already works (not
checked all possible cases). For example,

postgres=# CREATE TABLE gtest1 (a int PRIMARY KEY, b int GENERATED
ALWAYS AS (a * 2) VIRTUAL);
CREATE TABLE
postgres=# create publication pub2 for table gtest1 WHERE (b > 5);
CREATE PUBLICATION

After this, Insert also adheres to this row filter. I haven't tested
it in any further detail but its basic usage in row filters works.

> Moreover, we would have to implement some elaborate cross-checks if a
> table gets added to a publication.  How would that work?  "Can't add
> table x to publication because it contains a virtual generated column
> with a non-simple expression"?  With row filters, this is less of a
> problem, because the row filter a property of the publication.
>

Because virtual generated columns work in row filters, so I thought it
could follow the rules for column lists as well. If the virtual column
doesn't adhere to the rules of the row filter then it shouldn't even
work there. My response was based on the theory that the expression
for virtual columns could be computed during logical decoding. So,
let's first clarify that before discussing this point further.

--
With Regards,
Amit Kapila.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 14.11.24 10:46, Amit Kapila wrote:
>> Moreover, we would have to implement some elaborate cross-checks if a
>> table gets added to a publication.  How would that work?  "Can't add
>> table x to publication because it contains a virtual generated column
>> with a non-simple expression"?  With row filters, this is less of a
>> problem, because the row filter a property of the publication.
>>
> Because virtual generated columns work in row filters, so I thought it
> could follow the rules for column lists as well. If the virtual column
> doesn't adhere to the rules of the row filter then it shouldn't even
> work there. My response was based on the theory that the expression
> for virtual columns could be computed during logical decoding. So,
> let's first clarify that before discussing this point further.

Row filter expressions have restrictions that virtual columns do not 
have.  For example, row filter expressions cannot use user-defined 
functions.  If you have a virtual column that uses a user-defined 
function and then you create a row filter using that virtual column, you 
get an error when you create the publication.  (This does not work 
correctly in the posted patches, but it will in v10 that I will post 
shortly.)  This behavior is ok, I think, you get the error when you 
write the faulty expression, and it's straightforward to implement.

Now let's say that we implement what you suggest that we compute virtual 
columns during logical decoding.  Then we presumably need similar 
restrictions, like not allowing user-defined functions.

Firstly, I don't know if that would be such a good restriction.  For row 
filters, that's maybe ok, but for virtual columns, you want to be able 
to write complex and interesting expressions, otherwise you wouldn't 
need a virtual column.

And secondly, we'd then need to implement logic to check that you can't 
add a table with a virtual column with a user-defined function to a 
publication.  This would happen not when you write the expression but 
only later when you operate on the table or publication.  So it's 
already a dubious user experience.

And the number of combinations and scenarios that you'd need to check 
there is immense.  (Not just CREATE PUBLICATION and ALTER PUBLICATION, 
but also CREATE TABLE when a FOR ALL TABLES publication exists, ALTER 
TABLE when new columns are added, new partitions are attached, and so 
on.)  Maybe someone wants to work on that, but that's more than I am 
currently signed up for.  And given the first point, I'm not sure if 
it's even such a useful feature.

I think, for the first iteration of this virtual generated columns 
feature, the publish_generated_columns option should just not apply to 
it.  Whether that means renaming the option or just documenting this is 
something for discussion.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 12.11.24 17:50, Alvaro Herrera wrote:
>> On 12.11.24 09:49, jian he wrote:
>>>> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> 
>>> check_modified_virtual_generated, we can replace fastgetattr to
>>> heap_attisnull? like:
>>>               // bool        isnull;
>>>               // fastgetattr(tuple, i + 1, tupdesc, &isnull);
>>>               // if (!isnull)
>>>               //     ereport(ERROR,
>>>               //             (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
>>>               //              errmsg("trigger modified virtual generated
>>> column value")));
>>>               if (!heap_attisnull(tuple, i+1, tupdesc))
>>>                   ereport(ERROR,
>>>                           (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
>>>                            errmsg("trigger modified virtual generated
>>> column value")));
>>
>> I don't know.  fastgetattr() is supposed to be "fast". ;-)  It's all inline
>> functions, so maybe that is actually correct.  I don't have a strong opinion
>> either way.
> 
> I think Jian is right: if you're only interested in the isnull bit, then
> heap_attisnull is more appropriate, because it doesn't have to decode
> ("deform") the tuple before giving you the answer; it knows the answer
> by checking just the nulls bitmap.  With fastgetattr you still fetch the
> value from the data bytes, even though your function doesn't care about
> it.  That's probably even measurable for wide tuples if the generated
> attrs are at the end, which sounds common.

Ok, I have fixed that in v10.

> Personally I dislike using 0-based loops for attribute numbers, which
> are 1-based.  For peace of mind, I'd write this as
> 
>     for (AttrNumber i = 1; i <= tupdesc->natts; i++)
>     {
>         if (TupleDescAttr(tupdesc, i - 1)->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
>         {
>             bool        isnull;
> 
>             fastgetattr(tuple, i, tupdesc, &isnull); // heap_attisnull here actually
> 
> I'm kind of annoyed that TupleDescAttr() was made to refer to array
> indexes rather than attribute numbers, but by the time I realized it had
> happened, it was too late.

Yes, this is unfortunately a constant source of confusion.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 13.11.24 04:30, jian he wrote:
> in
> transformColumnDefinition
> we can add parser_errposition for the error report.
>          if (column->is_not_null && column->generated ==
> ATTRIBUTE_GENERATED_VIRTUAL)
>              ereport(ERROR,
>                      (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                       errmsg("not-null constraints are not supported on
> virtual generated columns"),
>                       parser_errposition(cxt->pstate,
> constraint->location)));
> sometimes, it points to the word "generated", sometimes "not". I guess
> this should be fine.
> example:
> create table t13 (a int, b bool generated always as ((true )) VIRTUAL not null);
> create table t13 (a int, b bool not null generated always as ((true )) VIRTUAL);

Ok, done in v10.

> These 3 functions will call StoreRelNotNull to store the not-null constraint.
> StoreConstraints
> AddRelationNotNullConstraints
> AddRelationNewConstraints

> related tests:
> create table t12(b int, a int generated always as (11) virtual,
> constraint nn not null a);
> create table t12(b int, constraint nn not null a, a int generated
> always as (11) virtual);
> 
> drop table if exists t14;
> create table t14(b int, a int generated always as (11) virtual);
> alter table t14 add constraint nn not null a;
> alter table t14 add constraint nn not null a no inherit;

Ok, I have added the missing checks and added these test cases to v10.

I didn't put the checks in StoreRelNotNull(), I think that is too late 
in the process.  (It's already trying to store it.  The checking should 
come earlier.)  I put the checks into AddRelationNewConstraints() and 
AddRelationNotNullConstraints(), which already have similar checks.




Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 14.11.24 09:48, jian he wrote:
> inspired by not-null and check check_modified_virtual_generated again.
> 
> in plpgsql_exec_trigger, we can:
>          /*
>           * In BEFORE trigger, stored generated columns are not computed yet,
>           * so make them null in the NEW row.  (Only needed in UPDATE branch;
>           * in the INSERT case, they are already null, but in UPDATE, the field
>           * still contains the old value.)  Alternatively, we could construct a
>           * whole new row structure without the generated columns, but this way
>           * seems more efficient and potentially less confusing.
>           */
>          if (tupdesc->constr && tupdesc->constr->has_generated_stored &&
>              TRIGGER_FIRED_BEFORE(trigdata->tg_event))
>          {
>              for (int i = 0; i < tupdesc->natts; i++)
>              {
>                  if (TupleDescAttr(tupdesc, i)->attgenerated ==
> ATTRIBUTE_GENERATED_STORED ||
>                      TupleDescAttr(tupdesc, i)->attgenerated ==
> ATTRIBUTE_GENERATED_VIRTUAL)
>                      expanded_record_set_field_internal(rec_new->erh,
>                                                         i + 1,
>                                                         (Datum) 0,
>                                                         true,    /* isnull */
>                                                         false, false);
>              }
>          }
> then we don't need check_modified_virtual_generated at all.
> 
> this will align with the stored generated column behave for
> BEFORE UPDATE/INSERT FOR EACH ROW trigger. that is
> you are free to assign the virtual generated column any value,
> but at the plpgsql_exec_trigger, we will rewrite it to null.
> 
> also i understand correctly.
> later if we want to implement virtual generated column with not-null then
> check_modified_virtual_generated needs to be removed?

The purpose of check_modified_virtual_generated() for trigger functions 
written in C.  The prevent someone from inserting real values into the 
trigger tuples, because they would then be processed by the rest of the 
system, which would be incorrect.

Higher-level languages such as plpgsql should handle that themselves, by 
preventing setting generated columns in trigger functions.  The presence 
of check_modified_virtual_generated() is still a backstop for those, but 
shouldn't really be necessary.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 12.11.24 17:10, Peter Eisentraut wrote:
> On 11.11.24 06:51, vignesh C wrote:
>> The patch needs to be rebased due to a recent commit 14e87ffa5c5.
> 
> done in v9
> 
>> I
>> have verified the behavior of logical replication of row filters on
>> the virtual generated column, and everything appears to be functioning
>> as expected. One suggestion would be to add a test case for the row
>> filter on a virtual generated column.
> 
> Yes, I just need a find a good place to put it into src/test/ 
> subscription/t/028_row_filter.pl.  It's very long. ;-)

I have added tests in the v10 patch.




Re: Virtual generated columns

From
Amit Kapila
Date:
On Fri, Nov 29, 2024 at 3:16 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 14.11.24 10:46, Amit Kapila wrote:
> >> Moreover, we would have to implement some elaborate cross-checks if a
> >> table gets added to a publication.  How would that work?  "Can't add
> >> table x to publication because it contains a virtual generated column
> >> with a non-simple expression"?  With row filters, this is less of a
> >> problem, because the row filter a property of the publication.
> >>
> > Because virtual generated columns work in row filters, so I thought it
> > could follow the rules for column lists as well. If the virtual column
> > doesn't adhere to the rules of the row filter then it shouldn't even
> > work there. My response was based on the theory that the expression
> > for virtual columns could be computed during logical decoding. So,
> > let's first clarify that before discussing this point further.
>
> Row filter expressions have restrictions that virtual columns do not
> have.  For example, row filter expressions cannot use user-defined
> functions.  If you have a virtual column that uses a user-defined
> function and then you create a row filter using that virtual column, you
> get an error when you create the publication.  (This does not work
> correctly in the posted patches, but it will in v10 that I will post
> shortly.)  This behavior is ok, I think, you get the error when you
> write the faulty expression, and it's straightforward to implement.
>

Fair enough but the same argument applies to the column list. I mean
to say based on the same theory, users will get the ERROR when an
unsupported virtual column type will be used in column the list.

> Now let's say that we implement what you suggest that we compute virtual
> columns during logical decoding.  Then we presumably need similar
> restrictions, like not allowing user-defined functions.
>
> Firstly, I don't know if that would be such a good restriction.  For row
> filters, that's maybe ok, but for virtual columns, you want to be able
> to write complex and interesting expressions, otherwise you wouldn't
> need a virtual column.
>
> And secondly, we'd then need to implement logic to check that you can't
> add a table with a virtual column with a user-defined function to a
> publication.  This would happen not when you write the expression but
> only later when you operate on the table or publication.  So it's
> already a dubious user experience.
>
> And the number of combinations and scenarios that you'd need to check
> there is immense.  (Not just CREATE PUBLICATION and ALTER PUBLICATION,
> but also CREATE TABLE when a FOR ALL TABLES publication exists, ALTER
> TABLE when new columns are added, new partitions are attached, and so
> on.)  Maybe someone wants to work on that, but that's more than I am
> currently signed up for.  And given the first point, I'm not sure if
> it's even such a useful feature.
>
> I think, for the first iteration of this virtual generated columns
> feature, the publish_generated_columns option should just not apply to
> it.
>

Ok. But as mentioned above, we should consider it for the column list.

>
  Whether that means renaming the option or just documenting this is
> something for discussion.
>

We can go either way. Say, if we just document it and in the future,
if we want to support it for virtual columns then we need to introduce
another boolean option like publish_generated_virtual_columns. The
other possibility is that we change publish_generated_columns to enum
or string and allow values 's' (stored), 'v' (virtual), and 'n'
(none). Now, only 's' and 'n' will be supported. In the future, if one
wishes to add support for virtual columns, we have a provision to
extend the existing option.

--
With Regards,
Amit Kapila.



Re: Virtual generated columns

From
Peter Eisentraut
Date:
On 28.11.24 10:35, Peter Eisentraut wrote:
> On 12.11.24 17:08, Peter Eisentraut wrote:
>> On 11.11.24 12:37, jian he wrote:
>>> On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut 
>>> <peter@eisentraut.org> wrote:
>>>>
>>>> New patch version.  I've gone through the whole thread again and looked
>>>> at all the feedback and various bug reports and test cases and made 
>>>> sure
>>>> they are all addressed in the latest patch version.  (I'll send some
>>>> separate messages to respond to some individual messages, but I'm
>>>> keeping the latest patch here.)
>>>
>>> just quickly note the not good error message before you rebase.
>>>
>>> src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS 
>>> (2) ;
>>> ERROR:  unrecognized constraint subtype: 4
>>> src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
>>> (2) stored;
>>> ERROR:  unrecognized constraint subtype: 4
>>> src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
>>> (2) virtual;
>>> ERROR:  unrecognized constraint subtype: 4
>>>
>>> reading gram.y, typedef struct Constraint seems cannot distinguish, we
>>> are creating a domain or create table.
>>> I cannot found a way to error out in gram.y.
>>>
>>> so we have to error out at DefineDomain.
>>
>> This appears to be a very old problem independent of this patch.  I'll 
>> take a look at fixing it.
> 
> Here is a patch.
> 
> I'm on the fence about taking out the default case.  It does catch the 
> missing enum values, and I suppose if the struct arrives in 
> DefineDomain() with a corrupted contype value that is none of the enum 
> values, then we'd just do nothing with it.  Maybe go ahead with this, 
> but for backpatching leave the default case in place?

I have committed this, just to master for now.