Thread: [HACKERS] generated columns

[HACKERS] generated columns

From
Peter Eisentraut
Date:
Here is another attempt to implement generated columns.  This is a
well-known SQL-standard feature, also available for instance in DB2,
MySQL, Oracle.  A quick example:

  CREATE TABLE t1 (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
  );

(This is not related to the recent identity columns feature, other than
the similar syntax and some overlap internally.)

In previous discussions, it has often been a source of confusion whether
these generated columns are supposed to be computed on insert/update and
stored, or computed when read.  The SQL standard is not explicit, but
appears to lean toward stored.  DB2 stores.  Oracle computes on read.
MySQL supports both.  So I target implementing both.  This makes sense:
Both regular views and materialized views have their uses, too.  For the
syntax, I use the MySQL/Oracle syntax of appending [VIRTUAL|STORED].  In
this patch, only VIRTUAL is fully implemented.  I also have STORED kind
of working, but it wasn't fully baked, so I haven't included it here.

Known bugs:

- pg_dump produces a warning about a dependency loop when dumping these.
 Will need to be fixed at some point, but it doesn't prevent anything
from working right now.

Open design issues:

- COPY behavior: Currently, generated columns are automatically omitted
if there is no column list, and prohibited if specified explicitly.
When stored generated columns are implemented, they could be copied out.
 Some user options might be possible here.

- Catalog storage: I store the generation expression in pg_attrdef, like
a default.  For the most part, this works well.  It is not clear,
however, what pg_attribute.atthasdef should say.  Half the code thinks
that atthasdef means "there is something in pg_attrdef", the other half
thinks "column has a DEFAULT expression".  Currently, I'm going with the
former interpretation, because that is wired in quite deeply and things
start to crash if you violate it, but then code that wants to know
whether a column has a traditional DEFAULT expression needs to check
atthasdef && !attgenerated or something like that.

Missing/future functionality:

- STORED variant

- various ALTER TABLE variants

- index support (and related constraint support)

These can be added later once the basics are nailed down.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] generated columns

From
Greg Stark
Date:
On 31 August 2017 at 05:16, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Here is another attempt to implement generated columns.  This is a
> well-known SQL-standard feature, also available for instance in DB2,
> MySQL, Oracle.  A quick example:
>
>   CREATE TABLE t1 (
>     ...,
>     height_cm numeric,
>     height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
>   );

I only recently discovered we actually already have this feature. Kind of.
stark=# CREATE TABLE t1 (height_cm numeric);
CREATE TABLE
Time: 38.066 ms
stark***=# create function height_in(t t1) returns numeric language
'sql' as 'select t.height_cm * 2.54' ;
CREATE FUNCTION
Time: 1.216 ms
stark***=# insert into t1 values (2);
INSERT 0 1
Time: 10.170 ms
stark***=# select t1.height_cm, t1.height_in from t1;
┌───────────┬───────────┐
│ height_cm │ height_in │
├───────────┼───────────┤
│         2 │      5.08 │
└───────────┴───────────┘
(1 row)

Time: 1.997 ms

Yours looks better :)

-- 
greg

Re: [HACKERS] generated columns

From
Jaime Casanova
Date:
On 30 August 2017 at 23:16, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Here is another attempt to implement generated columns.  This is a
> well-known SQL-standard feature, also available for instance in DB2,
> MySQL, Oracle.
>
[...]
>
> In previous discussions, it has often been a source of confusion whether
> these generated columns are supposed to be computed on insert/update and
> stored, or computed when read.  The SQL standard is not explicit, but
> appears to lean toward stored.  DB2 stores.  Oracle computes on read.
> MySQL supports both.  So I target implementing both.  This makes sense:
> Both regular views and materialized views have their uses, too.  For the
> syntax, I use the MySQL/Oracle syntax of appending [VIRTUAL|STORED].  In
> this patch, only VIRTUAL is fully implemented.  I also have STORED kind
> of working, but it wasn't fully baked, so I haven't included it here.
>

Hi,

It applies and compiles without problems, it passes regression tests
and it does what it claims to do:

During my own tests, though, i found some problems:

-- UPDATEing the column, this is at least weird

postgres=# update t1 set height_in = 15;
ERROR:  column "height_in" can only be updated to DEFAULT
DETAIL:  Column "height_in" is a generated column.
postgres=# update t1 set height_in = default;
UPDATE 1


-- In a view it doesn't show any value

postgres=# create view v1 as select * from t1;
CREATE VIEW
postgres=# insert into t1(height_cm) values (10);
INSERT 0 1
postgres=# select * from t1;  id   | height_cm | height_in
--------+-----------+-----------198000 |        10 |     25.40
(1 row)

postgres=# select * from v1;  id   | height_cm | height_in
--------+-----------+-----------198000 |        10 |
(1 row)


-- In a inherits/partition tree, the default gets malformed

postgres=# create table t1_1 () inherits (t1);
CREATE TABLE
postgres=# \d t1_1                            Table "public.t1_1" Column   |  Type   | Collation | Nullable |
Default
 
-----------+---------+-----------+----------+--------------------------------id        | integer |           | not null
|nextval('t1_id_seq'::regclass)height_cm | numeric |           |          |height_in | numeric |           |          |
height_cm* 2.54
 
Inherits: t1

postgres=# insert into t1_1 values (11);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Jaime Casanova
Date:
On 10 September 2017 at 00:08, Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
>
> During my own tests, though, i found some problems:
>

a few more tests:

create table t1 (id serial,height_cm int,height_in int generated always as (height_cm * 10)
) ;


"""
postgres=# alter table t1 alter height_cm type numeric;
ERROR:  unexpected object depending on column: table t1 column height_in
"""
should i drop the column and recreate it after the fact? this seems
more annoying than the same problem with views (drop view & recreate),
specially after you implement STORED


"""
postgres=# alter table t1 alter height_in type numeric;
ERROR:  found unexpected dependency type 'a'
"""
uh!?


also is interesting that in triggers, both before and after, the
column has a null. that seems reasonable in a before trigger but not
in an after trigger
"""
create function f_trg1() returns trigger as $$ begin    raise notice '%', new.height_in;    return new; end
$$ language plpgsql;

create trigger trg1 before insert on t1
for each row execute procedure f_trg1();

postgres=# insert into t1 values(default, 100);
NOTICE:  <NULL>
INSERT 0 1

create trigger trg2 after insert on t1
for each row execute procedure f_trg1();

postgres=# insert into t1 values(default, 100);
NOTICE:  <NULL>
NOTICE:  <NULL>
INSERT 0 1
"""

the default value shouldn't be dropped.
"""
postgres=# alter table t1 alter height_in drop default;
ALTER TABLE
postgres=# \d t1                             Table "public.t1" Column   |  Type   | Collation | Nullable |
Default
----------------+---------+-----------+----------+--------------------------------id             | integer |
|not null |
 
nextval('t1_id_seq'::regclass)height_cm | integer |           |          |height_in   | integer |           |
|generated always as ()
 
"""
-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Serge Rielau
Date:

On Sep 12, 2017, at 12:35 PM, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote:

also is interesting that in triggers, both before and after, the
column has a null. that seems reasonable in a before trigger but not
in an after trigger
Why is a NULL reasonable for before triggers?
If I create a table with a column with default and I omit that column on INSERT
Is the column value also NULL in the before trigger? (I hope not)

BTW, the original idea behind generated columns was to materialize them.
Reason being to avoid expensive computations of frequently used expressions 
(and to support indexing in the absence of indexes with expressions)
 
You may find the following amusing:

Cheers
Serge Rielau


Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On 31 August 2017 at 05:16, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Here is another attempt to implement generated columns.  This is a
> well-known SQL-standard feature, also available for instance in DB2,
> MySQL, Oracle.  A quick example:
>
>   CREATE TABLE t1 (
>     ...,
>     height_cm numeric,
>     height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
>   );

Cool

> - pg_dump produces a warning about a dependency loop when dumping these.
>  Will need to be fixed at some point, but it doesn't prevent anything
> from working right now.
>
> Open design issues:
>
> - COPY behavior: Currently, generated columns are automatically omitted
> if there is no column list, and prohibited if specified explicitly.
> When stored generated columns are implemented, they could be copied out.
>  Some user options might be possible here.

If the values are generated immutably there would be no value in
including them in a dump. If you did dump them then they couldn't be
reloaded without error, so again, no point in dumping them.

COPY (SELECT...) already allows you options to include or exclude any
columns you wish, so I don't see the need for special handling here.

IMHO, COPY TO would exclude generated columns of either kind, ensuring
that the reload would just work.

> - Catalog storage: I store the generation expression in pg_attrdef, like
> a default.  For the most part, this works well.  It is not clear,
> however, what pg_attribute.atthasdef should say.  Half the code thinks
> that atthasdef means "there is something in pg_attrdef", the other half
> thinks "column has a DEFAULT expression".  Currently, I'm going with the
> former interpretation, because that is wired in quite deeply and things
> start to crash if you violate it, but then code that wants to know
> whether a column has a traditional DEFAULT expression needs to check
> atthasdef && !attgenerated or something like that.
>
> Missing/future functionality:
>
> - STORED variant

For me, this option would be the main feature. Presumably if STORED
then we wouldn't need the functions to be immutable, making it easier
to have columns like last_update_timestamp or last_update_username
etc..

I think an option to decide whether the default is STORED or VIRTUAL
would be useful.

> - various ALTER TABLE variants

Adding a column with GENERATED STORED would always be a full table rewrite.
Hmm, I wonder if its worth having a mixed mode: stored for new rows,
only virtual for existing rows; that way we could add GENERATED
columns easily.

> - index support (and related constraint support)

Presumably you can't index a VIRTUAL column. Or at least I don't think
its worth spending time trying to make it work.

> These can be added later once the basics are nailed down.

I imagine that if a column is generated then it is not possible to
have column level INSERT | UPDATE | DELETE privs on it. The generation
happens automatically as part of the write action if stored, or not
until select for virtual. It should be possible to have column level
SELECT privs.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Andreas Karlsson
Date:
On 09/13/2017 04:04 AM, Simon Riggs wrote:
> On 31 August 2017 at 05:16, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> - index support (and related constraint support)
> 
> Presumably you can't index a VIRTUAL column. Or at least I don't think
> its worth spending time trying to make it work.

I think end users would be surprised if one can index STORED columns and 
expressions but not VIRTUAL columns. So unless it is a huge project I 
would say it is worth it.

Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On 13 September 2017 at 09:09, Andreas Karlsson <andreas@proxel.se> wrote:
> On 09/13/2017 04:04 AM, Simon Riggs wrote:
>>
>> On 31 August 2017 at 05:16, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>
>>> - index support (and related constraint support)
>>
>>
>> Presumably you can't index a VIRTUAL column. Or at least I don't think
>> its worth spending time trying to make it work.
>
>
> I think end users would be surprised if one can index STORED columns and
> expressions but not VIRTUAL columns. So unless it is a huge project I would
> say it is worth it.

It must be stored in the index certainly. I guess virtual is similar
to expression indexes then.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Robert Haas
Date:
On Tue, Sep 12, 2017 at 10:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I think an option to decide whether the default is STORED or VIRTUAL
> would be useful.

That seems like it could be a bit of a foot-gun.  For example, an
extension author who uses generated columns will have to be careful to
always specify one or the other, because they don't know what the
default will be on the system where it's deployed.  Similarly for an
author of a portable application.  I think it'll create fewer
headaches if we just pick a default and stick with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
David Fetter
Date:
On Wed, Sep 13, 2017 at 10:09:37AM +0200, Andreas Karlsson wrote:
> On 09/13/2017 04:04 AM, Simon Riggs wrote:
> >On 31 August 2017 at 05:16, Peter Eisentraut
> ><peter.eisentraut@2ndquadrant.com> wrote:
> >>- index support (and related constraint support)
> >
> >Presumably you can't index a VIRTUAL column. Or at least I don't
> >think its worth spending time trying to make it work.
> 
> I think end users would be surprised if one can index STORED columns
> and expressions but not VIRTUAL columns. So unless it is a huge
> project I would say it is worth it.

So long as the expression on the normal columns was immutable, it's
fit for an expressional index, as is any immutable function composed
with it.

What am I missing?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Daniel Gustafsson
Date:
> On 12 Sep 2017, at 21:35, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote:
>
> On 10 September 2017 at 00:08, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
>>
>> During my own tests, though, i found some problems:
>
> a few more tests:
>
> create table t1 (
> id serial,
> height_cm int,
> height_in int generated always as (height_cm * 10)
> ) ;
>
>
> """
> postgres=# alter table t1 alter height_cm type numeric;
> ERROR:  unexpected object depending on column: table t1 column height_in
> """
> should i drop the column and recreate it after the fact? this seems
> more annoying than the same problem with views (drop view & recreate),
> specially after you implement STORED
>
>
> """
> postgres=# alter table t1 alter height_in type numeric;
> ERROR:  found unexpected dependency type 'a'
> """
> uh!?
>
>
> also is interesting that in triggers, both before and after, the
> column has a null. that seems reasonable in a before trigger but not
> in an after trigger
> """
> create function f_trg1() returns trigger as $$
>  begin
>     raise notice '%', new.height_in;
>     return new;
>  end
> $$ language plpgsql;
>
> create trigger trg1 before insert on t1
> for each row execute procedure f_trg1();
>
> postgres=# insert into t1 values(default, 100);
> NOTICE:  <NULL>
> INSERT 0 1
>
> create trigger trg2 after insert on t1
> for each row execute procedure f_trg1();
>
> postgres=# insert into t1 values(default, 100);
> NOTICE:  <NULL>
> NOTICE:  <NULL>
> INSERT 0 1
> """
>
> the default value shouldn't be dropped.
> """
> postgres=# alter table t1 alter height_in drop default;
> ALTER TABLE
> postgres=# \d t1
>                              Table "public.t1"
>  Column   |  Type   | Collation | Nullable |            Default
> ----------------+---------+-----------+----------+--------------------------------
> id             | integer |           | not null |
> nextval('t1_id_seq'::regclass)
> height_cm | integer |           |          |
> height_in   | integer |           |          | generated always as ()
> “""

Based on this review, and the errors noted in upthread in the previous review,
I’m marking this Returned with feedback.  When an updated version of the patch
is ready, please re-submit it to an upcoming commitfest.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Nico Williams
Date:
On Thu, Aug 31, 2017 at 12:16:43AM -0400, Peter Eisentraut wrote:
> In previous discussions, it has often been a source of confusion whether
> these generated columns are supposed to be computed on insert/update and
> stored, or computed when read.  The SQL standard is not explicit, but
> appears to lean toward stored.  DB2 stores.  Oracle computes on read.

Question: How would one know the difference between storing computed         columns vs. computing them on read?

Answer?:  Performance.  If the computation is slow, then you'll really         notice on read.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Adam Brusselback
Date:
I know that for my use-cases, having both options available would be very appreciated.  The vast majority of the computed columns I would use in my database would be okay to compute on read.  But there are for sure some which would be performance prohibitive to have compute on read, so i'd rather have those stored.

So for me, i'd rather default to compute on read, as long storing the pre-computed value is an option when necessary.

Just my $0.02
Thanks,
-Adam

Re: [HACKERS] generated columns

From
Nico Williams
Date:
On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote:
> I know that for my use-cases, having both options available would be very
> appreciated.  The vast majority of the computed columns I would use in my
> database would be okay to compute on read.  But there are for sure some
> which would be performance prohibitive to have compute on read, so i'd
> rather have those stored.
> 
> So for me, i'd rather default to compute on read, as long storing the
> pre-computed value is an option when necessary.

Sure, I agree.  I was just wondering whether there might be any other
difference besides performance characteristics.  The answer to that is,
I think, "no".


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Tom Lane
Date:
Nico Williams <nico@cryptonector.com> writes:
> On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote:
>> So for me, i'd rather default to compute on read, as long storing the
>> pre-computed value is an option when necessary.

> Sure, I agree.  I was just wondering whether there might be any other
> difference besides performance characteristics.  The answer to that is,
> I think, "no".

What about non-immutable functions in the generation expression?
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
David Fetter
Date:
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote:
> Nico Williams <nico@cryptonector.com> writes:
> > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote:
> >> So for me, i'd rather default to compute on read, as long storing the
> >> pre-computed value is an option when necessary.
> 
> > Sure, I agree.  I was just wondering whether there might be any other
> > difference besides performance characteristics.  The answer to that is,
> > I think, "no".
> 
> What about non-immutable functions in the generation expression?

Assuming they're permitted, which...well, I could make a case, they
should be mutually exclusive with the cached option.

I guess documenting the behavior in the manual would suffice, tempting
as it would be to include a NOTICE when the table goes from having 0
or more generated columns all of which are immutable to having at
least one that's not.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Nico Williams
Date:
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote:
> Nico Williams <nico@cryptonector.com> writes:
> > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote:
> >> So for me, i'd rather default to compute on read, as long storing the
> >> pre-computed value is an option when necessary.
> 
> > Sure, I agree.  I was just wondering whether there might be any other
> > difference besides performance characteristics.  The answer to that is,
> > I think, "no".
> 
> What about non-immutable functions in the generation expression?

Aha, thanks!  Yes, that would be noticeable.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Nico Williams
Date:
So yes, distinguishing stored vs. not stored computed columns is useful,
especially if the expression can refer to other columns of the same row,
though not only then.

Examples:
     -- useful only if stored (assuming these never get updated)     inserted_at TIMESTAMP WITHOUT TIME ZONE AS
(clock_timestamp())
     -- useful only if stored     uuid uuid AS (uuid_generate_v4())
     -- useful only if stored     who_done_it TEXT (current_user)
     -- useful especially if not stored     user_at_host TEXT (user || '@' || host)
     -- useful if stored     original_user_at_host TEXT (user || '@' || host)

I assume once set, a stored computed column cannot be updated, though
maybe being able to allow this would be ok.

Obviously all of this can be done with triggers and VIEWs...  The most
useful case is where a computed column is NOT stored, because it saves
you having to have a table and a view, while support for the stored case
merely saves you having to have triggers.  Of course, triggers for
computing columns are rather verbose, so not having to write those would
be convenient.

Similarly with RLS.  RLS is not strictly necessary since VIEWs and
TRIGGERs allow one to accomplish much the same results, but it's a lot
of work to get that right, while RLS makes most policies very pithy.
(RLS for *update* policies, however, still can't refer to NEW and OLD,
so one still has to resort to triggers for updates in many cases).

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Greg Stark
Date:
There are some unanswered questions with column grants too. Do we
allow granting access to a calculated column which accesses columns
the user doesn't have access to?

If so then this is a suitable substitute for using updateable views to
handle things like granting users access to things like password
hashes or personal data with details censored without giving them
access to the unhashed password or full personal info.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 9/12/17 15:35, Jaime Casanova wrote:
> On 10 September 2017 at 00:08, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
>>
>> During my own tests, though, i found some problems:

Here is an updated patch that should address the problems you have found.

> also is interesting that in triggers, both before and after, the
> column has a null. that seems reasonable in a before trigger but not
> in an after trigger

Logically, you are correct.  But it seems excessive to compute all
virtual columns for every trigger.  I don't know how to consolidate
that, especially with the current trigger API that lets
you look more or less directly into the tuple.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Re: [HACKERS] generated columns

From
Joe Conway
Date:
On 12/27/2017 09:31 AM, Peter Eisentraut wrote:
> On 9/12/17 15:35, Jaime Casanova wrote:
>> On 10 September 2017 at 00:08, Jaime Casanova
>> <jaime.casanova@2ndquadrant.com> wrote:
>>>
>>> During my own tests, though, i found some problems:
>
> Here is an updated patch that should address the problems you have found.

In the commit message it says:

  "The plan to is implement two kinds of generated columns:
   virtual (computed on read) and stored (computed on write).  This
   patch only implements the virtual kind, leaving stubs to implement
   the stored kind later."

and in the patch itself:

+<para>
+ The generation expression can refer to other columns in the table, but
+ not other generated columns.  Any functions and operators used must be
+ immutable.  References to other tables are not allowed.
+</para>

Question -- when the "stored" kind of generated column is implemented,
will the immutable restriction be relaxed? I would like, for example, be
able to have a stored generated column that executes now() whenever the
row is written/rewritten.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: [HACKERS] Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 12/30/17 16:04, Joe Conway wrote:
> +<para>
> + The generation expression can refer to other columns in the table, but
> + not other generated columns.  Any functions and operators used must be
> + immutable.  References to other tables are not allowed.
> +</para>
> 
> Question -- when the "stored" kind of generated column is implemented,
> will the immutable restriction be relaxed? I would like, for example, be
> able to have a stored generated column that executes now() whenever the
> row is written/rewritten.

That restriction is from the SQL standard, and I think it will stay.
The virtual vs. stored choice is an optimization, but not meant to
affect semantics.  For example, you might want to automatically
substitute a precomputed generated column into an expression, but that
will become complicated and confusing if the expression is not
deterministic.

Another problem with your example is that a stored generated column
would only be updated if a column it depends on is updated.  So a column
whose generation expression is just now() would never get updated.

Maybe some of this could be relaxed at some point, but we would have to
think it through carefully.  For now, a trigger would still be the best
implementation for your use case, I think.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Re: [HACKERS] generated columns

From
Joe Conway
Date:
On 12/31/2017 09:38 AM, Peter Eisentraut wrote:
> On 12/30/17 16:04, Joe Conway wrote:
>> +<para>
>> + The generation expression can refer to other columns in the table, but
>> + not other generated columns.  Any functions and operators used must be
>> + immutable.  References to other tables are not allowed.
>> +</para>
>>
>> Question -- when the "stored" kind of generated column is implemented,
>> will the immutable restriction be relaxed? I would like, for example, be
>> able to have a stored generated column that executes now() whenever the
>> row is written/rewritten.

<snip>

> Maybe some of this could be relaxed at some point, but we would have to
> think it through carefully.  For now, a trigger would still be the best
> implementation for your use case, I think.

Sure, but generated column behavior in general can be implemented via
trigger.

Anyway, I have seen requests for change data capture
(https://en.wikipedia.org/wiki/Change_data_capture) in Postgres which is
apparently available in our competition without requiring the use of
triggers. Perhaps that is yet a different feature, but I was hopeful
that this mechanism could be used to achieve it.

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: [HACKERS] Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 12/31/17 12:54, Joe Conway wrote:
> Anyway, I have seen requests for change data capture
> (https://en.wikipedia.org/wiki/Change_data_capture) in Postgres which is
> apparently available in our competition without requiring the use of
> triggers. Perhaps that is yet a different feature, but I was hopeful
> that this mechanism could be used to achieve it.

I think logical decoding provides CDC.  The generated columns feature
doesn't have much to do with that, in my mind.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Wed, Dec 27, 2017 at 12:31:05PM -0500, Peter Eisentraut wrote:
> On 9/12/17 15:35, Jaime Casanova wrote:
>> On 10 September 2017 at 00:08, Jaime Casanova
>> <jaime.casanova@2ndquadrant.com> wrote:
>>>
>>> During my own tests, though, i found some problems:
>
> Here is an updated patch that should address the problems you have
> found.

Could you rebase the patch? I am planning to review it but there are
conflicts in genbki.pl.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 1/16/18 01:35, Michael Paquier wrote:
> On Wed, Dec 27, 2017 at 12:31:05PM -0500, Peter Eisentraut wrote:
>> On 9/12/17 15:35, Jaime Casanova wrote:
>>> On 10 September 2017 at 00:08, Jaime Casanova
>>> <jaime.casanova@2ndquadrant.com> wrote:
>>>>
>>>> During my own tests, though, i found some problems:
>>
>> Here is an updated patch that should address the problems you have
>> found.
> 
> Could you rebase the patch? I am planning to review it but there are
> conflicts in genbki.pl.

Here you go.  Those changes actually meant that genbki.pl doesn't need
to be touched by this patch at all, so that's a small win.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Tue, Jan 16, 2018 at 09:55:16AM -0500, Peter Eisentraut wrote:
> Here you go.  Those changes actually meant that genbki.pl doesn't need
> to be touched by this patch at all, so that's a small win.

Thanks for the updated version. I have spent some time looking at what
you are proposing here.

Instead of leaving bits for a feature that may or may not be
implemented, have you considered just blocking STORED at parsing level
and remove those bits? There is little point in keeping the equivalent
of dead code in the tree. So I would suggest a patch simplification:
- Drop the VIRTUAL/STORED parsing from the grammar for now.
- Define VIRTUAL as the default for the future.
This way, if support for STORED is added in the future the grammar can
just be extended. This is actually implied in pg_dump.c. And +1 for the
catalog format you are proposing.

=# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2)
VIRTUAL);
CREATE TABLE
=# insert into gen_1 values (2000000000);
INSERT 0 1
=# select * from gen_1 ;
ERROR:  22003: integer out of range
Overflow checks do not happen when inserting, which makes the following
SELECT to fail. This could be confusing for the user because the row has
been inserted. Perhaps some evaluation of virtual tuples at insert phase
should happen. The existing behavior makes sense as well as virtual
values are only evaluated when read, so a test would be at least
welcome. Does the SQL spec mention the matter? How do other systems
handle such cases? CHECK constraints can be combined, still..

The last patch crashes for typed tables:
=# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint);
CREATE TYPE
=# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS
AS (f2 *2));
[... boom ...]
And for partitions:
=# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint)
   PARTITION BY RANGE (f1);
CREATE TABLE
=# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS
   GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO
   ('2016-08-01');
[... boom ...]
Like what we did in 005ac298, I would suggest to throw
ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests.

+SELECT a, c FROM gtest12;  -- FIXME: should be allowed
+ERROR:  permission denied for function gf1
[...]
+ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT;  -- FIXME
+ERROR:  column "b" of relation "gtest27" is a generated column
Any fixes for those?

+    if (get_attgenerated(relationId, attno))
+        ereport(ERROR,
+            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+             errmsg("index creation on generated columns is not supported")));
Shouldn't such messages mention explicitely "virtually"-generated
columns? For stored columns the support would not be complicated in this
case.

=# create table ac (a int, b text generated always as (substr(a::text, 0, 3)));
CREATE TABLE
=# alter table ac alter COLUMN a type text;
ERROR:  42601: cannot alter type of a column used by a generated column
DETAIL:  Column "a" is used by generated column "b".
In this case ALTER TABLE could have worked. No complain from me to
disable that as a first step though.

+UPDATE gtest1 SET b = DEFAULT WHERE a = 1;
+UPDATE gtest1 SET b = 11 WHERE a = 1;  -- error
+ERROR:  column "b" can only be updated to DEFAULT
+DETAIL:  Column "b" is a generated column.
I see... This is consistent with the behavior of INSERT where DEFAULT
can be used and the INSERT succeeds.

+-- domains
+CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10);
+CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED
ALWAYS AS (a * 2));  -- prohibited
+ERROR:  virtual generated column "b" cannot have a domain type
CHECK constraints can be used, so I find this restriction confusing.

No test coverage for DELETE triggers?

+UPDATE gtest26 SET a = a * -2;
+INFO:  gtest1: old = (-2,)
+INFO:  gtest1: new = (4,)
+INFO:  gtest3: old = (-2,)
+INFO:  gtest3: new = (4,)
+INFO:  gtest4: old = (3,)
+INFO:  gtest4: new = (-6,)
OLD and NEW values for generated columns don't show up. Am I missing
something or they should be available?


@@ -15526,13 +15553,6 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
-    if (has_default)
-        appendPQExpBuffer(q, " DEFAULT %s",
-                  tbinfo->attrdefs[j]->adef_expr);
-
-    if (has_notnull)
-        appendPQExpBufferStr(q, " NOT NULL");
Why does the ordering of actions need to be changed here?

[nit_mode]
+    if (attgenerated)
+        /*
+         * Generated column: Dropping anything that the generation expression
+         * refers to automatically drops the generated column.
+         */
+        recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel),
+                                        DEPENDENCY_AUTO,
+                                        DEPENDENCY_AUTO, false);
Please use brackers here if you use a comment in the if() block...
[/nit_mode]

COPY as you are proposing looks sensible to me. I am not sure about any
options though as it is possible to enforce the selection of generated
columns using COPY (SELECT).

Per my tests, generated columns can work with column system attributes
(xmax, xmin, etc.). Some tests could be added.

+    /*
+     * Generated columns don't use the attnotnull field but use a full CHECK
+     * constraint instead.  We could implement here that it finds that CHECK
+     * constraint and drops it, which is kind of what the SQL standard would
+     * require anyway, but that would be quite a bit more work.
+     */
+    if (((Form_pg_attribute) GETSTRUCT(tuple))->attgenerated)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("cannot use DROP NOT NULL on generated column \"%s\"",
+                        colName)));
And the point of storing NOT NULL constraints as CHECK constraints shows
up again... :( It would be nice to mention as well at the top of
ATExecSetNotNull() that a full-fledge switch could help generated
columns as well.

-    if (tab->relkind == RELKIND_RELATION ||
-        tab->relkind == RELKIND_PARTITIONED_TABLE)
+    if ((tab->relkind == RELKIND_RELATION ||
+         tab->relkind == RELKIND_PARTITIONED_TABLE) &&
+        get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE
I think that you should store the result of get_attgenerated() and reuse
it multiple times.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 1/19/18 00:18, Michael Paquier wrote:
> Instead of leaving bits for a feature that may or may not be
> implemented, have you considered just blocking STORED at parsing level
> and remove those bits? There is little point in keeping the equivalent
> of dead code in the tree. So I would suggest a patch simplification:
> - Drop the VIRTUAL/STORED parsing from the grammar for now.
> - Define VIRTUAL as the default for the future.

fixed

> =# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2)
> VIRTUAL);
> CREATE TABLE
> =# insert into gen_1 values (2000000000);
> INSERT 0 1
> =# select * from gen_1 ;
> ERROR:  22003: integer out of range
> Overflow checks do not happen when inserting, which makes the following
> SELECT to fail. This could be confusing for the user because the row has
> been inserted. Perhaps some evaluation of virtual tuples at insert phase
> should happen. The existing behavior makes sense as well as virtual
> values are only evaluated when read, so a test would be at least
> welcome.

added test

> Does the SQL spec mention the matter? How do other systems
> handle such cases?

In Oracle you get the same overflow error.

> CHECK constraints can be combined, still..

I think you can compare this to a view.  A view can produce overflow
errors on read.  But a CHECK constraint is more like a CHECK option on a
view that checks as values are put into the view.

> The last patch crashes for typed tables:
> =# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint);
> CREATE TYPE
> =# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS
> AS (f2 *2));
> [... boom ...]
> And for partitions:
> =# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint)
>    PARTITION BY RANGE (f1);
> CREATE TABLE
> =# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS
>    GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO
>    ('2016-08-01');
> [... boom ...]
> Like what we did in 005ac298, I would suggest to throw
> ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests.

done

> +SELECT a, c FROM gtest12;  -- FIXME: should be allowed
> +ERROR:  permission denied for function gf1

This is quite hard to fix and I would like to leave this for a future
release.

> +ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT;  -- FIXME
> +ERROR:  column "b" of relation "gtest27" is a generated column

That FIXME seems to have been a mistake.  I have removed it.

> +    if (get_attgenerated(relationId, attno))
> +        ereport(ERROR,
> +            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +             errmsg("index creation on generated columns is not supported")));
> Shouldn't such messages mention explicitely "virtually"-generated
> columns? For stored columns the support would not be complicated in this
> case.

done

> +-- domains
> +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10);
> +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED
> ALWAYS AS (a * 2));  -- prohibited
> +ERROR:  virtual generated column "b" cannot have a domain type
> CHECK constraints can be used, so I find this restriction confusing.

We currently don't have infrastructure to support this.  We run all
CHECK constraints whenever a row is changed, so that is easy.  But we
don't have facilities to recheck the domain constraint in column b when
column a is updated.  This could be done, but it's extra work.

> No test coverage for DELETE triggers?

done

> +UPDATE gtest26 SET a = a * -2;
> +INFO:  gtest1: old = (-2,)
> +INFO:  gtest1: new = (4,)
> +INFO:  gtest3: old = (-2,)
> +INFO:  gtest3: new = (4,)
> +INFO:  gtest4: old = (3,)
> +INFO:  gtest4: new = (-6,)
> OLD and NEW values for generated columns don't show up. Am I missing
> something or they should be available?

This was already discussed a few times in the thread.  I don't know what
a good solution is.

I have in this patch added facilties to handle this better in other PLs.
 So the virtual generated column doesn't show up there in the trigger
data.  This is possible because we copy the trigger data from the
internal structures into language-specific hashes/dictionaries/etc.

In PL/pgSQL, this is a bit more difficult, because we handle the table's
row type there.  We can't just "hide" the generated column when looking
at the row type.  Actually, we could do it quite easily, but that would
probably raise other weirdnesses.

This also raises a question how a row type with generated columns should
behave.  I think a generation expression is a property of a table, so it
does not apply in a row type.  (Just like a default is a property of a
table and does not apply in row types.)

> Please use brackers here if you use a comment in the if() block...
> [/nit_mode]

done

> COPY as you are proposing looks sensible to me. I am not sure about any
> options though as it is possible to enforce the selection of generated
> columns using COPY (SELECT).

removed that comment

> Per my tests, generated columns can work with column system attributes
> (xmax, xmin, etc.). Some tests could be added.

Hard to test that, because the results would be nondeterministic.

> -    if (tab->relkind == RELKIND_RELATION ||
> -        tab->relkind == RELKIND_PARTITIONED_TABLE)
> +    if ((tab->relkind == RELKIND_RELATION ||
> +         tab->relkind == RELKIND_PARTITIONED_TABLE) &&
> +        get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE
> I think that you should store the result of get_attgenerated() and reuse
> it multiple times.

I don't see where that would apply.  I think the hunks you are seeing
belong to different functions.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Robert Haas
Date:
On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>> Does the SQL spec mention the matter? How do other systems
>> handle such cases?
>
> In Oracle you get the same overflow error.

That seems awful.  If a user says "SELECT * FROM tab" and it fails,
how are they supposed to recover, or even understand what the problem
is?  I think we should really try to at least generate an errcontext
here:

ERROR:  integer out of range
CONTEXT: while generating virtual column "b"

And maybe a hint, too, like "try excluding this column".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 1/26/18 12:46, Robert Haas wrote:
> On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>> Does the SQL spec mention the matter? How do other systems
>>> handle such cases?
>>
>> In Oracle you get the same overflow error.
> 
> That seems awful.  If a user says "SELECT * FROM tab" and it fails,
> how are they supposed to recover, or even understand what the problem
> is?  I think we should really try to at least generate an errcontext
> here:
> 
> ERROR:  integer out of range
> CONTEXT: while generating virtual column "b"
> 
> And maybe a hint, too, like "try excluding this column".

This is expanded in the rewriter, so there is no context like that.
This is exactly how views work, e.g.,

create table t1 (id int, length int);
create view v1 as select id, length * 1000000000 as length_in_nanometers
from t1;
insert into t1 values (1, 5);
select * from v1;
ERROR:  integer out of range

I think this is not a problem in practice.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Sat, Jan 27, 2018 at 05:05:14PM -0500, Peter Eisentraut wrote:
> This is expanded in the rewriter, so there is no context like that.
> This is exactly how views work, e.g.,
>
> create table t1 (id int, length int);
> create view v1 as select id, length * 1000000000 as length_in_nanometers
> from t1;
> insert into t1 values (1, 5);
> select * from v1;
> ERROR:  integer out of range
>
> I think this is not a problem in practice.

Yeah, I tend to have the same opinion while doing a second pass on the
patch proposed on this thread.  You could more context when using STORED
columns, but for VIRTUAL that does not make such sense as the handling
of values is close to views.  That's the same handling for materialized
views as well, you don't get any context when facing an overflow when
either creating the matview or refreshing it.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Thu, Jan 25, 2018 at 10:26:55PM -0500, Peter Eisentraut wrote:
> On 1/19/18 00:18, Michael Paquier wrote:
>> +SELECT a, c FROM gtest12;  -- FIXME: should be allowed
>> +ERROR:  permission denied for function gf1
>
> This is quite hard to fix and I would like to leave this for a future
> release.

I have been looking at that case more closely again, and on the contrary
I would advocate that your patch is doing the *right* thing.  In short,
if the generation expression uses a function and the user has only been
granted access to read the values, it seems to me that it we should
require that this user also has the right to execute the function. Would
that be too user-unfriendly?  I think that this could avoid mistakes
about giving access to unwanted functions when willing to just give a
SELECT right as the function could be doing more operations.

>> +-- domains
>> +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10);
>> +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED
>> ALWAYS AS (a * 2));  -- prohibited
>> +ERROR:  virtual generated column "b" cannot have a domain type
>> CHECK constraints can be used, so I find this restriction confusing.
>
> We currently don't have infrastructure to support this.  We run all
> CHECK constraints whenever a row is changed, so that is easy.  But we
> don't have facilities to recheck the domain constraint in column b when
> column a is updated.  This could be done, but it's extra work.

Okay, let's leave with this restriction for now.

>> OLD and NEW values for generated columns don't show up. Am I missing
>> something or they should be available?
>
> This was already discussed a few times in the thread.  I don't know what
> a good solution is.
>
> I have in this patch added facilties to handle this better in other PLs.
>  So the virtual generated column doesn't show up there in the trigger
> data.  This is possible because we copy the trigger data from the
> internal structures into language-specific hashes/dictionaries/etc.
>
> In PL/pgSQL, this is a bit more difficult, because we handle the table's
> row type there.  We can't just "hide" the generated column when looking
> at the row type.  Actually, we could do it quite easily, but that would
> probably raise other weirdnesses.
>
> This also raises a question how a row type with generated columns should
> behave.  I think a generation expression is a property of a table, so it
> does not apply in a row type.  (Just like a default is a property of a
> table and does not apply in row types.)

Hm.  Identity columns and default columns are part of rowtypes. STORED
columns can alsobe part of a row type with little effort, so in order to
be consistent with the all the existing behaviors, it seems to me that
virtually-generated columns should be part of it as well.  I have
compiled in the attached SQL file how things behave with your
patch.  Only virtually-generated columns show a blank value.

A empty value is unhelpful for the user, which brings a couple of
possible approaches:
1) Make virtual columns part of a row type, which would make it
consistent with all the other types.
2) For plpgsql, if all rows from OLD or NEW are requested, then print all
columns except the ones virtually-generated. If a virtual column is
directly requested, then issue an error.

I would warmly welcome 1) as this would make all behaviors consistent
with the other PLs and other types of generated columns.  I would
imagine that users would find weird the current inconsistency as well.

>> Per my tests, generated columns can work with column system attributes
>> (xmax, xmin, etc.). Some tests could be added.
>
> Hard to test that, because the results would be nondeterministic.

tableoid can be deterministic if compared with data from pg_class:
=# CREATE TABLE aa (a int PRIMARY KEY, b int GENERATED ALWAYS AS (tableoid));
CREATE TABLE
=# INSERT INTO aa VALUES (1);
INSERT
=# SELECT aa.a from pg_class c, aa where c.oid = b;
 a
---
 1
(1 row)

The point is just to stress code paths where the attribute number is
negative.

>> -    if (tab->relkind == RELKIND_RELATION ||
>> -        tab->relkind == RELKIND_PARTITIONED_TABLE)
>> +    if ((tab->relkind == RELKIND_RELATION ||
>> +         tab->relkind == RELKIND_PARTITIONED_TABLE) &&
>> +        get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE
>> I think that you should store the result of get_attgenerated() and reuse
>> it multiple times.
>
> I don't see where that would apply.  I think the hunks you are seeing
> belong to different functions.

Looks like I messed up ATPrepAlterColumnType() and ATExecColumnDefault()
while reading the previous version.  Sorry for the useless noise.

+    /*
+     * Foreign keys on generated columns are not yet implemented.
+     */
+    for (i = 0; i < numpks; i++)
+    {
+        if (get_attgenerated(RelationGetRelid(pkrel), pkattnum[i]))
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("foreign key constraints referencing generated columns are not supported")));
+    }
It would be nice to test this code path.

The commit fest is ending, perhaps this should be moved to the next one?
The handling of row types for virtual columns is a must-fix in my
opinion.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Wed, Jan 31, 2018 at 10:18:04PM +0900, Michael Paquier wrote:
> On Thu, Jan 25, 2018 at 10:26:55PM -0500, Peter Eisentraut wrote:
>> On 1/19/18 00:18, Michael Paquier wrote:
>>> +SELECT a, c FROM gtest12;  -- FIXME: should be allowed
>>> +ERROR:  permission denied for function gf1
>>
>> This is quite hard to fix and I would like to leave this for a future
>> release.
>
> I have been looking at that case more closely again, and on the contrary
> I would advocate that your patch is doing the *right* thing.  In short,
> if the generation expression uses a function and the user has only been
> granted access to read the values, it seems to me that it we should
> require that this user also has the right to execute the function. Would
> that be too user-unfriendly?  I think that this could avoid mistakes
> about giving access to unwanted functions when willing to just give a
> SELECT right as the function could be doing more operations.

Attached is the SQL file I used with test cases for the review.  I
forgot to attach it yesterday.

> Hm.  Identity columns and default columns are part of rowtypes. STORED
> columns can alsobe part of a row type with little effort, so in order to
> be consistent with the all the existing behaviors, it seems to me that
> virtually-generated columns should be part of it as well.  I have
> compiled in the attached SQL file how things behave with your
> patch.  Only virtually-generated columns show a blank value.

The tests used are attached.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 1/31/18 08:18, Michael Paquier wrote:
>> This also raises a question how a row type with generated columns should
>> behave.  I think a generation expression is a property of a table, so it
>> does not apply in a row type.  (Just like a default is a property of a
>> table and does not apply in row types.)
> 
> Hm.  Identity columns and default columns are part of rowtypes. STORED
> columns can alsobe part of a row type with little effort, so in order to
> be consistent with the all the existing behaviors, it seems to me that
> virtually-generated columns should be part of it as well.  I have
> compiled in the attached SQL file how things behave with your
> patch.  Only virtually-generated columns show a blank value.
> 
> A empty value is unhelpful for the user, which brings a couple of
> possible approaches:
> 1) Make virtual columns part of a row type, which would make it
> consistent with all the other types.

That would be nice.  I'm going to study this some more to see what can
be done.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
> That would be nice.  I'm going to study this some more to see what can
> be done.

Thanks for the update.  Note: Peter has moved the patch to next CF.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
> That would be nice.  I'm going to study this some more to see what can
> be done.

By the way, cannot we consider just doing stored generated columns as a
first cut?  Both virtual and stored columns have their use cases, but
stored values have less complication and support actually a larger set
of features, including rowtypes, index and constraint support.  So it
seems to me that if something goes into v11 then stored columns would be
a better choice at this stage of the development cycle.  Other DBMSs
support stored values by default as well, and your v1 patch had a large
portion of the work done if I recall correctly.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2/1/18 21:25, Michael Paquier wrote:
> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
>> That would be nice.  I'm going to study this some more to see what can
>> be done.
> 
> Thanks for the update.  Note: Peter has moved the patch to next CF.

I didn't get to updating this patch, so I'm closing it in this CF.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 05/03/2018 20:46, Peter Eisentraut wrote:
> On 2/1/18 21:25, Michael Paquier wrote:
>> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
>>> That would be nice.  I'm going to study this some more to see what can
>>> be done.
>>
>> Thanks for the update.  Note: Peter has moved the patch to next CF.
> 
> I didn't get to updating this patch, so I'm closing it in this CF.

Attached is a new version of this patch.

Old news:

This is a well-known SQL-standard feature, also available for instance
in DB2, MySQL, Oracle.  A quick example:

  CREATE TABLE t1 (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
  );

It supports both computed-on-write and computed-on-read variants, using
the keywords STORED and VIRTUAL respectively.

New news:

Everything works more or less.  Both STORED and VIRTUAL fully work now.
I've done some refactoring to reduce the surface area of the patch.  I
also added some caching in the tuple descriptor's "const" area to avoid
some performance overhead if no generated columns are used.

There are some open questions about which I'll start separate subthreads
for discussion.

One thing I'd like reviewed now is the catalog representation.  There
are a couple of possible options, but changing them would have fairly
deep code impact so it would help to get that settled soon.

The general idea is that a generation expression is similar to a
default, just applied at different times.  So the actual generation
expression is stored in pg_attrdef.  The actual question is the
representation in pg_attribute.  Options:

1. (current implementation) New column attgenerated contains 's' for
STORED, 'v' for VIRTUAL, '\0' for nothing.  atthasdef means "there is
something in pg_attrdef for this column".  So a generated column would
have atthasdef = true, and attgenerated = s/v.  A traditional default
would have atthasdef = true and attgenerated = '\0'.  The advantage is
that this is easiest to implement and the internal representation is the
most useful and straightforward.  The disadvantage is that old client
code that wants to detect whether a column has a default would need to
be changed (otherwise it would interpret a generated column as having a
default value instead).

2. Alternative: A generated column has attgenerated = s/v but atthasdef
= false, so that atthasdef means specifically "column has a default".
Then a column would have a pg_attrdef entry for either attgenerated !=
'\0' or atthasdef = true.  (Both couldn't be the case at the same time.)
 The advantage is that client code wouldn't need to be changed.  But
it's also possible that there is client code that just does a left join
of pg_attribute and pg_attrdef without looking at atthasdef, so that
would still be broken.  The disadvantage is that the internal
implementation would get considerably ugly.  Most notably, the tuple
descriptor would probably still look like #1, so there would have to be
a conversion somewhere between variant #1 and #2.  Or we'd have to
duplicate all the tuple descriptor access code to keep that separate.
There would be a lot of redundancy.

3. Radical alternative: Collapse everything into one new column.  We
could combine atthasdef and attgenerated and even attidentity into a new
column.  (Only one of the three can be the case.)  This would give
client code a clean break, which may or may not be good.  The
implementation would be uglier than #1 but probably cleaner than #2.  We
could also get 4 bytes back per pg_attribute row.

I'm happy with the current choice #1, but it's worth thinking about.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On Wed, 27 Dec 2017 at 17:31, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 9/12/17 15:35, Jaime Casanova wrote:
> On 10 September 2017 at 00:08, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
>>
>> During my own tests, though, i found some problems:

Here is an updated patch that should address the problems you have found.

> also is interesting that in triggers, both before and after, the
> column has a null. that seems reasonable in a before trigger but not
> in an after trigger

Logically, you are correct.  But it seems excessive to compute all
virtual columns for every trigger.  I don't know how to consolidate
that, especially with the current trigger API that lets
you look more or less directly into the tuple.

I wasn't sure where this thought about after triggers ended up.

Presumably stored values can just be read from storage, so no overhead in after triggers?

Having the stored values show as NULL would be OK for virtual ones. But what do we do if the column is NOT NULL? Do we still have nulls then?

It would be useful to have a way to generate the values, if desired. Not sure how hard that is.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Erik Rijkers
Date:
On 2018-10-30 09:35, Peter Eisentraut wrote:

> [v5-0001-Generated-columns.patch ]

Hi,

I couldn't get this to apply to current head.

I tried:

patch --dry-run --ignore-whitespace -p 0 -F 5 < 
v5-0001-Generated-columns.patch

and varied both -p and -F paramaters to no avail. Am I doing it wrong?


------- 8< -------
$ patch --ignore-whitespace -p 0 -F 5 < v5-0001-Generated-columns.patch
(Stripping trailing CRs from patch; use --binary to disable.)
can't find file to patch at input line 81
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|From dae07c731d80021bf78b8d89a8eb14408dbd023a Mon Sep 17 00:00:00 2001
|From: Peter Eisentraut <peter_e@gmx.net>
|Date: Mon, 29 Oct 2018 17:46:12 +0100
|Subject: [PATCH v5] Generated columns
[...]
| src/test/regress/parallel_schedule            |   2 +-
| src/test/regress/serial_schedule              |   1 +
| src/test/regress/sql/create_table_like.sql    |  14 +
| src/test/regress/sql/generated.sql            | 408 ++++++++++
| 60 files changed, 2731 insertions(+), 92 deletions(-)
| create mode 100644 src/test/regress/expected/generated.out
| create mode 100644 src/test/regress/sql/generated.sql
|
|diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
|index 9edba96fab..567913c3b6 100644
|--- a/doc/src/sgml/catalogs.sgml
|+++ b/doc/src/sgml/catalogs.sgml
--------------------------
File to patch:
------- 8< -------


Thanks,

Erik Rijkes


Re: [HACKERS] generated columns

From
Sergei Kornilov
Date:
Hi

> patch --dry-run --ignore-whitespace -p 0 -F 5 <
> v5-0001-Generated-columns.patch
>
> and varied both -p and -F paramaters to no avail. Am I doing it wrong?
I am able apply patch by command
patch -p1 < v5-0001-Generated-columns.patch
or by "git apply v5-0001-Generated-columns.patch", but only till commit d5eec4eefde70414c9929b32c411cb4f0900a2a9 (Add
pg_partition_treeto display information about partitions)
 

Unfortunately patch does not applied to current HEAD. Cfbot noticed this too:
http://cfbot.cputube.org/patch_20_1443.log

regards, Sergei


Re: [HACKERS] generated columns

From
Sergei Kornilov
Date:
Hi

I applied this patch on top 2fe42baf7c1ad96b5f9eb898161e258315298351 commit and found a bug while adding STORED
column:

postgres=# create table test(i int);
CREATE TABLE
postgres=# insert into test values (1),(2);
INSERT 0 2
postgres=# alter table test add column gen_stored integer GENERATED ALWAYS AS ((i * 2)) STORED;
ALTER TABLE
postgres=# alter table test add column gen_virt integer GENERATED ALWAYS AS ((i * 2));
ALTER TABLE
postgres=# table test;
 i | gen_stored | gen_virt 
---+------------+----------
 1 |            |        2
 2 |            |        4

Virtual columns was calculated on table read and its ok, but stored column does not update table data.

regards, Sergei


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 30/10/2018 15:19, Erik Rijkers wrote:
> On 2018-10-30 09:35, Peter Eisentraut wrote:
> 
>> [v5-0001-Generated-columns.patch ]
> 
> Hi,
> 
> I couldn't get this to apply to current head.
> 
> I tried:
> 
> patch --dry-run --ignore-whitespace -p 0 -F 5 < 
> v5-0001-Generated-columns.patch
> 
> and varied both -p and -F paramaters to no avail. Am I doing it wrong?

You need -p1.

Or use git apply or git am.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 30/10/2018 15:39, Sergei Kornilov wrote:
>> patch --dry-run --ignore-whitespace -p 0 -F 5 <
>> v5-0001-Generated-columns.patch
>>
>> and varied both -p and -F paramaters to no avail. Am I doing it wrong?
> I am able apply patch by command
> patch -p1 < v5-0001-Generated-columns.patch
> or by "git apply v5-0001-Generated-columns.patch", but only till commit d5eec4eefde70414c9929b32c411cb4f0900a2a9 (Add
pg_partition_treeto display information about partitions)
 
> 
> Unfortunately patch does not applied to current HEAD. Cfbot noticed this too:
http://cfbot.cputube.org/patch_20_1443.log

Here's another one.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Tue, Oct 30, 2018 at 09:35:18AM +0100, Peter Eisentraut wrote:
> Attached is a new version of this patch.

Thanks Peter for sending a new patch.  I am still assigned as a
reviewer, and still plan to look at it in more details.

> It supports both computed-on-write and computed-on-read variants, using
> the keywords STORED and VIRTUAL respectively.

It is actually good to see that you are tackling both problems now.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Erikjan Rijkers
Date:
On 2018-10-30 16:14, Sergei Kornilov wrote:
> Hi
> 
> I applied this patch on top 2fe42baf7c1ad96b5f9eb898161e258315298351
> commit and found a bug while adding STORED column:
> 
> postgres=# create table test(i int);
> CREATE TABLE
> postgres=# insert into test values (1),(2);
> INSERT 0 2
> postgres=# alter table test add column gen_stored integer GENERATED
> ALWAYS AS ((i * 2)) STORED;
> ALTER TABLE
> postgres=# alter table test add column gen_virt integer GENERATED
> ALWAYS AS ((i * 2));
> ALTER TABLE
> postgres=# table test;
>  i | gen_stored | gen_virt
> ---+------------+----------
>  1 |            |        2
>  2 |            |        4
> 
> Virtual columns was calculated on table read and its ok, but stored
> column does not update table data.

This workaround is possible:

update test set i = i where gen_stored is null returning *;
  i | gen_stored | gen_virt
---+------------+----------
  1 |          2 |        2
  2 |          4 |        4
(2 rows)

table test ;
  i | gen_stored | gen_virt
---+------------+----------
  3 |          6 |        6
  4 |          8 |        8
  1 |          2 |        2
  2 |          4 |        4
(4 rows)


Hm, well, I suppose it's still a bug...


I have also noticed that logical replication isn't possible on tables 
with a generated column.  That's a shame but I suppsoe that is as 
expected.


Erik Rijkers

> 
> regards, Sergei


Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers <er@xs4all.nl> wrote:
 
I have also noticed that logical replication isn't possible on tables
with a generated column.  That's a shame but I suppsoe that is as
expected.

Couldn't see anything like that in the patch. Presumably unintended consequence. The generated value needs to be in WAL, so decoding it should be trivial.

Virtual columns wouldn't need to be replicated.

I guess we might choose to replicate generated cols as a value, or leave them out and let them be generated on the downstream side. The default should be to just treat them as a value.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Erik Rijkers
Date:
On 2018-10-31 09:15, Simon Riggs wrote:
> On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers <er@xs4all.nl> wrote:
> 
> 
>> I have also noticed that logical replication isn't possible on tables
>> with a generated column.  That's a shame but I suppsoe that is as
>> expected.
>> 
> 
> Couldn't see anything like that in the patch. Presumably unintended
> consequence. The generated value needs to be in WAL, so decoding it 
> should
> be trivial.
> 

These log messages occur on attempting at logical replication:

( table t1 has no generated columns; replicates fine.
   table t2 has one generated column; replication fails: see below )

LOG:  database system is ready to accept connections
LOG:  logical replication apply worker for subscription "sub1" has 
started
LOG:  logical replication table synchronization worker for subscription 
"sub1", table "t1" has started
LOG:  logical replication table synchronization worker for subscription 
"sub1", table "t2" has started
LOG:  logical replication table synchronization worker for subscription 
"sub1", table "t1" has finished
ERROR:  column "i2" is a generated column
DETAIL:  Generated columns cannot be used in COPY.
LOG:  background worker "logical replication worker" (PID 22252) exited 
with exit code 1



> Virtual columns wouldn't need to be replicated.
> 
> I guess we might choose to replicate generated cols as a value, or 
> leave
> them out and let them be generated on the downstream side. The default
> should be to just treat them as a value.
> 
> --
> Simon Riggs                http://www.2ndQuadrant.com/
> <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On Wed, 31 Oct 2018 at 08:29, Erik Rijkers <er@xs4all.nl> wrote:
On 2018-10-31 09:15, Simon Riggs wrote:
> On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers <er@xs4all.nl> wrote:
>
>
>> I have also noticed that logical replication isn't possible on tables
>> with a generated column.  That's a shame but I suppsoe that is as
>> expected.
>>
>
> Couldn't see anything like that in the patch. Presumably unintended
> consequence. The generated value needs to be in WAL, so decoding it
> should
> be trivial.
>

These log messages occur on attempting at logical replication:

( table t1 has no generated columns; replicates fine.
   table t2 has one generated column; replication fails: see below )

LOG:  database system is ready to accept connections
LOG:  logical replication apply worker for subscription "sub1" has
started
LOG:  logical replication table synchronization worker for subscription
"sub1", table "t1" has started
LOG:  logical replication table synchronization worker for subscription
"sub1", table "t2" has started
LOG:  logical replication table synchronization worker for subscription
"sub1", table "t1" has finished
ERROR:  column "i2" is a generated column
DETAIL:  Generated columns cannot be used in COPY.
LOG:  background worker "logical replication worker" (PID 22252) exited
with exit code 1

OK, so the problem is COPY.

Which means we have an issue with restore. We need to be able to pg_dump a table with generated columns, then restore it afterwards. More generally, we need to be able to handle data that has already been generated - the "generate" idea should apply to new data not existing data.

Sounds like we need to do an ALTER TABLE ... GENERATE ALWAYS after the table has been re-created and re-loaded, so that both logical replication and dump/restore would work.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Sergei Kornilov
Date:
Hi

> OK, so the problem is COPY.
>
> Which means we have an issue with restore. We need to be able to pg_dump a table with generated columns, then restore
itafterwards. More generally, we need to be able to handle data that has already been generated - the "generate" idea
shouldapply to new data not existing data.
 
pg_dump was fixed in published patch (i check this yesterday). It does not COPY generated columns values, only
non-generated.

regards, Sergei


Re: [HACKERS] generated columns

From
John Naylor
Date:
On 10/30/18, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> 3. Radical alternative: Collapse everything into one new column.  We
> could combine atthasdef and attgenerated and even attidentity into a new
> column.  (Only one of the three can be the case.)  This would give
> client code a clean break, which may or may not be good.  The
> implementation would be uglier than #1 but probably cleaner than #2.  We
> could also get 4 bytes back per pg_attribute row.

Thinking about the distinction between 'stored' and 'virtual', I
immediately thought of atthasmissing. In a way it indicates that there
is a virtual default value. It seems the level of materialization is
an orthogonal concept to the kind of value we have. What if
attgenerated had

d = normal default value
i = identity default
a = identity always
c = generated column

and in addition there was an attmaterialized column with

s = stored
v = virtual

In this scheme,
-Normal attribute: '\0' + 's'
-Default value: 'd' + 's'
-Fast default: 'd' + 'v' until the table is rewritten
-Identity column: 'i'/'a' + 's'
-Generated column: 'c' + 's'/'v'

This way, we'd still end up with 1 fewer column (2 new ones minus
atthasdef, attidentity, and atthasmissing).

A bit crazier, what if "d = dropped" was another allowed value in
attmaterialized -- we could then get rid of attisdropped as well. That
has obvious disadvantages, but the broader idea is that this design
may have use cases we haven't thought of yet.

Thoughts?

-John Naylor


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 30/10/2018 16:14, Sergei Kornilov wrote:
> Hi
> 
> I applied this patch on top 2fe42baf7c1ad96b5f9eb898161e258315298351 commit and found a bug while adding STORED
column:
> 
> postgres=# create table test(i int);
> CREATE TABLE
> postgres=# insert into test values (1),(2);
> INSERT 0 2
> postgres=# alter table test add column gen_stored integer GENERATED ALWAYS AS ((i * 2)) STORED;
> ALTER TABLE
> postgres=# alter table test add column gen_virt integer GENERATED ALWAYS AS ((i * 2));
> ALTER TABLE
> postgres=# table test;
>  i | gen_stored | gen_virt 
> ---+------------+----------
>  1 |            |        2
>  2 |            |        4
> 
> Virtual columns was calculated on table read and its ok, but stored column does not update table data.

This is a small bug that I will fix in my next update.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 31/10/2018 08:58, Erikjan Rijkers wrote:
> I have also noticed that logical replication isn't possible on tables 
> with a generated column.  That's a shame but I suppsoe that is as 
> expected.

This is an issue we need to discuss.  How should this work?

The simplest solution would be to exclude generated columns from the
replication stream altogether.

Similar considerations also apply to foreign tables.  What is the
meaning of a stored generated column on a foreign table?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On Tue, 6 Nov 2018 at 04:31, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 31/10/2018 08:58, Erikjan Rijkers wrote:
> I have also noticed that logical replication isn't possible on tables
> with a generated column.  That's a shame but I suppsoe that is as
> expected.

This is an issue we need to discuss.  How should this work?

The simplest solution would be to exclude generated columns from the
replication stream altogether.

IMHO...

Virtual generated columns need not be WAL-logged, or sent.

Stored generated columns should be treated just like we'd treat a column value added by a trigger.

e.g. if we had a Timestamp column called LastUpdateTimestamp we would want to send that value
 
Similar considerations also apply to foreign tables.  What is the
meaning of a stored generated column on a foreign table?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 06/11/2018 14:28, Simon Riggs wrote:
>     The simplest solution would be to exclude generated columns from the
>     replication stream altogether.
> 
> 
> IMHO...
> 
> Virtual generated columns need not be WAL-logged, or sent.

right

> Stored generated columns should be treated just like we'd treat a column
> value added by a trigger.
> 
> e.g. if we had a Timestamp column called LastUpdateTimestamp we would
> want to send that value

Generated columns cannot have volatile expression results in them, so
this case cannot happen.

Also, we don't know whether the generation expression on the target is
the same (or even if it looks the same, consider locale issues etc.), so
we need to recompute the generated columns on the target anyway, so it's
pointless to send the already computed stored values.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Simon Riggs
Date:
On Tue, 6 Nov 2018 at 13:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
 
> Stored generated columns should be treated just like we'd treat a column
> value added by a trigger.
>
> e.g. if we had a Timestamp column called LastUpdateTimestamp we would
> want to send that value

Generated columns cannot have volatile expression results in them, so
this case cannot happen.

Also, we don't know whether the generation expression on the target is
the same (or even if it looks the same, consider locale issues etc.), so
we need to recompute the generated columns on the target anyway, so it's
pointless to send the already computed stored values.

Makes sense. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Tue, Oct 30, 2018 at 09:35:18AM +0100, Peter Eisentraut wrote:
> 1. (current implementation) New column attgenerated contains 's' for
> STORED, 'v' for VIRTUAL, '\0' for nothing.  atthasdef means "there is
> something in pg_attrdef for this column".
>
> 2. Alternative: A generated column has attgenerated = s/v but atthasdef
> = false, so that atthasdef means specifically "column has a default".
> Then a column would have a pg_attrdef entry for either attgenerated !=
> '\0' or atthasdef = true.
>
> 3. Radical alternative: Collapse everything into one new column.  We
> could combine atthasdef and attgenerated and even attidentity into a new
> column.  (Only one of the three can be the case.)  This would give
> client code a clean break, which may or may not be good.  The
> implementation would be uglier than #1 but probably cleaner than #2.  We
> could also get 4 bytes back per pg_attribute row.
>
> I'm happy with the current choice #1, but it's worth thinking about.

#3 looks very appealing in my opinion as those columns have no overlap,
so it would take five possible values:
- generated always
- generated by default
- default value
- stored expression
- virtual expression

Obviously this requires a first patch to combine the catalog
representation of the existing columns atthasdef and attidentity.

+                   ereport(ERROR,
+                           (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                            errmsg("generated colums are not supported
on typed tables")));
s/colums/columns/.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Robert Haas
Date:
On Tue, Oct 30, 2018 at 4:35 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> 1. (current implementation) New column attgenerated contains 's' for
> STORED, 'v' for VIRTUAL, '\0' for nothing.  atthasdef means "there is
> something in pg_attrdef for this column".  So a generated column would
> have atthasdef = true, and attgenerated = s/v.  A traditional default
> would have atthasdef = true and attgenerated = '\0'.  The advantage is
> that this is easiest to implement and the internal representation is the
> most useful and straightforward.  The disadvantage is that old client
> code that wants to detect whether a column has a default would need to
> be changed (otherwise it would interpret a generated column as having a
> default value instead).
>
> 2. Alternative: A generated column has attgenerated = s/v but atthasdef
> = false, so that atthasdef means specifically "column has a default".
> Then a column would have a pg_attrdef entry for either attgenerated !=
> '\0' or atthasdef = true.  (Both couldn't be the case at the same time.)
>  The advantage is that client code wouldn't need to be changed.  But
> it's also possible that there is client code that just does a left join
> of pg_attribute and pg_attrdef without looking at atthasdef, so that
> would still be broken.  The disadvantage is that the internal
> implementation would get considerably ugly.  Most notably, the tuple
> descriptor would probably still look like #1, so there would have to be
> a conversion somewhere between variant #1 and #2.  Or we'd have to
> duplicate all the tuple descriptor access code to keep that separate.
> There would be a lot of redundancy.
>
> 3. Radical alternative: Collapse everything into one new column.  We
> could combine atthasdef and attgenerated and even attidentity into a new
> column.  (Only one of the three can be the case.)  This would give
> client code a clean break, which may or may not be good.  The
> implementation would be uglier than #1 but probably cleaner than #2.  We
> could also get 4 bytes back per pg_attribute row.
>
> I'm happy with the current choice #1, but it's worth thinking about.

I don't have a strong position on 1 vs. 2 vs. 3, but I do think it
would be nicer not to use '\0' as a column value.  I'd suggest you use
'n' or '0' or '-' or some other printable character instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] generated columns

From
Corey Huinker
Date:
> 3. Radical alternative: Collapse everything into one new column.  We
> could combine atthasdef and attgenerated and even attidentity into a new
> column.  (Only one of the three can be the case.)  This would give
> client code a clean break, which may or may not be good.  The
> implementation would be uglier than #1 but probably cleaner than #2.  We
> could also get 4 bytes back per pg_attribute row.
>
> I'm happy with the current choice #1, but it's worth thinking about.

#3 looks very appealing in my opinion as those columns have no overlap,
so it would take five possible values:

Could the removed columns live on...as generated-always columns?

Re: [HACKERS] generated columns

From
Alvaro Herrera
Date:
Disclaimer:  I had never seen this patch before.  I did not participate
in this feature design.  I did not discuss this review with the author
or anybody in 2ndQuadrant.  I do not have any particular affective bonds
with its author.  I did not receive payment nor goods in exchange for
this review.  I encourage others to review this patch, and all other
patches in the current commitfest and all future commitfests.


Now that the TupleTableSlot work has landed, the API of
ExecComputeStoredGenerated is clearly inadequate.  This should be
adjusted to work with the slot only, and not generate a heap tuple at
all -- if the calling code needs the heap tuple, have that code generate
that from the slot.  (Example problem: ExecConstraints runs using the
slot, not the heap tuple.)

The pg_dump tests verify a virtual generated column, but not a virtual
stored column.  It'd be good to have one for the latter.  The tables in
new test "generated" appear not to be dropped, which is good to test
pg_upgrade as well as pg_dump; I'd add a comment that this is on
purpose, lest someone else add DROP lines there later.  I think some
tests for logical replication would be good as well.

It's unclear why you made generated columns on partitions unsupported.
I'd fix the limitation if possible, but if not, at least document it.
(I particularly notice that partition key is marked as unsupported in
the regression test.  Consider partitioning on a SERIAL column, this is
clearly something that users will expect to work.)

About your catalog representation question:

On 2018-Oct-30, Peter Eisentraut wrote:

> 1. (current implementation) New column attgenerated contains 's' for
> STORED, 'v' for VIRTUAL, '\0' for nothing.  atthasdef means "there is
> something in pg_attrdef for this column".  So a generated column would
> have atthasdef = true, and attgenerated = s/v.  A traditional default
> would have atthasdef = true and attgenerated = '\0'.  The advantage is
> that this is easiest to implement and the internal representation is the
> most useful and straightforward.  The disadvantage is that old client
> code that wants to detect whether a column has a default would need to
> be changed (otherwise it would interpret a generated column as having a
> default value instead).

I think this is a reasonable implementation.  Client code that is
confused by this will obviously have to be updated, but if it isn't, the
bug is not serious.  I would clearly not go to the extreme trouble that
#2 poses just to avoid this problem.

That said, I think your "radical alternative" #3 is better.  Maybe we
want to avoid multiple compatibility breaks, so we'd go with 3 for pg12
instead of doing 1 now and changing it again later.

Like Robert, I would use something other than '\0' anyhow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 15/11/2018 15:10, Robert Haas wrote:
> I don't have a strong position on 1 vs. 2 vs. 3, but I do think it
> would be nicer not to use '\0' as a column value.  I'd suggest you use
> 'n' or '0' or '-' or some other printable character instead.

I had carefully considered this when attidentity was added.  Using '\0'
allows you to use this column as a boolean in C code, which is often
convenient.  Also, there are numerous places where a pg_attribute form
or a tuple descriptor is initialized to all zeroes, which works well for
most fields, and adding one exception like this would create a lot of
extra work and bloat the patch and create potential for future
instability.  Also note that a C char '\0' is represented as '' (empty
string) in SQL, so this also creates a natural representation in SQL.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 19/11/2018 19:54, Alvaro Herrera wrote:
> It's unclear why you made generated columns on partitions unsupported.
> I'd fix the limitation if possible, but if not, at least document it.

This is explained here:

+       /*
+        * Generated columns in partition key expressions:
+        *
+        * - Stored generated columns cannot work: They are computed
+        *   after BEFORE triggers, but partition routing is done
+        *   before all triggers.
+        *
+        * - Virtual generated columns could work.  But there is a
+        *   problem when dropping such a table: Dropping a table
+        *   calls relation_open(), which causes partition keys to be
+        *   constructed for the partcache, but at that point the
+        *   generation expression is already deleted (through
+        *   dependencies), so this will fail.  So if you remove the
+        *   restriction below, things will appear to work, but you
+        *   can't drop the table. :-(
+        */

> (I particularly notice that partition key is marked as unsupported in
> the regression test.  Consider partitioning on a SERIAL column, this is
> clearly something that users will expect to work.)

A serial column is not a generated column.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 06/11/2018 13:27, Peter Eisentraut wrote:
> This is a small bug that I will fix in my next update.

Time for another update.  Lot's of rebasing, many things fixed,
including the ADD COLUMN bug you found, replication, foreign tables,
better caching, some corner cases in trigger behavior, more
documentation.  This addresses everything I've had in my notes, so it's
functionally and logically complete from my perspective.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Pavel Stehule
Date:
Hi

pá 11. 1. 2019 v 9:31 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 06/11/2018 13:27, Peter Eisentraut wrote:
> This is a small bug that I will fix in my next update.

Time for another update.  Lot's of rebasing, many things fixed,
including the ADD COLUMN bug you found, replication, foreign tables,
better caching, some corner cases in trigger behavior, more
documentation.  This addresses everything I've had in my notes, so it's
functionally and logically complete from my perspective.

I am looking on this patch - it is great feature.

The documentation contains paragraph

+      The generation expression can only use immutable functions and cannot
+      use subqueries or reference anything other than the current row in any
+      way.

It is necessary for stored columns?

I tested it with pseudo constant - current_timestamp, session_user. But current_database() is disallowed.

on second hand, this is strange

postgres=# create table foo3 (inserted text generated always as (current_timestamp) virtual);
CREATE TABLE

Regards

Pavel

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 11/01/2019 16:22, Pavel Stehule wrote:
> The documentation contains paragraph
> 
> +      The generation expression can only use immutable functions and cannot
> +      use subqueries or reference anything other than the current row
> in any
> +      way.
> 
> It is necessary for stored columns?

See here:
https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com

> I tested it with pseudo constant - current_timestamp, session_user. But
> current_database() is disallowed.
> 
> on second hand, this is strange
> 
> postgres=# create table foo3 (inserted text generated always as
> (current_timestamp) virtual);
> CREATE TABLE

Ah, the volatility checking needs some improvements.  I'll address that
in the next patch version.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Pavel Stehule
Date:


ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 11/01/2019 16:22, Pavel Stehule wrote:
> The documentation contains paragraph
>
> +      The generation expression can only use immutable functions and cannot
> +      use subqueries or reference anything other than the current row
> in any
> +      way.
>
> It is necessary for stored columns?

See here:
https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com

I understand - it is logical. But it is sad, so this feature is not complete. The benefit is not too big - against functional indexes or views. But it can be first step.



> I tested it with pseudo constant - current_timestamp, session_user. But
> current_database() is disallowed.
>
> on second hand, this is strange
>
> postgres=# create table foo3 (inserted text generated always as
> (current_timestamp) virtual);
> CREATE TABLE

Ah, the volatility checking needs some improvements.  I'll address that
in the next patch version.

ok
 
--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Sun, Jan 13, 2019 at 03:31:23PM +0100, Pavel Stehule wrote:
> ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut <
> peter.eisentraut@2ndquadrant.com> napsal:
>> See here:
>>
>> https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com
>
> I understand - it is logical. But it is sad, so this feature is not
> complete. The benefit is not too big - against functional indexes or views.
> But it can be first step.

I wouldn't say that.  Volatibility restrictions based on immutable
functions apply to many concepts similar like expression pushdowns to
make for deterministic results.  The SQL spec takes things on the safe
side.

>> Ah, the volatility checking needs some improvements.  I'll address that
>> in the next patch version.
>
> ok

The same problem happens for stored and virtual columns.

The latest patch has a small header conflict at the top of
rewriteHandler.c which is simple enough to fix.

It would be nice to add a test with composite types, say something
like:
=# create type double_int as (a int, b int);
CREATE TYPE
=# create table double_tab (a int,
  b double_int GENERATED ALWAYS AS ((a * 2, a * 3)) stored,
  c double_int GENERATED ALWAYS AS ((a * 4, a * 5)) virtual);
CREATE TABLE
=# insert into double_tab values (1), (6);
INSERT 0 2
=# select * from double_tab ;
 a |    b    |    c
---+---------+---------
 1 | (2,3)   | (4,5)
 6 | (12,18) | (24,30)
(2 rows)

Glad to see that typed tables are handled and forbidden, and the
trigger definition looks sane to me.

+-- partitioned table
+CREATE TABLE gtest_parent (f1 date NOT NULL, f2 bigint, f3 bigint
GENERATED ALWAYS AS (f2 * 2)) PARTITION BY RANGE (f1);
+CREATE TABLE gtest_child PARTITION OF gtest_parent FOR VALUES FROM
('2016-07-01') TO ('2016-08-01');
+INSERT INTO gtest_parent (f1, f2) VALUES ('2016-07-15', 1);
In my experience, having tests which handle multiple layers of
partitions is never a bad thing.

+    /*
+     * Changing the type of a column that is used by a
+     * generated column is not allowed by SQL standard.
+     * It might be doable with some thinking and effort.
Just mentioning the part about the SQL standard seems fine to me.

+   bool        has_generated_stored;
+   bool        has_generated_virtual;
 } TupleConstr;
Could have been more simple to use a char as representation here.

Using NULL as generation expression results in a crash when selecting
the relation created:
=# CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (NULL));
CREATE TABLE
=# select * from gtest_err_1;
server closed the connection unexpectedly

=# CREATE TABLE gtest_err_2 (a int PRIMARY KEY, b int NOT NULL
GENERATED ALWAYS AS (NULL));
CREATE TABLE
A NOT NULL column can use NULL as generated result :)

+   The view <literal>column_column_usage</literal> identifies all
generated
"column_column_usage" is redundant.  Could it be possible to come up
with a better name?

When testing a bulk INSERT into a table which has a stored generated
column, memory keeps growing in size linearly, which does not seem
normal to me.  If inserting more tuples than what I tested (I stopped
at 10M because of lack of time), it seems to me that this could result
in OOMs.  I would have expected the memory usage to be steady.

+    /* ignore virtual generated columns; they are always null here */
+    if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+        continue;
Looking for an assertion or a sanity check of some kind?

+       if (pstate->p_expr_kind == EXPR_KIND_GENERATED_COLUMN &&
+           attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber)
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                    errmsg("cannot use system column \"%s\" in column generation expression",
+                           colname),
+                    parser_errposition(pstate, location)));
There is a test using xmon, you may want one with tableoid.

+/*
+ * Thin wrapper around libpq to obtain server version.
+ */
+static int
+libpqrcv_server_version(WalReceiverConn *conn)
This should be introduced in separate patch in my opinion (needed
afterwards for logirep).

I have not gone through the PL part of the changes yet, except
plpgsql.

What about the catalog representation of attgenerated?  Would it merge
with attidentity & co?  Or not?
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Pavel Stehule
Date:


út 15. 1. 2019 v 8:14 odesílatel Michael Paquier <michael@paquier.xyz> napsal:
On Sun, Jan 13, 2019 at 03:31:23PM +0100, Pavel Stehule wrote:
> ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut <
> peter.eisentraut@2ndquadrant.com> napsal:
>> See here:
>>
>> https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com
>
> I understand - it is logical. But it is sad, so this feature is not
> complete. The benefit is not too big - against functional indexes or views.
> But it can be first step.

I wouldn't say that.  Volatibility restrictions based on immutable
functions apply to many concepts similar like expression pushdowns to
make for deterministic results.  The SQL spec takes things on the safe
side.

I would to have a mechanism for safe replacement of triggers of type

if TG_TYPE = 'INSERT' THEN
  NEW.inserted := CURRENT_TIMESTAMP;
ELSE IF TG_TYPE = 'UPDATE' THEN
  NEW.updated := CURRENT_TIMESTAMP;
 ..

But I understand, so current SQL spec design is safe.

Regards

Pavel



Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 15/01/2019 08:18, Pavel Stehule wrote:
> I would to have a mechanism for safe replacement of triggers of type
> 
> if TG_TYPE = 'INSERT' THEN
>   NEW.inserted := CURRENT_TIMESTAMP;
> ELSE IF TG_TYPE = 'UPDATE' THEN
>   NEW.updated := CURRENT_TIMESTAMP;
>  ..

That kind of use is probably better addressed with a temporal facility.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Pavel Stehule
Date:


st 16. 1. 2019 v 9:26 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 15/01/2019 08:18, Pavel Stehule wrote:
> I would to have a mechanism for safe replacement of triggers of type
>
> if TG_TYPE = 'INSERT' THEN
>   NEW.inserted := CURRENT_TIMESTAMP;
> ELSE IF TG_TYPE = 'UPDATE' THEN
>   NEW.updated := CURRENT_TIMESTAMP;
>  ..

That kind of use is probably better addressed with a temporal facility.

yes. I am looking for this functionality in Postgres

Pavel


--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 15/01/2019 08:13, Michael Paquier wrote:
> When testing a bulk INSERT into a table which has a stored generated
> column, memory keeps growing in size linearly, which does not seem
> normal to me.  If inserting more tuples than what I tested (I stopped
> at 10M because of lack of time), it seems to me that this could result
> in OOMs.  I would have expected the memory usage to be steady.

What are you executing exactly?  One INSERT command with many rows?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Robert Haas
Date:
On Thu, Nov 22, 2018 at 10:16 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 15/11/2018 15:10, Robert Haas wrote:
> > I don't have a strong position on 1 vs. 2 vs. 3, but I do think it
> > would be nicer not to use '\0' as a column value.  I'd suggest you use
> > 'n' or '0' or '-' or some other printable character instead.
>
> I had carefully considered this when attidentity was added.  Using '\0'
> allows you to use this column as a boolean in C code, which is often
> convenient.  Also, there are numerous places where a pg_attribute form
> or a tuple descriptor is initialized to all zeroes, which works well for
> most fields, and adding one exception like this would create a lot of
> extra work and bloat the patch and create potential for future
> instability.  Also note that a C char '\0' is represented as '' (empty
> string) in SQL, so this also creates a natural representation in SQL.

I'm not really convinced.  I think that the stdbool work you've been
doing shows that blurring the line between char and bool is a bad
idea.  And I believe that on general principle, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] generated columns

From
Erik Rijkers
Date:
I couldn't compile v7-0001 and I am testing with the older v6-0001 (of 
which I still had an instance).

So the problem below may have been fixed already.

If you add a generated column to a file_fdw foreign table, it works OK 
wih VIRTUAL (the default) but with STORED it adds an empty column, 
silently.  I would say it would make more sense to get an error.


Erik Rijkers



Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Wed, Jan 16, 2019 at 02:14:41PM +0100, Peter Eisentraut wrote:
> On 15/01/2019 08:13, Michael Paquier wrote:
>> When testing a bulk INSERT into a table which has a stored generated
>> column, memory keeps growing in size linearly, which does not seem
>> normal to me.  If inserting more tuples than what I tested (I stopped
>> at 10M because of lack of time), it seems to me that this could result
>> in OOMs.  I would have expected the memory usage to be steady.
>
> What are you executing exactly?  One INSERT command with many rows?

Yes, something like that grows the memory and CPU usage rather
linearly:
CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab VALUES (generate_series(1,100000000));
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 16/01/2019 22:40, Erik Rijkers wrote:
> I couldn't compile v7-0001 and I am testing with the older v6-0001 (of 
> which I still had an instance).
> 
> So the problem below may have been fixed already.
> 
> If you add a generated column to a file_fdw foreign table, it works OK 
> wih VIRTUAL (the default) but with STORED it adds an empty column, 
> silently.  I would say it would make more sense to get an error.

Yes, v7 has addressed foreign-data wrappers.  (I only tested with
postgres_fdw.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Thu, Jan 17, 2019 at 10:12:26AM +0900, Michael Paquier wrote:
> Yes, something like that grows the memory and CPU usage rather
> linearly:
> CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> INSERT INTO tab VALUES (generate_series(1,100000000));

The latest patch set got plenty of feedback not addressed yet, so I am
marking it as returned with feedback.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-02-01 04:48, Michael Paquier wrote:
> On Thu, Jan 17, 2019 at 10:12:26AM +0900, Michael Paquier wrote:
>> Yes, something like that grows the memory and CPU usage rather
>> linearly:
>> CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
>> INSERT INTO tab VALUES (generate_series(1,100000000));
> 
> The latest patch set got plenty of feedback not addressed yet, so I am
> marking it as returned with feedback.

Here is an updated patch which should address the review comments in the
latest round.

My perspective on this patch is:

The stored generated column part is pretty solid.  It can target PG12.

The virtual generated column part is still a bit iffy.  I'm still
finding places here and there where virtual columns are not being
expanded correctly.  Maybe it needs more refactoring.  One big unsolved
issue is how the storage of such columns should work.  Right now, they
are stored as nulls.  That works fine, but what I suppose we'd really
want is to not store them at all.  That, however, creates all kinds of
complications in the planner if target lists have non-matching lengths
or the resnos don't match up.  I haven't figured out how to do this cleanly.

So I'm thinking if we can get agreement on the stored columns, I can cut
out the virtual column stuff for PG12.  That should be fairly easy.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-01-15 08:13, Michael Paquier wrote:
>>> Ah, the volatility checking needs some improvements.  I'll address that
>>> in the next patch version.
>>
>> ok
> 
> The same problem happens for stored and virtual columns.

This is fixed in v8.

> It would be nice to add a test with composite types, say something
> like:
> =# create type double_int as (a int, b int);
> CREATE TYPE
> =# create table double_tab (a int,
>   b double_int GENERATED ALWAYS AS ((a * 2, a * 3)) stored,
>   c double_int GENERATED ALWAYS AS ((a * 4, a * 5)) virtual);
> CREATE TABLE
> =# insert into double_tab values (1), (6);
> INSERT 0 2
> =# select * from double_tab ;
>  a |    b    |    c
> ---+---------+---------
>  1 | (2,3)   | (4,5)
>  6 | (12,18) | (24,30)
> (2 rows)

I added that.

> +   bool        has_generated_stored;
> +   bool        has_generated_virtual;
>  } TupleConstr;
> Could have been more simple to use a char as representation here.

Seems confusing if both apply at the same time.

> Using NULL as generation expression results in a crash when selecting
> the relation created:
> =# CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (NULL));
> CREATE TABLE
> =# select * from gtest_err_1;
> server closed the connection unexpectedly

This is fixed in v8.

> +   The view <literal>column_column_usage</literal> identifies all
> generated
> "column_column_usage" is redundant.  Could it be possible to come up
> with a better name?

This is specified in the SQL stnadard.

> When testing a bulk INSERT into a table which has a stored generated
> column, memory keeps growing in size linearly, which does not seem
> normal to me.

This was a silly coding error.  It's fixed in v8.

> +/*
> + * Thin wrapper around libpq to obtain server version.
> + */
> +static int
> +libpqrcv_server_version(WalReceiverConn *conn)
> This should be introduced in separate patch in my opinion (needed
> afterwards for logirep).

Yes, it could be committed separately.

> What about the catalog representation of attgenerated?  Would it merge
> with attidentity & co?  Or not?

I think the way it is now seems best.  The other options that were
discussed are also plausible, but that the discussions did not reveal
any overwhelming arguments for a a change.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Mon, Feb 25, 2019 at 09:51:52PM +0100, Peter Eisentraut wrote:
> On 2019-01-15 08:13, Michael Paquier wrote:
>> +   bool        has_generated_stored;
>> +   bool        has_generated_virtual;
>>  } TupleConstr;
>> Could have been more simple to use a char as representation here.
>
> Seems confusing if both apply at the same time.

Ouch, I see.  The flags count for all attributes.  I missed that in a
previous read of the patch.  Yeah, two booleans make sense.

>> When testing a bulk INSERT into a table which has a stored generated
>> column, memory keeps growing in size linearly, which does not seem
>> normal to me.
>
> This was a silly coding error.  It's fixed in v8.

Thanks, this one looks fine.

>> +/*
>> + * Thin wrapper around libpq to obtain server version.
>> + */
>> +static int
>> +libpqrcv_server_version(WalReceiverConn *conn)
>> This should be introduced in separate patch in my opinion (needed
>> afterwards for logirep).
>
> Yes, it could be committed separately.

I would split that one and I think that it could go in.  If you wish
to keep things grouped that's fine by me as well.

>> What about the catalog representation of attgenerated?  Would it merge
>> with attidentity & co?  Or not?
>
> I think the way it is now seems best.  The other options that were
> discussed are also plausible, but that the discussions did not reveal
> any overwhelming arguments for a a change.

Hm.  Does the SQL standard mention more features which could be merged
with stored values, virtual values, default expressions and identity
columns?  I don't know the last trends in this area but I am wondering
if there are any other column specific, expression-like features like
that associated to a column.  That could give more strength with
having one column in pg_attribute to govern them all.  Well, assuming
that something else is implemented of course.  That's a lot of
assumptions, and it's not like the current implementation is wrong
either.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Mon, Feb 25, 2019 at 09:46:35PM +0100, Peter Eisentraut wrote:
> The virtual generated column part is still a bit iffy.  I'm still
> finding places here and there where virtual columns are not being
> expanded correctly.  Maybe it needs more refactoring.  One big unsolved
> issue is how the storage of such columns should work.  Right now, they
> are stored as nulls.  That works fine, but what I suppose we'd really
> want is to not store them at all.  That, however, creates all kinds of
> complications in the planner if target lists have non-matching lengths
> or the resnos don't match up.  I haven't figured out how to do this
> cleanly.

Hmm.  Not storing virtual columns looks like the correct concept to
me instead of storing them as NULL.

> So I'm thinking if we can get agreement on the stored columns, I can cut
> out the virtual column stuff for PG12.  That should be fairly easy.

The shape of what is used for stored columns looks fine to me.

+   if (attgenerated)
+   {
+       /*
+        * Generated column: Dropping anything that the generation expression
+        * refers to automatically drops the generated column.
+        */
+       recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel),
+                                       DEPENDENCY_AUTO,
+                                       DEPENDENCY_AUTO, false);
+   }
A CCI is not necessary I think here, still the recent thread about
automatic dependencies with identity columns had a much similar
pattern...

+           else if (generated[0] == ATTRIBUTE_GENERATED_VIRTUAL)
+               default_str = psprintf("generated always as (%s)", PQgetvalue(res, i, attrdef_col));
Nit: I would add VIRTUAL instead of relying on the default option.

Another thing I was thinking about: could it be possible to add a
sanity check in sanity_check.sql so as a column more that one
field in attidentity, attgenerated and atthasdef set at the same time?
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-02-26 06:30, Michael Paquier wrote:
> +   if (attgenerated)
> +   {
> +       /*
> +        * Generated column: Dropping anything that the generation expression
> +        * refers to automatically drops the generated column.
> +        */
> +       recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel),
> +                                       DEPENDENCY_AUTO,
> +                                       DEPENDENCY_AUTO, false);
> +   }
> A CCI is not necessary I think here, still the recent thread about
> automatic dependencies with identity columns had a much similar
> pattern...

Yeah, worth taking another look.

> 
> +           else if (generated[0] == ATTRIBUTE_GENERATED_VIRTUAL)
> +               default_str = psprintf("generated always as (%s)", PQgetvalue(res, i, attrdef_col));
> Nit: I would add VIRTUAL instead of relying on the default option.

I suppose we'll decide that when the virtual columns are actually added.
 I see your point.

> Another thing I was thinking about: could it be possible to add a
> sanity check in sanity_check.sql so as a column more that one
> field in attidentity, attgenerated and atthasdef set at the same time?

There is something like that at the top of
src/test/regress/sql/generated.sql.  I can expand that.  But it only
covers the tests.  For run time checks, you'd want something like
pg_catcheck.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-02-26 06:12, Michael Paquier wrote:
> Hm.  Does the SQL standard mention more features which could be merged
> with stored values, virtual values, default expressions and identity
> columns?  I don't know the last trends in this area but I am wondering
> if there are any other column specific, expression-like features like
> that associated to a column.  That could give more strength with
> having one column in pg_attribute to govern them all.  Well, assuming
> that something else is implemented of course.  That's a lot of
> assumptions, and it's not like the current implementation is wrong
> either.

The system-versioned tables feature might be the most adjacent one.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Wed, Feb 27, 2019 at 08:05:02AM +0100, Peter Eisentraut wrote:
> There is something like that at the top of
> src/test/regress/sql/generated.sql.  I can expand that.

I think you mean identity.sql.  I would think that this would live
better with some other sanity checks.  I am fine with your final
decision on the matter.

> But it only covers the tests.  For run time checks, you'd want
> something like pg_catcheck.

Sure.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-01-15 08:13, Michael Paquier wrote:
> +/*
> + * Thin wrapper around libpq to obtain server version.
> + */
> +static int
> +libpqrcv_server_version(WalReceiverConn *conn)
> This should be introduced in separate patch in my opinion (needed
> afterwards for logirep).

I have committed this separately.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
Here is an updated patch with just the "stored" functionality, as discussed.

The actual functionality is much smaller now, contained in the executor.
 Everything else is mostly DDL support, trigger handling, and some
frontend stuff.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Pavel Stehule
Date:
Hi

po 18. 3. 2019 v 8:35 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
Here is an updated patch with just the "stored" functionality, as discussed.

The actual functionality is much smaller now, contained in the executor.
 Everything else is mostly DDL support, trigger handling, and some
frontend stuff.

probably I found a bug

create table foo(id int, name text);
insert into foo values(1, 'aaa');
alter table foo add column name_upper text generated always as (upper(name)) stored;
update foo set name = 'bbb' where id = 1; -- ok

alter table foo drop column name_upper;
update foo set name = 'bbbx' where id = 1; -- ok

alter table foo add column name_upper text generated always as (upper(name)) stored;
update foo set name = 'bbbxx' where id = 1; -- error

postgres=# update foo set name = 'bbbxx' where id = 1; -- error
ERROR:  no generation expression found for column number 3 of table "foo"

Regards

Pavel








--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote:
> postgres=# update foo set name = 'bbbxx' where id = 1; -- error
> ERROR:  no generation expression found for column number 3 of table
> "foo"

Yes I can see the problem after adding a generated column and dropping
it on an INSERT query.

I have read through the code once.

+       if (relid && attnum && get_attgenerated(relid, attnum))
Better to use OidIsValid here?

+       (walrcv_server_version(wrconn) >= 120000 ? "AND a.attgenerated = ''" : ""),
I think that it is better to always have version-related references
stored as defines.

+CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED UNIQUE);
+CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2) STORED, PRIMARY KEY (a, b));
Some tests for unique constraints with a generated column should be in
place?

It would be nice to have extra tests for forbidden expression types
on generated columns especially SRF, subquery and aggregates/window
functions.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-03-20 03:51, Michael Paquier wrote:
> On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote:
>> postgres=# update foo set name = 'bbbxx' where id = 1; -- error
>> ERROR:  no generation expression found for column number 3 of table
>> "foo"
> 
> Yes I can see the problem after adding a generated column and dropping
> it on an INSERT query.

fixed

> +       if (relid && attnum && get_attgenerated(relid, attnum))
> Better to use OidIsValid here?

fixed

> +       (walrcv_server_version(wrconn) >= 120000 ? "AND a.attgenerated = ''" : ""),
> I think that it is better to always have version-related references
> stored as defines.

A valid idea, but I don't see it widely done (see psql, pg_dump).

> +CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED UNIQUE);
> +CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2) STORED, PRIMARY KEY (a, b));
> Some tests for unique constraints with a generated column should be in
> place?

done

> It would be nice to have extra tests for forbidden expression types
> on generated columns especially SRF, subquery and aggregates/window
> functions.

done

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-03-26 20:50, Pavel Stehule wrote:
> It is great feature and I'll mark this feature as ready for commit

Committed, thanks.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Pavel Stehule
Date:


so 30. 3. 2019 v 9:03 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 2019-03-26 20:50, Pavel Stehule wrote:
> It is great feature and I'll mark this feature as ready for commit

Committed, thanks.

great feature, it reduce some necessity of triggers

Regards

Pavel



--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

From
Justin Pryzby
Date:
On Sat, Mar 30, 2019 at 09:03:03AM +0100, Peter Eisentraut wrote:
> On 2019-03-26 20:50, Pavel Stehule wrote:
> > It is great feature and I'll mark this feature as ready for commit
> 
> Committed, thanks.

create_table.sgml now has this:

https://www.postgresql.org/docs/devel/sql-createtable.html#id-1.9.3.85.6.2.18.1.2
+     <para>
+      The keyword <literal>STORED</literal> is required to signify that the
+      column will be computed on write and will be stored on disk.  default.
+     </para>

What does "default." mean ?

Also, this is working but not documented as valid:
postgres=# CREATE TABLE t (j int, i int GENERATED BY DEFAULT AS (j*j+1) STORED);

Justin



Re: [HACKERS] generated columns

From
Erik Rijkers
Date:
On 2019-01-16 22:40, Erik Rijkers wrote:
> 
> If you add a generated column to a file_fdw foreign table, it works OK
> wih VIRTUAL (the default) but with STORED it adds an empty column,
> silently.  I would say it would make more sense to get an error.

VIRTUAL is gone, but that other issue is still there:  STORED in a 
file_fdw foreign table still silently creates the column which then 
turns out to be useless on SELECT, with an error like:

"ERROR:  column some_column_name is a generated column
DETAIL:  Generated columns cannot be used in COPY."

Maybe it'd be possible to get an error earlier, i.e., while trying to 
create such a useless column?


thanks,

Erik Rijkers









Re: [HACKERS] generated columns

From
Jeff Janes
Date:
On Sat, Mar 30, 2019 at 4:03 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-26 20:50, Pavel Stehule wrote:
> It is great feature and I'll mark this feature as ready for commit

Committed, thanks.

I can't do a same-major-version pg_upgrade across this commit, as the new pg_dump is trying to dump a nonexistent attgenerated column from the old database.  Is that acceptable collateral damage?  I thought we usually avoided that breakage within the dev branch, but I don't know how we do it.

Cheers,

Jeff

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-03-30 10:24, Justin Pryzby wrote:
> create_table.sgml now has this:
> 
> https://www.postgresql.org/docs/devel/sql-createtable.html#id-1.9.3.85.6.2.18.1.2
> +     <para>
> +      The keyword <literal>STORED</literal> is required to signify that the
> +      column will be computed on write and will be stored on disk.  default.
> +     </para>
> 
> What does "default." mean ?

Typo, fixed.

> Also, this is working but not documented as valid:
> postgres=# CREATE TABLE t (j int, i int GENERATED BY DEFAULT AS (j*j+1) STORED);

Fixed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-03-31 05:49, Erik Rijkers wrote:
> STORED in a 
> file_fdw foreign table still silently creates the column which then 
> turns out to be useless on SELECT, with an error like:
> 
> "ERROR:  column some_column_name is a generated column
> DETAIL:  Generated columns cannot be used in COPY."
> 
> Maybe it'd be possible to get an error earlier, i.e., while trying to 
> create such a useless column?

I'll look into it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-03-31 15:22, Jeff Janes wrote:
> I can't do a same-major-version pg_upgrade across this commit, as the
> new pg_dump is trying to dump a nonexistent attgenerated column from the
> old database.  Is that acceptable collateral damage?  I thought we
> usually avoided that breakage within the dev branch, but I don't know
> how we do it.

We don't.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Mon, Apr 01, 2019 at 10:53:27AM +0200, Peter Eisentraut wrote:
> On 2019-03-31 15:22, Jeff Janes wrote:
>> I can't do a same-major-version pg_upgrade across this commit, as the
>> new pg_dump is trying to dump a nonexistent attgenerated column from the
>> old database.  Is that acceptable collateral damage?  I thought we
>> usually avoided that breakage within the dev branch, but I don't know
>> how we do it.
>
> We don't.

Really?  I thought on the contrary that it should work.  This reminds
me of commit 59a884e9, which has been discussed here:
https://www.postgresql.org/message-id/20160212001846.GB29511@momjian.us
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Michael Paquier
Date:
On Mon, Apr 01, 2019 at 10:44:05PM +0900, Michael Paquier wrote:
> Really?  I thought on the contrary that it should work.  This reminds
> me of commit 59a884e9, which has been discussed here:
> https://www.postgresql.org/message-id/20160212001846.GB29511@momjian.us

And this remark makes little sense as we are taking about incompatible
dump formats with pg_dump.  My apologies for the useless noise.

/me runs fast and hides, fast.
--
Michael

Attachment

Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-04-01 10:52, Peter Eisentraut wrote:
> On 2019-03-31 05:49, Erik Rijkers wrote:
>> STORED in a 
>> file_fdw foreign table still silently creates the column which then 
>> turns out to be useless on SELECT, with an error like:
>>
>> "ERROR:  column some_column_name is a generated column
>> DETAIL:  Generated columns cannot be used in COPY."
>>
>> Maybe it'd be possible to get an error earlier, i.e., while trying to 
>> create such a useless column?
> 
> I'll look into it.

I've been trying to create a test case for file_fdw for this, but I'm
not getting your result.  Can you send a complete test case?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Erik Rijkers
Date:
On 2019-04-02 14:43, Peter Eisentraut wrote:
> On 2019-04-01 10:52, Peter Eisentraut wrote:
>> On 2019-03-31 05:49, Erik Rijkers wrote:
>>> STORED in a
>>> file_fdw foreign table still silently creates the column which then
>>> turns out to be useless on SELECT, with an error like:
>>> 
>>> "ERROR:  column some_column_name is a generated column
>>> DETAIL:  Generated columns cannot be used in COPY."
>>> 
>>> Maybe it'd be possible to get an error earlier, i.e., while trying to
>>> create such a useless column?
>> 
>> I'll look into it.
> 
> I've been trying to create a test case for file_fdw for this, but I'm
> not getting your result.  Can you send a complete test case?


Ah, I had not noticed before: with an asterisk ('select * from table' ) 
one gets no error, just empty values.

An actual error seems to occur when one mentions the 
generated-column-name explicitly in the select-list.

select "id", "Ratio Log2 GEN" from <file_fdw foreign table>;
"
ERROR:  column "Ratio Log2 GEN" is a generated column
DETAIL:  Generated columns cannot be used in COPY.
"

That's from a quick test here at work; maybe that gives you enough info.

If that doesn't make it repeatable (for you) I'll make a more complete 
example this evening (from home).




Re: [HACKERS] generated columns

From
Erik Rijkers
Date:
On 2019-04-02 15:36, Erik Rijkers wrote:
> On 2019-04-02 14:43, Peter Eisentraut wrote:
>> On 2019-04-01 10:52, Peter Eisentraut wrote:
>>> On 2019-03-31 05:49, Erik Rijkers wrote:
>>>> STORED in a
>>>> file_fdw foreign table still silently creates the column which then
>>>> turns out to be useless on SELECT, with an error like:
>>>> 
>>>> "ERROR:  column some_column_name is a generated column
>>>> DETAIL:  Generated columns cannot be used in COPY."
>>>> 
>>>> Maybe it'd be possible to get an error earlier, i.e., while trying 
>>>> to
>>>> create such a useless column?
>>> 
>>> I'll look into it.
>> 
>> I've been trying to create a test case for file_fdw for this, but I'm
>> not getting your result.  Can you send a complete test case?

attached is run_ft.sh  which creates a text file:  /tmp/pg_head.txt
then sets it up as a foreign table, and adds a generated column.

Then selects a succesful select, followed by a error-producing select.

Some selects are succesful but some fail.  I'm not sure why it sometimes 
fails  (it's not just the explicitness of the generated-column-name like 
I suggested earlier).



My output of run_ft.sh is below.


$ ./run_ft.sh
create schema if not exists "tmp";
CREATE SCHEMA
create server if not exists "tmpserver" foreign data wrapper file_fdw;
CREATE SERVER
drop   foreign table if exists tmp.pg_head cascade;
DROP FOREIGN TABLE
create foreign table           tmp.pg_head (
     "Gene"                      text,
     "Ratio H/L normalized Exp1" numeric
)
server tmpserver
options (
     delimiter E'\t'
   , format 'csv'
   , header 'TRUE'
   , filename      '/tmp/pg_head.txt'
);
CREATE FOREIGN TABLE
alter foreign table tmp.pg_head
    add column "Ratio H/L normalized Exp1 Log2 (Generated column)" 
numeric generated always as (case when "Ratio H/L normalized Exp1" > 0 
then log(2, "Ratio H/L normalized Exp1") else  null end) stored
;
ALTER FOREIGN TABLE
-- this is OK (although the generated-column values are all empty/null)
select
      "Gene"
    , "Ratio H/L normalized Exp1"
    , "Ratio H/L normalized Exp1 Log2 (Generated column)"
from tmp.pg_head
limit 3 ;
   Gene  | Ratio H/L normalized Exp1 | Ratio H/L normalized Exp1 Log2 
(Generated column)
--------+---------------------------+---------------------------------------------------
  Dhx9   |                       NaN |
  Gapdh  |                   0.42288 |
  Gm8797 |                   0.81352 |
(3 rows)

-- but this fails
select
     "Gene"
    , "Ratio H/L normalized Exp1 Log2 (Generated column)"
from tmp.pg_head
limit 3 ;
ERROR:  column "Ratio H/L normalized Exp1 Log2 (Generated column)" is a 
generated column
DETAIL:  Generated columns cannot be used in COPY.



Attachment

Re: [HACKERS] generated columns

From
Amit Langote
Date:
On 2019/04/03 3:54, Erik Rijkers wrote:
> create schema if not exists "tmp";
> CREATE SCHEMA
> create server if not exists "tmpserver" foreign data wrapper file_fdw;
> CREATE SERVER
> drop   foreign table if exists tmp.pg_head cascade;
> DROP FOREIGN TABLE
> create foreign table           tmp.pg_head (
>     "Gene"                      text,
>     "Ratio H/L normalized Exp1" numeric
> )
> server tmpserver
> options (
>     delimiter E'\t'
>   , format 'csv'
>   , header 'TRUE'
>   , filename      '/tmp/pg_head.txt'
> );
> CREATE FOREIGN TABLE
> alter foreign table tmp.pg_head
>    add column "Ratio H/L normalized Exp1 Log2 (Generated column)" numeric
> generated always as (case when "Ratio H/L normalized Exp1" > 0 then log(2,
> "Ratio H/L normalized Exp1") else  null end) stored
> ;
> ALTER FOREIGN TABLE
> -- this is OK (although the generated-column values are all empty/null)
> select
>      "Gene"
>    , "Ratio H/L normalized Exp1"
>    , "Ratio H/L normalized Exp1 Log2 (Generated column)"
> from tmp.pg_head
> limit 3 ;
>   Gene  | Ratio H/L normalized Exp1 | Ratio H/L normalized Exp1 Log2
> (Generated column)
> --------+---------------------------+---------------------------------------------------
> 
>  Dhx9   |                       NaN |
>  Gapdh  |                   0.42288 |
>  Gm8797 |                   0.81352 |
> (3 rows)
> 
> -- but this fails
> select
>     "Gene"
>    , "Ratio H/L normalized Exp1 Log2 (Generated column)"
> from tmp.pg_head
> limit 3 ;
> ERROR:  column "Ratio H/L normalized Exp1 Log2 (Generated column)" is a
> generated column
> DETAIL:  Generated columns cannot be used in COPY.

Fwiw, here's a scenario that fails similarly when using copy, which is
what file_fdw uses internally.

$ cat foo.csv
1

create table foo (a int, b int generated always as (a+1) stored);
copy foo (a, b) from '/tmp/foo.csv';
ERROR:  column "b" is a generated column
DETAIL:  Generated columns cannot be used in COPY.

whereas:

copy foo from '/tmp/foo.csv';
COPY 1

select * from foo;
 a │ b
───┼───
 1 │ 2
(1 row)

Erik said upthread that generated columns should really be disallowed for
(at least) file_fdw foreign tables [1], because (?) they don't support
inserts.  Changing copy.c to "fix the above seems out of the question.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/e61c597ac4541b77750594eea73a774c%40xs4all.nl




Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-04-02 20:54, Erik Rijkers wrote:
> attached is run_ft.sh  which creates a text file:  /tmp/pg_head.txt
> then sets it up as a foreign table, and adds a generated column.
> 
> Then selects a succesful select, followed by a error-producing select.

I have committed a fix for this.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Amit Langote
Date:
On 2019/04/04 16:52, Peter Eisentraut wrote:
> On 2019-04-02 20:54, Erik Rijkers wrote:
>> attached is run_ft.sh  which creates a text file:  /tmp/pg_head.txt
>> then sets it up as a foreign table, and adds a generated column.
>>
>> Then selects a succesful select, followed by a error-producing select.
> 
> I have committed a fix for this.

+-- generated column tests
+CREATE FOREIGN TABLE gft1 (a int, b text, c text GENERATED ALWAYS AS
('foo') STORED) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter
',');
+SELECT a, c FROM gft1;
+ a |   c
+---+--------
+ 1 | _null_
+ 1 | _null_

Hmm, I'm afraid we might get bug reports if we go with this.  Why is it OK
to get null in this case when a user explicitly asked for 'foo'?

Thanks,
Amit




Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-04-04 11:42, Amit Langote wrote:
> Hmm, I'm afraid we might get bug reports if we go with this.  Why is it OK
> to get null in this case when a user explicitly asked for 'foo'?

The way stored generated columns work on foreign tables is that the
to-be-stored value is computed, then given to the foreign table handler,
which then has to store it, and then return it later when queried.
However, since the backing store of a foreign table is typically
modifiable directly by the user via other channels, it's possible to
create situations where actually stored data does not satisfy the
generation expression.  That is the case here.  I don't know of a
principled way to prevent that.  It's just one of the problems that can
happen when you store data in a foreign table:  You have very little
control over what data actually ends up being stored.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

From
Amit Langote
Date:
On Thu, Apr 4, 2019 at 8:01 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2019-04-04 11:42, Amit Langote wrote:
> > Hmm, I'm afraid we might get bug reports if we go with this.  Why is it OK
> > to get null in this case when a user explicitly asked for 'foo'?
>
> The way stored generated columns work on foreign tables is that the
> to-be-stored value is computed, then given to the foreign table handler,
> which then has to store it, and then return it later when queried.
> However, since the backing store of a foreign table is typically
> modifiable directly by the user via other channels, it's possible to
> create situations where actually stored data does not satisfy the
> generation expression.  That is the case here.  I don't know of a
> principled way to prevent that.  It's just one of the problems that can
> happen when you store data in a foreign table:  You have very little
> control over what data actually ends up being stored.

OK,  thanks for explaining.  We do allow DEFAULT to be specified on
foreign tables, although locally-defined defaults have little meaning
if the FDW doesn't allow inserts.  Maybe same thing applies to
GENERATED AS columns.

Would it make sense to clarify this on CREATE FOREIGN TABLE page?

Thanks,
Amit



Re: [HACKERS] generated columns

From
Peter Eisentraut
Date:
On 2019-04-04 16:37, Amit Langote wrote:
> OK,  thanks for explaining.  We do allow DEFAULT to be specified on
> foreign tables, although locally-defined defaults have little meaning
> if the FDW doesn't allow inserts.  Maybe same thing applies to
> GENERATED AS columns.
> 
> Would it make sense to clarify this on CREATE FOREIGN TABLE page?

done

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Sender Address Forgery]Re: [HACKERS] generated columns

From
Amit Langote
Date:
On 2019/04/08 20:50, Peter Eisentraut wrote:
> On 2019-04-04 16:37, Amit Langote wrote:
>> OK,  thanks for explaining.  We do allow DEFAULT to be specified on
>> foreign tables, although locally-defined defaults have little meaning
>> if the FDW doesn't allow inserts.  Maybe same thing applies to
>> GENERATED AS columns.
>>
>> Would it make sense to clarify this on CREATE FOREIGN TABLE page?
> 
> done

Thanks.

Regards,
Amit