Thread: [HACKERS] generated columns
Here is another attempt to implement generated columns. This is a well-known SQL-standard feature, also available for instance in DB2, MySQL, Oracle. A quick example: CREATE TABLE t1 ( ..., height_cm numeric, height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) ); (This is not related to the recent identity columns feature, other than the similar syntax and some overlap internally.) In previous discussions, it has often been a source of confusion whether these generated columns are supposed to be computed on insert/update and stored, or computed when read. The SQL standard is not explicit, but appears to lean toward stored. DB2 stores. Oracle computes on read. MySQL supports both. So I target implementing both. This makes sense: Both regular views and materialized views have their uses, too. For the syntax, I use the MySQL/Oracle syntax of appending [VIRTUAL|STORED]. In this patch, only VIRTUAL is fully implemented. I also have STORED kind of working, but it wasn't fully baked, so I haven't included it here. Known bugs: - pg_dump produces a warning about a dependency loop when dumping these. Will need to be fixed at some point, but it doesn't prevent anything from working right now. Open design issues: - COPY behavior: Currently, generated columns are automatically omitted if there is no column list, and prohibited if specified explicitly. When stored generated columns are implemented, they could be copied out. Some user options might be possible here. - Catalog storage: I store the generation expression in pg_attrdef, like a default. For the most part, this works well. It is not clear, however, what pg_attribute.atthasdef should say. Half the code thinks that atthasdef means "there is something in pg_attrdef", the other half thinks "column has a DEFAULT expression". Currently, I'm going with the former interpretation, because that is wired in quite deeply and things start to crash if you violate it, but then code that wants to know whether a column has a traditional DEFAULT expression needs to check atthasdef && !attgenerated or something like that. Missing/future functionality: - STORED variant - various ALTER TABLE variants - index support (and related constraint support) These can be added later once the basics are nailed down. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 31 August 2017 at 05:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Here is another attempt to implement generated columns. This is a > well-known SQL-standard feature, also available for instance in DB2, > MySQL, Oracle. A quick example: > > CREATE TABLE t1 ( > ..., > height_cm numeric, > height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) > ); I only recently discovered we actually already have this feature. Kind of. stark=# CREATE TABLE t1 (height_cm numeric); CREATE TABLE Time: 38.066 ms stark***=# create function height_in(t t1) returns numeric language 'sql' as 'select t.height_cm * 2.54' ; CREATE FUNCTION Time: 1.216 ms stark***=# insert into t1 values (2); INSERT 0 1 Time: 10.170 ms stark***=# select t1.height_cm, t1.height_in from t1; ┌───────────┬───────────┐ │ height_cm │ height_in │ ├───────────┼───────────┤ │ 2 │ 5.08 │ └───────────┴───────────┘ (1 row) Time: 1.997 ms Yours looks better :) -- greg
On 30 August 2017 at 23:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Here is another attempt to implement generated columns. This is a > well-known SQL-standard feature, also available for instance in DB2, > MySQL, Oracle. > [...] > > In previous discussions, it has often been a source of confusion whether > these generated columns are supposed to be computed on insert/update and > stored, or computed when read. The SQL standard is not explicit, but > appears to lean toward stored. DB2 stores. Oracle computes on read. > MySQL supports both. So I target implementing both. This makes sense: > Both regular views and materialized views have their uses, too. For the > syntax, I use the MySQL/Oracle syntax of appending [VIRTUAL|STORED]. In > this patch, only VIRTUAL is fully implemented. I also have STORED kind > of working, but it wasn't fully baked, so I haven't included it here. > Hi, It applies and compiles without problems, it passes regression tests and it does what it claims to do: During my own tests, though, i found some problems: -- UPDATEing the column, this is at least weird postgres=# update t1 set height_in = 15; ERROR: column "height_in" can only be updated to DEFAULT DETAIL: Column "height_in" is a generated column. postgres=# update t1 set height_in = default; UPDATE 1 -- In a view it doesn't show any value postgres=# create view v1 as select * from t1; CREATE VIEW postgres=# insert into t1(height_cm) values (10); INSERT 0 1 postgres=# select * from t1; id | height_cm | height_in --------+-----------+-----------198000 | 10 | 25.40 (1 row) postgres=# select * from v1; id | height_cm | height_in --------+-----------+-----------198000 | 10 | (1 row) -- In a inherits/partition tree, the default gets malformed postgres=# create table t1_1 () inherits (t1); CREATE TABLE postgres=# \d t1_1 Table "public.t1_1" Column | Type | Collation | Nullable | Default -----------+---------+-----------+----------+--------------------------------id | integer | | not null |nextval('t1_id_seq'::regclass)height_cm | numeric | | |height_in | numeric | | | height_cm* 2.54 Inherits: t1 postgres=# insert into t1_1 values (11); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> \q -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 10 September 2017 at 00:08, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote: > > During my own tests, though, i found some problems: > a few more tests: create table t1 (id serial,height_cm int,height_in int generated always as (height_cm * 10) ) ; """ postgres=# alter table t1 alter height_cm type numeric; ERROR: unexpected object depending on column: table t1 column height_in """ should i drop the column and recreate it after the fact? this seems more annoying than the same problem with views (drop view & recreate), specially after you implement STORED """ postgres=# alter table t1 alter height_in type numeric; ERROR: found unexpected dependency type 'a' """ uh!? also is interesting that in triggers, both before and after, the column has a null. that seems reasonable in a before trigger but not in an after trigger """ create function f_trg1() returns trigger as $$ begin raise notice '%', new.height_in; return new; end $$ language plpgsql; create trigger trg1 before insert on t1 for each row execute procedure f_trg1(); postgres=# insert into t1 values(default, 100); NOTICE: <NULL> INSERT 0 1 create trigger trg2 after insert on t1 for each row execute procedure f_trg1(); postgres=# insert into t1 values(default, 100); NOTICE: <NULL> NOTICE: <NULL> INSERT 0 1 """ the default value shouldn't be dropped. """ postgres=# alter table t1 alter height_in drop default; ALTER TABLE postgres=# \d t1 Table "public.t1" Column | Type | Collation | Nullable | Default ----------------+---------+-----------+----------+--------------------------------id | integer | |not null | nextval('t1_id_seq'::regclass)height_cm | integer | | |height_in | integer | | |generated always as () """ -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Why is a NULL reasonable for before triggers?On Sep 12, 2017, at 12:35 PM, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote:
also is interesting that in triggers, both before and after, the
column has a null. that seems reasonable in a before trigger but not
in an after trigger
If I create a table with a column with default and I omit that column on INSERT
Is the column value also NULL in the before trigger? (I hope not)
BTW, the original idea behind generated columns was to materialize them.
Reason being to avoid expensive computations of frequently used expressions
(and to support indexing in the absence of indexes with expressions)
You may find the following amusing:
Cheers
Serge Rielau
On 31 August 2017 at 05:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Here is another attempt to implement generated columns. This is a > well-known SQL-standard feature, also available for instance in DB2, > MySQL, Oracle. A quick example: > > CREATE TABLE t1 ( > ..., > height_cm numeric, > height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) > ); Cool > - pg_dump produces a warning about a dependency loop when dumping these. > Will need to be fixed at some point, but it doesn't prevent anything > from working right now. > > Open design issues: > > - COPY behavior: Currently, generated columns are automatically omitted > if there is no column list, and prohibited if specified explicitly. > When stored generated columns are implemented, they could be copied out. > Some user options might be possible here. If the values are generated immutably there would be no value in including them in a dump. If you did dump them then they couldn't be reloaded without error, so again, no point in dumping them. COPY (SELECT...) already allows you options to include or exclude any columns you wish, so I don't see the need for special handling here. IMHO, COPY TO would exclude generated columns of either kind, ensuring that the reload would just work. > - Catalog storage: I store the generation expression in pg_attrdef, like > a default. For the most part, this works well. It is not clear, > however, what pg_attribute.atthasdef should say. Half the code thinks > that atthasdef means "there is something in pg_attrdef", the other half > thinks "column has a DEFAULT expression". Currently, I'm going with the > former interpretation, because that is wired in quite deeply and things > start to crash if you violate it, but then code that wants to know > whether a column has a traditional DEFAULT expression needs to check > atthasdef && !attgenerated or something like that. > > Missing/future functionality: > > - STORED variant For me, this option would be the main feature. Presumably if STORED then we wouldn't need the functions to be immutable, making it easier to have columns like last_update_timestamp or last_update_username etc.. I think an option to decide whether the default is STORED or VIRTUAL would be useful. > - various ALTER TABLE variants Adding a column with GENERATED STORED would always be a full table rewrite. Hmm, I wonder if its worth having a mixed mode: stored for new rows, only virtual for existing rows; that way we could add GENERATED columns easily. > - index support (and related constraint support) Presumably you can't index a VIRTUAL column. Or at least I don't think its worth spending time trying to make it work. > These can be added later once the basics are nailed down. I imagine that if a column is generated then it is not possible to have column level INSERT | UPDATE | DELETE privs on it. The generation happens automatically as part of the write action if stored, or not until select for virtual. It should be possible to have column level SELECT privs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 09/13/2017 04:04 AM, Simon Riggs wrote: > On 31 August 2017 at 05:16, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> - index support (and related constraint support) > > Presumably you can't index a VIRTUAL column. Or at least I don't think > its worth spending time trying to make it work. I think end users would be surprised if one can index STORED columns and expressions but not VIRTUAL columns. So unless it is a huge project I would say it is worth it. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 13 September 2017 at 09:09, Andreas Karlsson <andreas@proxel.se> wrote: > On 09/13/2017 04:04 AM, Simon Riggs wrote: >> >> On 31 August 2017 at 05:16, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> >>> - index support (and related constraint support) >> >> >> Presumably you can't index a VIRTUAL column. Or at least I don't think >> its worth spending time trying to make it work. > > > I think end users would be surprised if one can index STORED columns and > expressions but not VIRTUAL columns. So unless it is a huge project I would > say it is worth it. It must be stored in the index certainly. I guess virtual is similar to expression indexes then. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 12, 2017 at 10:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I think an option to decide whether the default is STORED or VIRTUAL > would be useful. That seems like it could be a bit of a foot-gun. For example, an extension author who uses generated columns will have to be careful to always specify one or the other, because they don't know what the default will be on the system where it's deployed. Similarly for an author of a portable application. I think it'll create fewer headaches if we just pick a default and stick with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 13, 2017 at 10:09:37AM +0200, Andreas Karlsson wrote: > On 09/13/2017 04:04 AM, Simon Riggs wrote: > >On 31 August 2017 at 05:16, Peter Eisentraut > ><peter.eisentraut@2ndquadrant.com> wrote: > >>- index support (and related constraint support) > > > >Presumably you can't index a VIRTUAL column. Or at least I don't > >think its worth spending time trying to make it work. > > I think end users would be surprised if one can index STORED columns > and expressions but not VIRTUAL columns. So unless it is a huge > project I would say it is worth it. So long as the expression on the normal columns was immutable, it's fit for an expressional index, as is any immutable function composed with it. What am I missing? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
> On 12 Sep 2017, at 21:35, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote: > > On 10 September 2017 at 00:08, Jaime Casanova > <jaime.casanova@2ndquadrant.com> wrote: >> >> During my own tests, though, i found some problems: > > a few more tests: > > create table t1 ( > id serial, > height_cm int, > height_in int generated always as (height_cm * 10) > ) ; > > > """ > postgres=# alter table t1 alter height_cm type numeric; > ERROR: unexpected object depending on column: table t1 column height_in > """ > should i drop the column and recreate it after the fact? this seems > more annoying than the same problem with views (drop view & recreate), > specially after you implement STORED > > > """ > postgres=# alter table t1 alter height_in type numeric; > ERROR: found unexpected dependency type 'a' > """ > uh!? > > > also is interesting that in triggers, both before and after, the > column has a null. that seems reasonable in a before trigger but not > in an after trigger > """ > create function f_trg1() returns trigger as $$ > begin > raise notice '%', new.height_in; > return new; > end > $$ language plpgsql; > > create trigger trg1 before insert on t1 > for each row execute procedure f_trg1(); > > postgres=# insert into t1 values(default, 100); > NOTICE: <NULL> > INSERT 0 1 > > create trigger trg2 after insert on t1 > for each row execute procedure f_trg1(); > > postgres=# insert into t1 values(default, 100); > NOTICE: <NULL> > NOTICE: <NULL> > INSERT 0 1 > """ > > the default value shouldn't be dropped. > """ > postgres=# alter table t1 alter height_in drop default; > ALTER TABLE > postgres=# \d t1 > Table "public.t1" > Column | Type | Collation | Nullable | Default > ----------------+---------+-----------+----------+-------------------------------- > id | integer | | not null | > nextval('t1_id_seq'::regclass) > height_cm | integer | | | > height_in | integer | | | generated always as () > “"" Based on this review, and the errors noted in upthread in the previous review, I’m marking this Returned with feedback. When an updated version of the patch is ready, please re-submit it to an upcoming commitfest. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 31, 2017 at 12:16:43AM -0400, Peter Eisentraut wrote: > In previous discussions, it has often been a source of confusion whether > these generated columns are supposed to be computed on insert/update and > stored, or computed when read. The SQL standard is not explicit, but > appears to lean toward stored. DB2 stores. Oracle computes on read. Question: How would one know the difference between storing computed columns vs. computing them on read? Answer?: Performance. If the computation is slow, then you'll really notice on read. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I know that for my use-cases, having both options available would be very appreciated. The vast majority of the computed columns I would use in my database would be okay to compute on read. But there are for sure some which would be performance prohibitive to have compute on read, so i'd rather have those stored.
So for me, i'd rather default to compute on read, as long storing the pre-computed value is an option when necessary.
Just my $0.02
Thanks,
-Adam
On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > I know that for my use-cases, having both options available would be very > appreciated. The vast majority of the computed columns I would use in my > database would be okay to compute on read. But there are for sure some > which would be performance prohibitive to have compute on read, so i'd > rather have those stored. > > So for me, i'd rather default to compute on read, as long storing the > pre-computed value is an option when necessary. Sure, I agree. I was just wondering whether there might be any other difference besides performance characteristics. The answer to that is, I think, "no". -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Nico Williams <nico@cryptonector.com> writes: > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: >> So for me, i'd rather default to compute on read, as long storing the >> pre-computed value is an option when necessary. > Sure, I agree. I was just wondering whether there might be any other > difference besides performance characteristics. The answer to that is, > I think, "no". What about non-immutable functions in the generation expression? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote: > Nico Williams <nico@cryptonector.com> writes: > > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > >> So for me, i'd rather default to compute on read, as long storing the > >> pre-computed value is an option when necessary. > > > Sure, I agree. I was just wondering whether there might be any other > > difference besides performance characteristics. The answer to that is, > > I think, "no". > > What about non-immutable functions in the generation expression? Assuming they're permitted, which...well, I could make a case, they should be mutually exclusive with the cached option. I guess documenting the behavior in the manual would suffice, tempting as it would be to include a NOTICE when the table goes from having 0 or more generated columns all of which are immutable to having at least one that's not. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote: > Nico Williams <nico@cryptonector.com> writes: > > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > >> So for me, i'd rather default to compute on read, as long storing the > >> pre-computed value is an option when necessary. > > > Sure, I agree. I was just wondering whether there might be any other > > difference besides performance characteristics. The answer to that is, > > I think, "no". > > What about non-immutable functions in the generation expression? Aha, thanks! Yes, that would be noticeable. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
So yes, distinguishing stored vs. not stored computed columns is useful, especially if the expression can refer to other columns of the same row, though not only then. Examples: -- useful only if stored (assuming these never get updated) inserted_at TIMESTAMP WITHOUT TIME ZONE AS (clock_timestamp()) -- useful only if stored uuid uuid AS (uuid_generate_v4()) -- useful only if stored who_done_it TEXT (current_user) -- useful especially if not stored user_at_host TEXT (user || '@' || host) -- useful if stored original_user_at_host TEXT (user || '@' || host) I assume once set, a stored computed column cannot be updated, though maybe being able to allow this would be ok. Obviously all of this can be done with triggers and VIEWs... The most useful case is where a computed column is NOT stored, because it saves you having to have a table and a view, while support for the stored case merely saves you having to have triggers. Of course, triggers for computing columns are rather verbose, so not having to write those would be convenient. Similarly with RLS. RLS is not strictly necessary since VIEWs and TRIGGERs allow one to accomplish much the same results, but it's a lot of work to get that right, while RLS makes most policies very pithy. (RLS for *update* policies, however, still can't refer to NEW and OLD, so one still has to resort to triggers for updates in many cases). Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
There are some unanswered questions with column grants too. Do we allow granting access to a calculated column which accesses columns the user doesn't have access to? If so then this is a suitable substitute for using updateable views to handle things like granting users access to things like password hashes or personal data with details censored without giving them access to the unhashed password or full personal info. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 9/12/17 15:35, Jaime Casanova wrote: > On 10 September 2017 at 00:08, Jaime Casanova > <jaime.casanova@2ndquadrant.com> wrote: >> >> During my own tests, though, i found some problems: Here is an updated patch that should address the problems you have found. > also is interesting that in triggers, both before and after, the > column has a null. that seems reasonable in a before trigger but not > in an after trigger Logically, you are correct. But it seems excessive to compute all virtual columns for every trigger. I don't know how to consolidate that, especially with the current trigger API that lets you look more or less directly into the tuple. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 12/27/2017 09:31 AM, Peter Eisentraut wrote: > On 9/12/17 15:35, Jaime Casanova wrote: >> On 10 September 2017 at 00:08, Jaime Casanova >> <jaime.casanova@2ndquadrant.com> wrote: >>> >>> During my own tests, though, i found some problems: > > Here is an updated patch that should address the problems you have found. In the commit message it says: "The plan to is implement two kinds of generated columns: virtual (computed on read) and stored (computed on write). This patch only implements the virtual kind, leaving stubs to implement the stored kind later." and in the patch itself: +<para> + The generation expression can refer to other columns in the table, but + not other generated columns. Any functions and operators used must be + immutable. References to other tables are not allowed. +</para> Question -- when the "stored" kind of generated column is implemented, will the immutable restriction be relaxed? I would like, for example, be able to have a stored generated column that executes now() whenever the row is written/rewritten. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Attachment
On 12/30/17 16:04, Joe Conway wrote: > +<para> > + The generation expression can refer to other columns in the table, but > + not other generated columns. Any functions and operators used must be > + immutable. References to other tables are not allowed. > +</para> > > Question -- when the "stored" kind of generated column is implemented, > will the immutable restriction be relaxed? I would like, for example, be > able to have a stored generated column that executes now() whenever the > row is written/rewritten. That restriction is from the SQL standard, and I think it will stay. The virtual vs. stored choice is an optimization, but not meant to affect semantics. For example, you might want to automatically substitute a precomputed generated column into an expression, but that will become complicated and confusing if the expression is not deterministic. Another problem with your example is that a stored generated column would only be updated if a column it depends on is updated. So a column whose generation expression is just now() would never get updated. Maybe some of this could be relaxed at some point, but we would have to think it through carefully. For now, a trigger would still be the best implementation for your use case, I think. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/31/2017 09:38 AM, Peter Eisentraut wrote: > On 12/30/17 16:04, Joe Conway wrote: >> +<para> >> + The generation expression can refer to other columns in the table, but >> + not other generated columns. Any functions and operators used must be >> + immutable. References to other tables are not allowed. >> +</para> >> >> Question -- when the "stored" kind of generated column is implemented, >> will the immutable restriction be relaxed? I would like, for example, be >> able to have a stored generated column that executes now() whenever the >> row is written/rewritten. <snip> > Maybe some of this could be relaxed at some point, but we would have to > think it through carefully. For now, a trigger would still be the best > implementation for your use case, I think. Sure, but generated column behavior in general can be implemented via trigger. Anyway, I have seen requests for change data capture (https://en.wikipedia.org/wiki/Change_data_capture) in Postgres which is apparently available in our competition without requiring the use of triggers. Perhaps that is yet a different feature, but I was hopeful that this mechanism could be used to achieve it. -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Attachment
On 12/31/17 12:54, Joe Conway wrote: > Anyway, I have seen requests for change data capture > (https://en.wikipedia.org/wiki/Change_data_capture) in Postgres which is > apparently available in our competition without requiring the use of > triggers. Perhaps that is yet a different feature, but I was hopeful > that this mechanism could be used to achieve it. I think logical decoding provides CDC. The generated columns feature doesn't have much to do with that, in my mind. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Dec 27, 2017 at 12:31:05PM -0500, Peter Eisentraut wrote: > On 9/12/17 15:35, Jaime Casanova wrote: >> On 10 September 2017 at 00:08, Jaime Casanova >> <jaime.casanova@2ndquadrant.com> wrote: >>> >>> During my own tests, though, i found some problems: > > Here is an updated patch that should address the problems you have > found. Could you rebase the patch? I am planning to review it but there are conflicts in genbki.pl. -- Michael
Attachment
On 1/16/18 01:35, Michael Paquier wrote: > On Wed, Dec 27, 2017 at 12:31:05PM -0500, Peter Eisentraut wrote: >> On 9/12/17 15:35, Jaime Casanova wrote: >>> On 10 September 2017 at 00:08, Jaime Casanova >>> <jaime.casanova@2ndquadrant.com> wrote: >>>> >>>> During my own tests, though, i found some problems: >> >> Here is an updated patch that should address the problems you have >> found. > > Could you rebase the patch? I am planning to review it but there are > conflicts in genbki.pl. Here you go. Those changes actually meant that genbki.pl doesn't need to be touched by this patch at all, so that's a small win. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Jan 16, 2018 at 09:55:16AM -0500, Peter Eisentraut wrote: > Here you go. Those changes actually meant that genbki.pl doesn't need > to be touched by this patch at all, so that's a small win. Thanks for the updated version. I have spent some time looking at what you are proposing here. Instead of leaving bits for a feature that may or may not be implemented, have you considered just blocking STORED at parsing level and remove those bits? There is little point in keeping the equivalent of dead code in the tree. So I would suggest a patch simplification: - Drop the VIRTUAL/STORED parsing from the grammar for now. - Define VIRTUAL as the default for the future. This way, if support for STORED is added in the future the grammar can just be extended. This is actually implied in pg_dump.c. And +1 for the catalog format you are proposing. =# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL); CREATE TABLE =# insert into gen_1 values (2000000000); INSERT 0 1 =# select * from gen_1 ; ERROR: 22003: integer out of range Overflow checks do not happen when inserting, which makes the following SELECT to fail. This could be confusing for the user because the row has been inserted. Perhaps some evaluation of virtual tuples at insert phase should happen. The existing behavior makes sense as well as virtual values are only evaluated when read, so a test would be at least welcome. Does the SQL spec mention the matter? How do other systems handle such cases? CHECK constraints can be combined, still.. The last patch crashes for typed tables: =# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint); CREATE TYPE =# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS AS (f2 *2)); [... boom ...] And for partitions: =# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint) PARTITION BY RANGE (f1); CREATE TABLE =# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01'); [... boom ...] Like what we did in 005ac298, I would suggest to throw ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests. +SELECT a, c FROM gtest12; -- FIXME: should be allowed +ERROR: permission denied for function gf1 [...] +ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT; -- FIXME +ERROR: column "b" of relation "gtest27" is a generated column Any fixes for those? + if (get_attgenerated(relationId, attno)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("index creation on generated columns is not supported"))); Shouldn't such messages mention explicitely "virtually"-generated columns? For stored columns the support would not be complicated in this case. =# create table ac (a int, b text generated always as (substr(a::text, 0, 3))); CREATE TABLE =# alter table ac alter COLUMN a type text; ERROR: 42601: cannot alter type of a column used by a generated column DETAIL: Column "a" is used by generated column "b". In this case ALTER TABLE could have worked. No complain from me to disable that as a first step though. +UPDATE gtest1 SET b = DEFAULT WHERE a = 1; +UPDATE gtest1 SET b = 11 WHERE a = 1; -- error +ERROR: column "b" can only be updated to DEFAULT +DETAIL: Column "b" is a generated column. I see... This is consistent with the behavior of INSERT where DEFAULT can be used and the INSERT succeeds. +-- domains +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10); +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED ALWAYS AS (a * 2)); -- prohibited +ERROR: virtual generated column "b" cannot have a domain type CHECK constraints can be used, so I find this restriction confusing. No test coverage for DELETE triggers? +UPDATE gtest26 SET a = a * -2; +INFO: gtest1: old = (-2,) +INFO: gtest1: new = (4,) +INFO: gtest3: old = (-2,) +INFO: gtest3: new = (4,) +INFO: gtest4: old = (3,) +INFO: gtest4: new = (-6,) OLD and NEW values for generated columns don't show up. Am I missing something or they should be available? @@ -15526,13 +15553,6 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) - if (has_default) - appendPQExpBuffer(q, " DEFAULT %s", - tbinfo->attrdefs[j]->adef_expr); - - if (has_notnull) - appendPQExpBufferStr(q, " NOT NULL"); Why does the ordering of actions need to be changed here? [nit_mode] + if (attgenerated) + /* + * Generated column: Dropping anything that the generation expression + * refers to automatically drops the generated column. + */ + recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel), + DEPENDENCY_AUTO, + DEPENDENCY_AUTO, false); Please use brackers here if you use a comment in the if() block... [/nit_mode] COPY as you are proposing looks sensible to me. I am not sure about any options though as it is possible to enforce the selection of generated columns using COPY (SELECT). Per my tests, generated columns can work with column system attributes (xmax, xmin, etc.). Some tests could be added. + /* + * Generated columns don't use the attnotnull field but use a full CHECK + * constraint instead. We could implement here that it finds that CHECK + * constraint and drops it, which is kind of what the SQL standard would + * require anyway, but that would be quite a bit more work. + */ + if (((Form_pg_attribute) GETSTRUCT(tuple))->attgenerated) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot use DROP NOT NULL on generated column \"%s\"", + colName))); And the point of storing NOT NULL constraints as CHECK constraints shows up again... :( It would be nice to mention as well at the top of ATExecSetNotNull() that a full-fledge switch could help generated columns as well. - if (tab->relkind == RELKIND_RELATION || - tab->relkind == RELKIND_PARTITIONED_TABLE) + if ((tab->relkind == RELKIND_RELATION || + tab->relkind == RELKIND_PARTITIONED_TABLE) && + get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE I think that you should store the result of get_attgenerated() and reuse it multiple times. -- Michael
Attachment
On 1/19/18 00:18, Michael Paquier wrote: > Instead of leaving bits for a feature that may or may not be > implemented, have you considered just blocking STORED at parsing level > and remove those bits? There is little point in keeping the equivalent > of dead code in the tree. So I would suggest a patch simplification: > - Drop the VIRTUAL/STORED parsing from the grammar for now. > - Define VIRTUAL as the default for the future. fixed > =# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2) > VIRTUAL); > CREATE TABLE > =# insert into gen_1 values (2000000000); > INSERT 0 1 > =# select * from gen_1 ; > ERROR: 22003: integer out of range > Overflow checks do not happen when inserting, which makes the following > SELECT to fail. This could be confusing for the user because the row has > been inserted. Perhaps some evaluation of virtual tuples at insert phase > should happen. The existing behavior makes sense as well as virtual > values are only evaluated when read, so a test would be at least > welcome. added test > Does the SQL spec mention the matter? How do other systems > handle such cases? In Oracle you get the same overflow error. > CHECK constraints can be combined, still.. I think you can compare this to a view. A view can produce overflow errors on read. But a CHECK constraint is more like a CHECK option on a view that checks as values are put into the view. > The last patch crashes for typed tables: > =# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint); > CREATE TYPE > =# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS > AS (f2 *2)); > [... boom ...] > And for partitions: > =# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint) > PARTITION BY RANGE (f1); > CREATE TABLE > =# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS > GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO > ('2016-08-01'); > [... boom ...] > Like what we did in 005ac298, I would suggest to throw > ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests. done > +SELECT a, c FROM gtest12; -- FIXME: should be allowed > +ERROR: permission denied for function gf1 This is quite hard to fix and I would like to leave this for a future release. > +ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT; -- FIXME > +ERROR: column "b" of relation "gtest27" is a generated column That FIXME seems to have been a mistake. I have removed it. > + if (get_attgenerated(relationId, attno)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("index creation on generated columns is not supported"))); > Shouldn't such messages mention explicitely "virtually"-generated > columns? For stored columns the support would not be complicated in this > case. done > +-- domains > +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10); > +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED > ALWAYS AS (a * 2)); -- prohibited > +ERROR: virtual generated column "b" cannot have a domain type > CHECK constraints can be used, so I find this restriction confusing. We currently don't have infrastructure to support this. We run all CHECK constraints whenever a row is changed, so that is easy. But we don't have facilities to recheck the domain constraint in column b when column a is updated. This could be done, but it's extra work. > No test coverage for DELETE triggers? done > +UPDATE gtest26 SET a = a * -2; > +INFO: gtest1: old = (-2,) > +INFO: gtest1: new = (4,) > +INFO: gtest3: old = (-2,) > +INFO: gtest3: new = (4,) > +INFO: gtest4: old = (3,) > +INFO: gtest4: new = (-6,) > OLD and NEW values for generated columns don't show up. Am I missing > something or they should be available? This was already discussed a few times in the thread. I don't know what a good solution is. I have in this patch added facilties to handle this better in other PLs. So the virtual generated column doesn't show up there in the trigger data. This is possible because we copy the trigger data from the internal structures into language-specific hashes/dictionaries/etc. In PL/pgSQL, this is a bit more difficult, because we handle the table's row type there. We can't just "hide" the generated column when looking at the row type. Actually, we could do it quite easily, but that would probably raise other weirdnesses. This also raises a question how a row type with generated columns should behave. I think a generation expression is a property of a table, so it does not apply in a row type. (Just like a default is a property of a table and does not apply in row types.) > Please use brackers here if you use a comment in the if() block... > [/nit_mode] done > COPY as you are proposing looks sensible to me. I am not sure about any > options though as it is possible to enforce the selection of generated > columns using COPY (SELECT). removed that comment > Per my tests, generated columns can work with column system attributes > (xmax, xmin, etc.). Some tests could be added. Hard to test that, because the results would be nondeterministic. > - if (tab->relkind == RELKIND_RELATION || > - tab->relkind == RELKIND_PARTITIONED_TABLE) > + if ((tab->relkind == RELKIND_RELATION || > + tab->relkind == RELKIND_PARTITIONED_TABLE) && > + get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE > I think that you should store the result of get_attgenerated() and reuse > it multiple times. I don't see where that would apply. I think the hunks you are seeing belong to different functions. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> Does the SQL spec mention the matter? How do other systems >> handle such cases? > > In Oracle you get the same overflow error. That seems awful. If a user says "SELECT * FROM tab" and it fails, how are they supposed to recover, or even understand what the problem is? I think we should really try to at least generate an errcontext here: ERROR: integer out of range CONTEXT: while generating virtual column "b" And maybe a hint, too, like "try excluding this column". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/26/18 12:46, Robert Haas wrote: > On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >>> Does the SQL spec mention the matter? How do other systems >>> handle such cases? >> >> In Oracle you get the same overflow error. > > That seems awful. If a user says "SELECT * FROM tab" and it fails, > how are they supposed to recover, or even understand what the problem > is? I think we should really try to at least generate an errcontext > here: > > ERROR: integer out of range > CONTEXT: while generating virtual column "b" > > And maybe a hint, too, like "try excluding this column". This is expanded in the rewriter, so there is no context like that. This is exactly how views work, e.g., create table t1 (id int, length int); create view v1 as select id, length * 1000000000 as length_in_nanometers from t1; insert into t1 values (1, 5); select * from v1; ERROR: integer out of range I think this is not a problem in practice. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 27, 2018 at 05:05:14PM -0500, Peter Eisentraut wrote: > This is expanded in the rewriter, so there is no context like that. > This is exactly how views work, e.g., > > create table t1 (id int, length int); > create view v1 as select id, length * 1000000000 as length_in_nanometers > from t1; > insert into t1 values (1, 5); > select * from v1; > ERROR: integer out of range > > I think this is not a problem in practice. Yeah, I tend to have the same opinion while doing a second pass on the patch proposed on this thread. You could more context when using STORED columns, but for VIRTUAL that does not make such sense as the handling of values is close to views. That's the same handling for materialized views as well, you don't get any context when facing an overflow when either creating the matview or refreshing it. -- Michael
Attachment
On Thu, Jan 25, 2018 at 10:26:55PM -0500, Peter Eisentraut wrote: > On 1/19/18 00:18, Michael Paquier wrote: >> +SELECT a, c FROM gtest12; -- FIXME: should be allowed >> +ERROR: permission denied for function gf1 > > This is quite hard to fix and I would like to leave this for a future > release. I have been looking at that case more closely again, and on the contrary I would advocate that your patch is doing the *right* thing. In short, if the generation expression uses a function and the user has only been granted access to read the values, it seems to me that it we should require that this user also has the right to execute the function. Would that be too user-unfriendly? I think that this could avoid mistakes about giving access to unwanted functions when willing to just give a SELECT right as the function could be doing more operations. >> +-- domains >> +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10); >> +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED >> ALWAYS AS (a * 2)); -- prohibited >> +ERROR: virtual generated column "b" cannot have a domain type >> CHECK constraints can be used, so I find this restriction confusing. > > We currently don't have infrastructure to support this. We run all > CHECK constraints whenever a row is changed, so that is easy. But we > don't have facilities to recheck the domain constraint in column b when > column a is updated. This could be done, but it's extra work. Okay, let's leave with this restriction for now. >> OLD and NEW values for generated columns don't show up. Am I missing >> something or they should be available? > > This was already discussed a few times in the thread. I don't know what > a good solution is. > > I have in this patch added facilties to handle this better in other PLs. > So the virtual generated column doesn't show up there in the trigger > data. This is possible because we copy the trigger data from the > internal structures into language-specific hashes/dictionaries/etc. > > In PL/pgSQL, this is a bit more difficult, because we handle the table's > row type there. We can't just "hide" the generated column when looking > at the row type. Actually, we could do it quite easily, but that would > probably raise other weirdnesses. > > This also raises a question how a row type with generated columns should > behave. I think a generation expression is a property of a table, so it > does not apply in a row type. (Just like a default is a property of a > table and does not apply in row types.) Hm. Identity columns and default columns are part of rowtypes. STORED columns can alsobe part of a row type with little effort, so in order to be consistent with the all the existing behaviors, it seems to me that virtually-generated columns should be part of it as well. I have compiled in the attached SQL file how things behave with your patch. Only virtually-generated columns show a blank value. A empty value is unhelpful for the user, which brings a couple of possible approaches: 1) Make virtual columns part of a row type, which would make it consistent with all the other types. 2) For plpgsql, if all rows from OLD or NEW are requested, then print all columns except the ones virtually-generated. If a virtual column is directly requested, then issue an error. I would warmly welcome 1) as this would make all behaviors consistent with the other PLs and other types of generated columns. I would imagine that users would find weird the current inconsistency as well. >> Per my tests, generated columns can work with column system attributes >> (xmax, xmin, etc.). Some tests could be added. > > Hard to test that, because the results would be nondeterministic. tableoid can be deterministic if compared with data from pg_class: =# CREATE TABLE aa (a int PRIMARY KEY, b int GENERATED ALWAYS AS (tableoid)); CREATE TABLE =# INSERT INTO aa VALUES (1); INSERT =# SELECT aa.a from pg_class c, aa where c.oid = b; a --- 1 (1 row) The point is just to stress code paths where the attribute number is negative. >> - if (tab->relkind == RELKIND_RELATION || >> - tab->relkind == RELKIND_PARTITIONED_TABLE) >> + if ((tab->relkind == RELKIND_RELATION || >> + tab->relkind == RELKIND_PARTITIONED_TABLE) && >> + get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE >> I think that you should store the result of get_attgenerated() and reuse >> it multiple times. > > I don't see where that would apply. I think the hunks you are seeing > belong to different functions. Looks like I messed up ATPrepAlterColumnType() and ATExecColumnDefault() while reading the previous version. Sorry for the useless noise. + /* + * Foreign keys on generated columns are not yet implemented. + */ + for (i = 0; i < numpks; i++) + { + if (get_attgenerated(RelationGetRelid(pkrel), pkattnum[i])) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("foreign key constraints referencing generated columns are not supported"))); + } It would be nice to test this code path. The commit fest is ending, perhaps this should be moved to the next one? The handling of row types for virtual columns is a must-fix in my opinion. -- Michael
Attachment
On Wed, Jan 31, 2018 at 10:18:04PM +0900, Michael Paquier wrote: > On Thu, Jan 25, 2018 at 10:26:55PM -0500, Peter Eisentraut wrote: >> On 1/19/18 00:18, Michael Paquier wrote: >>> +SELECT a, c FROM gtest12; -- FIXME: should be allowed >>> +ERROR: permission denied for function gf1 >> >> This is quite hard to fix and I would like to leave this for a future >> release. > > I have been looking at that case more closely again, and on the contrary > I would advocate that your patch is doing the *right* thing. In short, > if the generation expression uses a function and the user has only been > granted access to read the values, it seems to me that it we should > require that this user also has the right to execute the function. Would > that be too user-unfriendly? I think that this could avoid mistakes > about giving access to unwanted functions when willing to just give a > SELECT right as the function could be doing more operations. Attached is the SQL file I used with test cases for the review. I forgot to attach it yesterday. > Hm. Identity columns and default columns are part of rowtypes. STORED > columns can alsobe part of a row type with little effort, so in order to > be consistent with the all the existing behaviors, it seems to me that > virtually-generated columns should be part of it as well. I have > compiled in the attached SQL file how things behave with your > patch. Only virtually-generated columns show a blank value. The tests used are attached. -- Michael
Attachment
On 1/31/18 08:18, Michael Paquier wrote: >> This also raises a question how a row type with generated columns should >> behave. I think a generation expression is a property of a table, so it >> does not apply in a row type. (Just like a default is a property of a >> table and does not apply in row types.) > > Hm. Identity columns and default columns are part of rowtypes. STORED > columns can alsobe part of a row type with little effort, so in order to > be consistent with the all the existing behaviors, it seems to me that > virtually-generated columns should be part of it as well. I have > compiled in the attached SQL file how things behave with your > patch. Only virtually-generated columns show a blank value. > > A empty value is unhelpful for the user, which brings a couple of > possible approaches: > 1) Make virtual columns part of a row type, which would make it > consistent with all the other types. That would be nice. I'm going to study this some more to see what can be done. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote: > That would be nice. I'm going to study this some more to see what can > be done. Thanks for the update. Note: Peter has moved the patch to next CF. -- Michael
Attachment
On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote: > That would be nice. I'm going to study this some more to see what can > be done. By the way, cannot we consider just doing stored generated columns as a first cut? Both virtual and stored columns have their use cases, but stored values have less complication and support actually a larger set of features, including rowtypes, index and constraint support. So it seems to me that if something goes into v11 then stored columns would be a better choice at this stage of the development cycle. Other DBMSs support stored values by default as well, and your v1 patch had a large portion of the work done if I recall correctly. -- Michael
Attachment
On 2/1/18 21:25, Michael Paquier wrote: > On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote: >> That would be nice. I'm going to study this some more to see what can >> be done. > > Thanks for the update. Note: Peter has moved the patch to next CF. I didn't get to updating this patch, so I'm closing it in this CF. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/03/2018 20:46, Peter Eisentraut wrote: > On 2/1/18 21:25, Michael Paquier wrote: >> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote: >>> That would be nice. I'm going to study this some more to see what can >>> be done. >> >> Thanks for the update. Note: Peter has moved the patch to next CF. > > I didn't get to updating this patch, so I'm closing it in this CF. Attached is a new version of this patch. Old news: This is a well-known SQL-standard feature, also available for instance in DB2, MySQL, Oracle. A quick example: CREATE TABLE t1 ( ..., height_cm numeric, height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) ); It supports both computed-on-write and computed-on-read variants, using the keywords STORED and VIRTUAL respectively. New news: Everything works more or less. Both STORED and VIRTUAL fully work now. I've done some refactoring to reduce the surface area of the patch. I also added some caching in the tuple descriptor's "const" area to avoid some performance overhead if no generated columns are used. There are some open questions about which I'll start separate subthreads for discussion. One thing I'd like reviewed now is the catalog representation. There are a couple of possible options, but changing them would have fairly deep code impact so it would help to get that settled soon. The general idea is that a generation expression is similar to a default, just applied at different times. So the actual generation expression is stored in pg_attrdef. The actual question is the representation in pg_attribute. Options: 1. (current implementation) New column attgenerated contains 's' for STORED, 'v' for VIRTUAL, '\0' for nothing. atthasdef means "there is something in pg_attrdef for this column". So a generated column would have atthasdef = true, and attgenerated = s/v. A traditional default would have atthasdef = true and attgenerated = '\0'. The advantage is that this is easiest to implement and the internal representation is the most useful and straightforward. The disadvantage is that old client code that wants to detect whether a column has a default would need to be changed (otherwise it would interpret a generated column as having a default value instead). 2. Alternative: A generated column has attgenerated = s/v but atthasdef = false, so that atthasdef means specifically "column has a default". Then a column would have a pg_attrdef entry for either attgenerated != '\0' or atthasdef = true. (Both couldn't be the case at the same time.) The advantage is that client code wouldn't need to be changed. But it's also possible that there is client code that just does a left join of pg_attribute and pg_attrdef without looking at atthasdef, so that would still be broken. The disadvantage is that the internal implementation would get considerably ugly. Most notably, the tuple descriptor would probably still look like #1, so there would have to be a conversion somewhere between variant #1 and #2. Or we'd have to duplicate all the tuple descriptor access code to keep that separate. There would be a lot of redundancy. 3. Radical alternative: Collapse everything into one new column. We could combine atthasdef and attgenerated and even attidentity into a new column. (Only one of the three can be the case.) This would give client code a clean break, which may or may not be good. The implementation would be uglier than #1 but probably cleaner than #2. We could also get 4 bytes back per pg_attribute row. I'm happy with the current choice #1, but it's worth thinking about. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, 27 Dec 2017 at 17:31, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
--
On 9/12/17 15:35, Jaime Casanova wrote:
> On 10 September 2017 at 00:08, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
>>
>> During my own tests, though, i found some problems:
Here is an updated patch that should address the problems you have found.
> also is interesting that in triggers, both before and after, the
> column has a null. that seems reasonable in a before trigger but not
> in an after trigger
Logically, you are correct. But it seems excessive to compute all
virtual columns for every trigger. I don't know how to consolidate
that, especially with the current trigger API that lets
you look more or less directly into the tuple.
I wasn't sure where this thought about after triggers ended up.
Presumably stored values can just be read from storage, so no overhead in after triggers?
Having the stored values show as NULL would be OK for virtual ones. But what do we do if the column is NOT NULL? Do we still have nulls then?
It would be useful to have a way to generate the values, if desired. Not sure how hard that is.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-10-30 09:35, Peter Eisentraut wrote: > [v5-0001-Generated-columns.patch ] Hi, I couldn't get this to apply to current head. I tried: patch --dry-run --ignore-whitespace -p 0 -F 5 < v5-0001-Generated-columns.patch and varied both -p and -F paramaters to no avail. Am I doing it wrong? ------- 8< ------- $ patch --ignore-whitespace -p 0 -F 5 < v5-0001-Generated-columns.patch (Stripping trailing CRs from patch; use --binary to disable.) can't find file to patch at input line 81 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |From dae07c731d80021bf78b8d89a8eb14408dbd023a Mon Sep 17 00:00:00 2001 |From: Peter Eisentraut <peter_e@gmx.net> |Date: Mon, 29 Oct 2018 17:46:12 +0100 |Subject: [PATCH v5] Generated columns [...] | src/test/regress/parallel_schedule | 2 +- | src/test/regress/serial_schedule | 1 + | src/test/regress/sql/create_table_like.sql | 14 + | src/test/regress/sql/generated.sql | 408 ++++++++++ | 60 files changed, 2731 insertions(+), 92 deletions(-) | create mode 100644 src/test/regress/expected/generated.out | create mode 100644 src/test/regress/sql/generated.sql | |diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml |index 9edba96fab..567913c3b6 100644 |--- a/doc/src/sgml/catalogs.sgml |+++ b/doc/src/sgml/catalogs.sgml -------------------------- File to patch: ------- 8< ------- Thanks, Erik Rijkes
Hi > patch --dry-run --ignore-whitespace -p 0 -F 5 < > v5-0001-Generated-columns.patch > > and varied both -p and -F paramaters to no avail. Am I doing it wrong? I am able apply patch by command patch -p1 < v5-0001-Generated-columns.patch or by "git apply v5-0001-Generated-columns.patch", but only till commit d5eec4eefde70414c9929b32c411cb4f0900a2a9 (Add pg_partition_treeto display information about partitions) Unfortunately patch does not applied to current HEAD. Cfbot noticed this too: http://cfbot.cputube.org/patch_20_1443.log regards, Sergei
Hi I applied this patch on top 2fe42baf7c1ad96b5f9eb898161e258315298351 commit and found a bug while adding STORED column: postgres=# create table test(i int); CREATE TABLE postgres=# insert into test values (1),(2); INSERT 0 2 postgres=# alter table test add column gen_stored integer GENERATED ALWAYS AS ((i * 2)) STORED; ALTER TABLE postgres=# alter table test add column gen_virt integer GENERATED ALWAYS AS ((i * 2)); ALTER TABLE postgres=# table test; i | gen_stored | gen_virt ---+------------+---------- 1 | | 2 2 | | 4 Virtual columns was calculated on table read and its ok, but stored column does not update table data. regards, Sergei
On 30/10/2018 15:19, Erik Rijkers wrote: > On 2018-10-30 09:35, Peter Eisentraut wrote: > >> [v5-0001-Generated-columns.patch ] > > Hi, > > I couldn't get this to apply to current head. > > I tried: > > patch --dry-run --ignore-whitespace -p 0 -F 5 < > v5-0001-Generated-columns.patch > > and varied both -p and -F paramaters to no avail. Am I doing it wrong? You need -p1. Or use git apply or git am. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 30/10/2018 15:39, Sergei Kornilov wrote: >> patch --dry-run --ignore-whitespace -p 0 -F 5 < >> v5-0001-Generated-columns.patch >> >> and varied both -p and -F paramaters to no avail. Am I doing it wrong? > I am able apply patch by command > patch -p1 < v5-0001-Generated-columns.patch > or by "git apply v5-0001-Generated-columns.patch", but only till commit d5eec4eefde70414c9929b32c411cb4f0900a2a9 (Add pg_partition_treeto display information about partitions) > > Unfortunately patch does not applied to current HEAD. Cfbot noticed this too: http://cfbot.cputube.org/patch_20_1443.log Here's another one. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Oct 30, 2018 at 09:35:18AM +0100, Peter Eisentraut wrote: > Attached is a new version of this patch. Thanks Peter for sending a new patch. I am still assigned as a reviewer, and still plan to look at it in more details. > It supports both computed-on-write and computed-on-read variants, using > the keywords STORED and VIRTUAL respectively. It is actually good to see that you are tackling both problems now. -- Michael
Attachment
On 2018-10-30 16:14, Sergei Kornilov wrote: > Hi > > I applied this patch on top 2fe42baf7c1ad96b5f9eb898161e258315298351 > commit and found a bug while adding STORED column: > > postgres=# create table test(i int); > CREATE TABLE > postgres=# insert into test values (1),(2); > INSERT 0 2 > postgres=# alter table test add column gen_stored integer GENERATED > ALWAYS AS ((i * 2)) STORED; > ALTER TABLE > postgres=# alter table test add column gen_virt integer GENERATED > ALWAYS AS ((i * 2)); > ALTER TABLE > postgres=# table test; > i | gen_stored | gen_virt > ---+------------+---------- > 1 | | 2 > 2 | | 4 > > Virtual columns was calculated on table read and its ok, but stored > column does not update table data. This workaround is possible: update test set i = i where gen_stored is null returning *; i | gen_stored | gen_virt ---+------------+---------- 1 | 2 | 2 2 | 4 | 4 (2 rows) table test ; i | gen_stored | gen_virt ---+------------+---------- 3 | 6 | 6 4 | 8 | 8 1 | 2 | 2 2 | 4 | 4 (4 rows) Hm, well, I suppose it's still a bug... I have also noticed that logical replication isn't possible on tables with a generated column. That's a shame but I suppsoe that is as expected. Erik Rijkers > > regards, Sergei
On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers <er@xs4all.nl> wrote:
--
I have also noticed that logical replication isn't possible on tables
with a generated column. That's a shame but I suppsoe that is as
expected.
Couldn't see anything like that in the patch. Presumably unintended consequence. The generated value needs to be in WAL, so decoding it should be trivial.
Virtual columns wouldn't need to be replicated.
I guess we might choose to replicate generated cols as a value, or leave them out and let them be generated on the downstream side. The default should be to just treat them as a value.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-10-31 09:15, Simon Riggs wrote: > On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers <er@xs4all.nl> wrote: > > >> I have also noticed that logical replication isn't possible on tables >> with a generated column. That's a shame but I suppsoe that is as >> expected. >> > > Couldn't see anything like that in the patch. Presumably unintended > consequence. The generated value needs to be in WAL, so decoding it > should > be trivial. > These log messages occur on attempting at logical replication: ( table t1 has no generated columns; replicates fine. table t2 has one generated column; replication fails: see below ) LOG: database system is ready to accept connections LOG: logical replication apply worker for subscription "sub1" has started LOG: logical replication table synchronization worker for subscription "sub1", table "t1" has started LOG: logical replication table synchronization worker for subscription "sub1", table "t2" has started LOG: logical replication table synchronization worker for subscription "sub1", table "t1" has finished ERROR: column "i2" is a generated column DETAIL: Generated columns cannot be used in COPY. LOG: background worker "logical replication worker" (PID 22252) exited with exit code 1 > Virtual columns wouldn't need to be replicated. > > I guess we might choose to replicate generated cols as a value, or > leave > them out and let them be generated on the downstream side. The default > should be to just treat them as a value. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > <http://www.2ndquadrant.com/> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 31 Oct 2018 at 08:29, Erik Rijkers <er@xs4all.nl> wrote:
--
On 2018-10-31 09:15, Simon Riggs wrote:
> On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers <er@xs4all.nl> wrote:
>
>
>> I have also noticed that logical replication isn't possible on tables
>> with a generated column. That's a shame but I suppsoe that is as
>> expected.
>>
>
> Couldn't see anything like that in the patch. Presumably unintended
> consequence. The generated value needs to be in WAL, so decoding it
> should
> be trivial.
>
These log messages occur on attempting at logical replication:
( table t1 has no generated columns; replicates fine.
table t2 has one generated column; replication fails: see below )
LOG: database system is ready to accept connections
LOG: logical replication apply worker for subscription "sub1" has
started
LOG: logical replication table synchronization worker for subscription
"sub1", table "t1" has started
LOG: logical replication table synchronization worker for subscription
"sub1", table "t2" has started
LOG: logical replication table synchronization worker for subscription
"sub1", table "t1" has finished
ERROR: column "i2" is a generated column
DETAIL: Generated columns cannot be used in COPY.
LOG: background worker "logical replication worker" (PID 22252) exited
with exit code 1
OK, so the problem is COPY.
Which means we have an issue with restore. We need to be able to pg_dump a table with generated columns, then restore it afterwards. More generally, we need to be able to handle data that has already been generated - the "generate" idea should apply to new data not existing data.
Sounds like we need to do an ALTER TABLE ... GENERATE ALWAYS after the table has been re-created and re-loaded, so that both logical replication and dump/restore would work.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi > OK, so the problem is COPY. > > Which means we have an issue with restore. We need to be able to pg_dump a table with generated columns, then restore itafterwards. More generally, we need to be able to handle data that has already been generated - the "generate" idea shouldapply to new data not existing data. pg_dump was fixed in published patch (i check this yesterday). It does not COPY generated columns values, only non-generated. regards, Sergei
On 10/30/18, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > 3. Radical alternative: Collapse everything into one new column. We > could combine atthasdef and attgenerated and even attidentity into a new > column. (Only one of the three can be the case.) This would give > client code a clean break, which may or may not be good. The > implementation would be uglier than #1 but probably cleaner than #2. We > could also get 4 bytes back per pg_attribute row. Thinking about the distinction between 'stored' and 'virtual', I immediately thought of atthasmissing. In a way it indicates that there is a virtual default value. It seems the level of materialization is an orthogonal concept to the kind of value we have. What if attgenerated had d = normal default value i = identity default a = identity always c = generated column and in addition there was an attmaterialized column with s = stored v = virtual In this scheme, -Normal attribute: '\0' + 's' -Default value: 'd' + 's' -Fast default: 'd' + 'v' until the table is rewritten -Identity column: 'i'/'a' + 's' -Generated column: 'c' + 's'/'v' This way, we'd still end up with 1 fewer column (2 new ones minus atthasdef, attidentity, and atthasmissing). A bit crazier, what if "d = dropped" was another allowed value in attmaterialized -- we could then get rid of attisdropped as well. That has obvious disadvantages, but the broader idea is that this design may have use cases we haven't thought of yet. Thoughts? -John Naylor
On 30/10/2018 16:14, Sergei Kornilov wrote: > Hi > > I applied this patch on top 2fe42baf7c1ad96b5f9eb898161e258315298351 commit and found a bug while adding STORED column: > > postgres=# create table test(i int); > CREATE TABLE > postgres=# insert into test values (1),(2); > INSERT 0 2 > postgres=# alter table test add column gen_stored integer GENERATED ALWAYS AS ((i * 2)) STORED; > ALTER TABLE > postgres=# alter table test add column gen_virt integer GENERATED ALWAYS AS ((i * 2)); > ALTER TABLE > postgres=# table test; > i | gen_stored | gen_virt > ---+------------+---------- > 1 | | 2 > 2 | | 4 > > Virtual columns was calculated on table read and its ok, but stored column does not update table data. This is a small bug that I will fix in my next update. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 31/10/2018 08:58, Erikjan Rijkers wrote: > I have also noticed that logical replication isn't possible on tables > with a generated column. That's a shame but I suppsoe that is as > expected. This is an issue we need to discuss. How should this work? The simplest solution would be to exclude generated columns from the replication stream altogether. Similar considerations also apply to foreign tables. What is the meaning of a stored generated column on a foreign table? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 6 Nov 2018 at 04:31, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
--
On 31/10/2018 08:58, Erikjan Rijkers wrote:
> I have also noticed that logical replication isn't possible on tables
> with a generated column. That's a shame but I suppsoe that is as
> expected.
This is an issue we need to discuss. How should this work?
The simplest solution would be to exclude generated columns from the
replication stream altogether.
IMHO...
Virtual generated columns need not be WAL-logged, or sent.
Stored generated columns should be treated just like we'd treat a column value added by a trigger.
e.g. if we had a Timestamp column called LastUpdateTimestamp we would want to send that value
Similar considerations also apply to foreign tables. What is the
meaning of a stored generated column on a foreign table?
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/11/2018 14:28, Simon Riggs wrote: > The simplest solution would be to exclude generated columns from the > replication stream altogether. > > > IMHO... > > Virtual generated columns need not be WAL-logged, or sent. right > Stored generated columns should be treated just like we'd treat a column > value added by a trigger. > > e.g. if we had a Timestamp column called LastUpdateTimestamp we would > want to send that value Generated columns cannot have volatile expression results in them, so this case cannot happen. Also, we don't know whether the generation expression on the target is the same (or even if it looks the same, consider locale issues etc.), so we need to recompute the generated columns on the target anyway, so it's pointless to send the already computed stored values. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 6 Nov 2018 at 13:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
--
> Stored generated columns should be treated just like we'd treat a column
> value added by a trigger.
>
> e.g. if we had a Timestamp column called LastUpdateTimestamp we would
> want to send that value
Generated columns cannot have volatile expression results in them, so
this case cannot happen.
Also, we don't know whether the generation expression on the target is
the same (or even if it looks the same, consider locale issues etc.), so
we need to recompute the generated columns on the target anyway, so it's
pointless to send the already computed stored values.
Makes sense.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 30, 2018 at 09:35:18AM +0100, Peter Eisentraut wrote: > 1. (current implementation) New column attgenerated contains 's' for > STORED, 'v' for VIRTUAL, '\0' for nothing. atthasdef means "there is > something in pg_attrdef for this column". > > 2. Alternative: A generated column has attgenerated = s/v but atthasdef > = false, so that atthasdef means specifically "column has a default". > Then a column would have a pg_attrdef entry for either attgenerated != > '\0' or atthasdef = true. > > 3. Radical alternative: Collapse everything into one new column. We > could combine atthasdef and attgenerated and even attidentity into a new > column. (Only one of the three can be the case.) This would give > client code a clean break, which may or may not be good. The > implementation would be uglier than #1 but probably cleaner than #2. We > could also get 4 bytes back per pg_attribute row. > > I'm happy with the current choice #1, but it's worth thinking about. #3 looks very appealing in my opinion as those columns have no overlap, so it would take five possible values: - generated always - generated by default - default value - stored expression - virtual expression Obviously this requires a first patch to combine the catalog representation of the existing columns atthasdef and attidentity. + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("generated colums are not supported on typed tables"))); s/colums/columns/. -- Michael
Attachment
On Tue, Oct 30, 2018 at 4:35 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > 1. (current implementation) New column attgenerated contains 's' for > STORED, 'v' for VIRTUAL, '\0' for nothing. atthasdef means "there is > something in pg_attrdef for this column". So a generated column would > have atthasdef = true, and attgenerated = s/v. A traditional default > would have atthasdef = true and attgenerated = '\0'. The advantage is > that this is easiest to implement and the internal representation is the > most useful and straightforward. The disadvantage is that old client > code that wants to detect whether a column has a default would need to > be changed (otherwise it would interpret a generated column as having a > default value instead). > > 2. Alternative: A generated column has attgenerated = s/v but atthasdef > = false, so that atthasdef means specifically "column has a default". > Then a column would have a pg_attrdef entry for either attgenerated != > '\0' or atthasdef = true. (Both couldn't be the case at the same time.) > The advantage is that client code wouldn't need to be changed. But > it's also possible that there is client code that just does a left join > of pg_attribute and pg_attrdef without looking at atthasdef, so that > would still be broken. The disadvantage is that the internal > implementation would get considerably ugly. Most notably, the tuple > descriptor would probably still look like #1, so there would have to be > a conversion somewhere between variant #1 and #2. Or we'd have to > duplicate all the tuple descriptor access code to keep that separate. > There would be a lot of redundancy. > > 3. Radical alternative: Collapse everything into one new column. We > could combine atthasdef and attgenerated and even attidentity into a new > column. (Only one of the three can be the case.) This would give > client code a clean break, which may or may not be good. The > implementation would be uglier than #1 but probably cleaner than #2. We > could also get 4 bytes back per pg_attribute row. > > I'm happy with the current choice #1, but it's worth thinking about. I don't have a strong position on 1 vs. 2 vs. 3, but I do think it would be nicer not to use '\0' as a column value. I'd suggest you use 'n' or '0' or '-' or some other printable character instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> 3. Radical alternative: Collapse everything into one new column. We
> could combine atthasdef and attgenerated and even attidentity into a new
> column. (Only one of the three can be the case.) This would give
> client code a clean break, which may or may not be good. The
> implementation would be uglier than #1 but probably cleaner than #2. We
> could also get 4 bytes back per pg_attribute row.
>
> I'm happy with the current choice #1, but it's worth thinking about.
#3 looks very appealing in my opinion as those columns have no overlap,
so it would take five possible values:
Could the removed columns live on...as generated-always columns?
Disclaimer: I had never seen this patch before. I did not participate in this feature design. I did not discuss this review with the author or anybody in 2ndQuadrant. I do not have any particular affective bonds with its author. I did not receive payment nor goods in exchange for this review. I encourage others to review this patch, and all other patches in the current commitfest and all future commitfests. Now that the TupleTableSlot work has landed, the API of ExecComputeStoredGenerated is clearly inadequate. This should be adjusted to work with the slot only, and not generate a heap tuple at all -- if the calling code needs the heap tuple, have that code generate that from the slot. (Example problem: ExecConstraints runs using the slot, not the heap tuple.) The pg_dump tests verify a virtual generated column, but not a virtual stored column. It'd be good to have one for the latter. The tables in new test "generated" appear not to be dropped, which is good to test pg_upgrade as well as pg_dump; I'd add a comment that this is on purpose, lest someone else add DROP lines there later. I think some tests for logical replication would be good as well. It's unclear why you made generated columns on partitions unsupported. I'd fix the limitation if possible, but if not, at least document it. (I particularly notice that partition key is marked as unsupported in the regression test. Consider partitioning on a SERIAL column, this is clearly something that users will expect to work.) About your catalog representation question: On 2018-Oct-30, Peter Eisentraut wrote: > 1. (current implementation) New column attgenerated contains 's' for > STORED, 'v' for VIRTUAL, '\0' for nothing. atthasdef means "there is > something in pg_attrdef for this column". So a generated column would > have atthasdef = true, and attgenerated = s/v. A traditional default > would have atthasdef = true and attgenerated = '\0'. The advantage is > that this is easiest to implement and the internal representation is the > most useful and straightforward. The disadvantage is that old client > code that wants to detect whether a column has a default would need to > be changed (otherwise it would interpret a generated column as having a > default value instead). I think this is a reasonable implementation. Client code that is confused by this will obviously have to be updated, but if it isn't, the bug is not serious. I would clearly not go to the extreme trouble that #2 poses just to avoid this problem. That said, I think your "radical alternative" #3 is better. Maybe we want to avoid multiple compatibility breaks, so we'd go with 3 for pg12 instead of doing 1 now and changing it again later. Like Robert, I would use something other than '\0' anyhow. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15/11/2018 15:10, Robert Haas wrote: > I don't have a strong position on 1 vs. 2 vs. 3, but I do think it > would be nicer not to use '\0' as a column value. I'd suggest you use > 'n' or '0' or '-' or some other printable character instead. I had carefully considered this when attidentity was added. Using '\0' allows you to use this column as a boolean in C code, which is often convenient. Also, there are numerous places where a pg_attribute form or a tuple descriptor is initialized to all zeroes, which works well for most fields, and adding one exception like this would create a lot of extra work and bloat the patch and create potential for future instability. Also note that a C char '\0' is represented as '' (empty string) in SQL, so this also creates a natural representation in SQL. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19/11/2018 19:54, Alvaro Herrera wrote: > It's unclear why you made generated columns on partitions unsupported. > I'd fix the limitation if possible, but if not, at least document it. This is explained here: + /* + * Generated columns in partition key expressions: + * + * - Stored generated columns cannot work: They are computed + * after BEFORE triggers, but partition routing is done + * before all triggers. + * + * - Virtual generated columns could work. But there is a + * problem when dropping such a table: Dropping a table + * calls relation_open(), which causes partition keys to be + * constructed for the partcache, but at that point the + * generation expression is already deleted (through + * dependencies), so this will fail. So if you remove the + * restriction below, things will appear to work, but you + * can't drop the table. :-( + */ > (I particularly notice that partition key is marked as unsupported in > the regression test. Consider partitioning on a SERIAL column, this is > clearly something that users will expect to work.) A serial column is not a generated column. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/11/2018 13:27, Peter Eisentraut wrote: > This is a small bug that I will fix in my next update. Time for another update. Lot's of rebasing, many things fixed, including the ADD COLUMN bug you found, replication, foreign tables, better caching, some corner cases in trigger behavior, more documentation. This addresses everything I've had in my notes, so it's functionally and logically complete from my perspective. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi
pá 11. 1. 2019 v 9:31 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 06/11/2018 13:27, Peter Eisentraut wrote:
> This is a small bug that I will fix in my next update.
Time for another update. Lot's of rebasing, many things fixed,
including the ADD COLUMN bug you found, replication, foreign tables,
better caching, some corner cases in trigger behavior, more
documentation. This addresses everything I've had in my notes, so it's
functionally and logically complete from my perspective.
I am looking on this patch - it is great feature.
The documentation contains paragraph
+ The generation expression can only use immutable functions and cannot
+ use subqueries or reference anything other than the current row in any
+ way.
+ use subqueries or reference anything other than the current row in any
+ way.
It is necessary for stored columns?
I tested it with pseudo constant - current_timestamp, session_user. But current_database() is disallowed.
on second hand, this is strange
postgres=# create table foo3 (inserted text generated always as (current_timestamp) virtual);
CREATE TABLE
CREATE TABLE
Regards
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/01/2019 16:22, Pavel Stehule wrote: > The documentation contains paragraph > > + The generation expression can only use immutable functions and cannot > + use subqueries or reference anything other than the current row > in any > + way. > > It is necessary for stored columns? See here: https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com > I tested it with pseudo constant - current_timestamp, session_user. But > current_database() is disallowed. > > on second hand, this is strange > > postgres=# create table foo3 (inserted text generated always as > (current_timestamp) virtual); > CREATE TABLE Ah, the volatility checking needs some improvements. I'll address that in the next patch version. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 11/01/2019 16:22, Pavel Stehule wrote:
> The documentation contains paragraph
>
> + The generation expression can only use immutable functions and cannot
> + use subqueries or reference anything other than the current row
> in any
> + way.
>
> It is necessary for stored columns?
See here:
https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com
I understand - it is logical. But it is sad, so this feature is not complete. The benefit is not too big - against functional indexes or views. But it can be first step.
> I tested it with pseudo constant - current_timestamp, session_user. But
> current_database() is disallowed.
>
> on second hand, this is strange
>
> postgres=# create table foo3 (inserted text generated always as
> (current_timestamp) virtual);
> CREATE TABLE
Ah, the volatility checking needs some improvements. I'll address that
in the next patch version.
ok
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 13, 2019 at 03:31:23PM +0100, Pavel Stehule wrote: > ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut < > peter.eisentraut@2ndquadrant.com> napsal: >> See here: >> >> https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com > > I understand - it is logical. But it is sad, so this feature is not > complete. The benefit is not too big - against functional indexes or views. > But it can be first step. I wouldn't say that. Volatibility restrictions based on immutable functions apply to many concepts similar like expression pushdowns to make for deterministic results. The SQL spec takes things on the safe side. >> Ah, the volatility checking needs some improvements. I'll address that >> in the next patch version. > > ok The same problem happens for stored and virtual columns. The latest patch has a small header conflict at the top of rewriteHandler.c which is simple enough to fix. It would be nice to add a test with composite types, say something like: =# create type double_int as (a int, b int); CREATE TYPE =# create table double_tab (a int, b double_int GENERATED ALWAYS AS ((a * 2, a * 3)) stored, c double_int GENERATED ALWAYS AS ((a * 4, a * 5)) virtual); CREATE TABLE =# insert into double_tab values (1), (6); INSERT 0 2 =# select * from double_tab ; a | b | c ---+---------+--------- 1 | (2,3) | (4,5) 6 | (12,18) | (24,30) (2 rows) Glad to see that typed tables are handled and forbidden, and the trigger definition looks sane to me. +-- partitioned table +CREATE TABLE gtest_parent (f1 date NOT NULL, f2 bigint, f3 bigint GENERATED ALWAYS AS (f2 * 2)) PARTITION BY RANGE (f1); +CREATE TABLE gtest_child PARTITION OF gtest_parent FOR VALUES FROM ('2016-07-01') TO ('2016-08-01'); +INSERT INTO gtest_parent (f1, f2) VALUES ('2016-07-15', 1); In my experience, having tests which handle multiple layers of partitions is never a bad thing. + /* + * Changing the type of a column that is used by a + * generated column is not allowed by SQL standard. + * It might be doable with some thinking and effort. Just mentioning the part about the SQL standard seems fine to me. + bool has_generated_stored; + bool has_generated_virtual; } TupleConstr; Could have been more simple to use a char as representation here. Using NULL as generation expression results in a crash when selecting the relation created: =# CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (NULL)); CREATE TABLE =# select * from gtest_err_1; server closed the connection unexpectedly =# CREATE TABLE gtest_err_2 (a int PRIMARY KEY, b int NOT NULL GENERATED ALWAYS AS (NULL)); CREATE TABLE A NOT NULL column can use NULL as generated result :) + The view <literal>column_column_usage</literal> identifies all generated "column_column_usage" is redundant. Could it be possible to come up with a better name? When testing a bulk INSERT into a table which has a stored generated column, memory keeps growing in size linearly, which does not seem normal to me. If inserting more tuples than what I tested (I stopped at 10M because of lack of time), it seems to me that this could result in OOMs. I would have expected the memory usage to be steady. + /* ignore virtual generated columns; they are always null here */ + if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL) + continue; Looking for an assertion or a sanity check of some kind? + if (pstate->p_expr_kind == EXPR_KIND_GENERATED_COLUMN && + attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber) + ereport(ERROR, + (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), + errmsg("cannot use system column \"%s\" in column generation expression", + colname), + parser_errposition(pstate, location))); There is a test using xmon, you may want one with tableoid. +/* + * Thin wrapper around libpq to obtain server version. + */ +static int +libpqrcv_server_version(WalReceiverConn *conn) This should be introduced in separate patch in my opinion (needed afterwards for logirep). I have not gone through the PL part of the changes yet, except plpgsql. What about the catalog representation of attgenerated? Would it merge with attidentity & co? Or not? -- Michael
Attachment
út 15. 1. 2019 v 8:14 odesílatel Michael Paquier <michael@paquier.xyz> napsal:
On Sun, Jan 13, 2019 at 03:31:23PM +0100, Pavel Stehule wrote:
> ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut <
> peter.eisentraut@2ndquadrant.com> napsal:
>> See here:
>>
>> https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f914ca@2ndquadrant.com
>
> I understand - it is logical. But it is sad, so this feature is not
> complete. The benefit is not too big - against functional indexes or views.
> But it can be first step.
I wouldn't say that. Volatibility restrictions based on immutable
functions apply to many concepts similar like expression pushdowns to
make for deterministic results. The SQL spec takes things on the safe
side.
I would to have a mechanism for safe replacement of triggers of type
if TG_TYPE = 'INSERT' THEN
NEW.inserted := CURRENT_TIMESTAMP;
ELSE IF TG_TYPE = 'UPDATE' THEN
NEW.updated := CURRENT_TIMESTAMP;
..
But I understand, so current SQL spec design is safe.
Regards
Pavel
On 15/01/2019 08:18, Pavel Stehule wrote: > I would to have a mechanism for safe replacement of triggers of type > > if TG_TYPE = 'INSERT' THEN > NEW.inserted := CURRENT_TIMESTAMP; > ELSE IF TG_TYPE = 'UPDATE' THEN > NEW.updated := CURRENT_TIMESTAMP; > .. That kind of use is probably better addressed with a temporal facility. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
st 16. 1. 2019 v 9:26 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 15/01/2019 08:18, Pavel Stehule wrote:
> I would to have a mechanism for safe replacement of triggers of type
>
> if TG_TYPE = 'INSERT' THEN
> NEW.inserted := CURRENT_TIMESTAMP;
> ELSE IF TG_TYPE = 'UPDATE' THEN
> NEW.updated := CURRENT_TIMESTAMP;
> ..
That kind of use is probably better addressed with a temporal facility.
yes. I am looking for this functionality in Postgres
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15/01/2019 08:13, Michael Paquier wrote: > When testing a bulk INSERT into a table which has a stored generated > column, memory keeps growing in size linearly, which does not seem > normal to me. If inserting more tuples than what I tested (I stopped > at 10M because of lack of time), it seems to me that this could result > in OOMs. I would have expected the memory usage to be steady. What are you executing exactly? One INSERT command with many rows? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 22, 2018 at 10:16 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 15/11/2018 15:10, Robert Haas wrote: > > I don't have a strong position on 1 vs. 2 vs. 3, but I do think it > > would be nicer not to use '\0' as a column value. I'd suggest you use > > 'n' or '0' or '-' or some other printable character instead. > > I had carefully considered this when attidentity was added. Using '\0' > allows you to use this column as a boolean in C code, which is often > convenient. Also, there are numerous places where a pg_attribute form > or a tuple descriptor is initialized to all zeroes, which works well for > most fields, and adding one exception like this would create a lot of > extra work and bloat the patch and create potential for future > instability. Also note that a C char '\0' is represented as '' (empty > string) in SQL, so this also creates a natural representation in SQL. I'm not really convinced. I think that the stdbool work you've been doing shows that blurring the line between char and bool is a bad idea. And I believe that on general principle, anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I couldn't compile v7-0001 and I am testing with the older v6-0001 (of which I still had an instance). So the problem below may have been fixed already. If you add a generated column to a file_fdw foreign table, it works OK wih VIRTUAL (the default) but with STORED it adds an empty column, silently. I would say it would make more sense to get an error. Erik Rijkers
On Wed, Jan 16, 2019 at 02:14:41PM +0100, Peter Eisentraut wrote: > On 15/01/2019 08:13, Michael Paquier wrote: >> When testing a bulk INSERT into a table which has a stored generated >> column, memory keeps growing in size linearly, which does not seem >> normal to me. If inserting more tuples than what I tested (I stopped >> at 10M because of lack of time), it seems to me that this could result >> in OOMs. I would have expected the memory usage to be steady. > > What are you executing exactly? One INSERT command with many rows? Yes, something like that grows the memory and CPU usage rather linearly: CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED); INSERT INTO tab VALUES (generate_series(1,100000000)); -- Michael
Attachment
On 16/01/2019 22:40, Erik Rijkers wrote: > I couldn't compile v7-0001 and I am testing with the older v6-0001 (of > which I still had an instance). > > So the problem below may have been fixed already. > > If you add a generated column to a file_fdw foreign table, it works OK > wih VIRTUAL (the default) but with STORED it adds an empty column, > silently. I would say it would make more sense to get an error. Yes, v7 has addressed foreign-data wrappers. (I only tested with postgres_fdw.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 17, 2019 at 10:12:26AM +0900, Michael Paquier wrote: > Yes, something like that grows the memory and CPU usage rather > linearly: > CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED); > INSERT INTO tab VALUES (generate_series(1,100000000)); The latest patch set got plenty of feedback not addressed yet, so I am marking it as returned with feedback. -- Michael
Attachment
On 2019-02-01 04:48, Michael Paquier wrote: > On Thu, Jan 17, 2019 at 10:12:26AM +0900, Michael Paquier wrote: >> Yes, something like that grows the memory and CPU usage rather >> linearly: >> CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED); >> INSERT INTO tab VALUES (generate_series(1,100000000)); > > The latest patch set got plenty of feedback not addressed yet, so I am > marking it as returned with feedback. Here is an updated patch which should address the review comments in the latest round. My perspective on this patch is: The stored generated column part is pretty solid. It can target PG12. The virtual generated column part is still a bit iffy. I'm still finding places here and there where virtual columns are not being expanded correctly. Maybe it needs more refactoring. One big unsolved issue is how the storage of such columns should work. Right now, they are stored as nulls. That works fine, but what I suppose we'd really want is to not store them at all. That, however, creates all kinds of complications in the planner if target lists have non-matching lengths or the resnos don't match up. I haven't figured out how to do this cleanly. So I'm thinking if we can get agreement on the stored columns, I can cut out the virtual column stuff for PG12. That should be fairly easy. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-01-15 08:13, Michael Paquier wrote: >>> Ah, the volatility checking needs some improvements. I'll address that >>> in the next patch version. >> >> ok > > The same problem happens for stored and virtual columns. This is fixed in v8. > It would be nice to add a test with composite types, say something > like: > =# create type double_int as (a int, b int); > CREATE TYPE > =# create table double_tab (a int, > b double_int GENERATED ALWAYS AS ((a * 2, a * 3)) stored, > c double_int GENERATED ALWAYS AS ((a * 4, a * 5)) virtual); > CREATE TABLE > =# insert into double_tab values (1), (6); > INSERT 0 2 > =# select * from double_tab ; > a | b | c > ---+---------+--------- > 1 | (2,3) | (4,5) > 6 | (12,18) | (24,30) > (2 rows) I added that. > + bool has_generated_stored; > + bool has_generated_virtual; > } TupleConstr; > Could have been more simple to use a char as representation here. Seems confusing if both apply at the same time. > Using NULL as generation expression results in a crash when selecting > the relation created: > =# CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (NULL)); > CREATE TABLE > =# select * from gtest_err_1; > server closed the connection unexpectedly This is fixed in v8. > + The view <literal>column_column_usage</literal> identifies all > generated > "column_column_usage" is redundant. Could it be possible to come up > with a better name? This is specified in the SQL stnadard. > When testing a bulk INSERT into a table which has a stored generated > column, memory keeps growing in size linearly, which does not seem > normal to me. This was a silly coding error. It's fixed in v8. > +/* > + * Thin wrapper around libpq to obtain server version. > + */ > +static int > +libpqrcv_server_version(WalReceiverConn *conn) > This should be introduced in separate patch in my opinion (needed > afterwards for logirep). Yes, it could be committed separately. > What about the catalog representation of attgenerated? Would it merge > with attidentity & co? Or not? I think the way it is now seems best. The other options that were discussed are also plausible, but that the discussions did not reveal any overwhelming arguments for a a change. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 25, 2019 at 09:51:52PM +0100, Peter Eisentraut wrote: > On 2019-01-15 08:13, Michael Paquier wrote: >> + bool has_generated_stored; >> + bool has_generated_virtual; >> } TupleConstr; >> Could have been more simple to use a char as representation here. > > Seems confusing if both apply at the same time. Ouch, I see. The flags count for all attributes. I missed that in a previous read of the patch. Yeah, two booleans make sense. >> When testing a bulk INSERT into a table which has a stored generated >> column, memory keeps growing in size linearly, which does not seem >> normal to me. > > This was a silly coding error. It's fixed in v8. Thanks, this one looks fine. >> +/* >> + * Thin wrapper around libpq to obtain server version. >> + */ >> +static int >> +libpqrcv_server_version(WalReceiverConn *conn) >> This should be introduced in separate patch in my opinion (needed >> afterwards for logirep). > > Yes, it could be committed separately. I would split that one and I think that it could go in. If you wish to keep things grouped that's fine by me as well. >> What about the catalog representation of attgenerated? Would it merge >> with attidentity & co? Or not? > > I think the way it is now seems best. The other options that were > discussed are also plausible, but that the discussions did not reveal > any overwhelming arguments for a a change. Hm. Does the SQL standard mention more features which could be merged with stored values, virtual values, default expressions and identity columns? I don't know the last trends in this area but I am wondering if there are any other column specific, expression-like features like that associated to a column. That could give more strength with having one column in pg_attribute to govern them all. Well, assuming that something else is implemented of course. That's a lot of assumptions, and it's not like the current implementation is wrong either. -- Michael
Attachment
On Mon, Feb 25, 2019 at 09:46:35PM +0100, Peter Eisentraut wrote: > The virtual generated column part is still a bit iffy. I'm still > finding places here and there where virtual columns are not being > expanded correctly. Maybe it needs more refactoring. One big unsolved > issue is how the storage of such columns should work. Right now, they > are stored as nulls. That works fine, but what I suppose we'd really > want is to not store them at all. That, however, creates all kinds of > complications in the planner if target lists have non-matching lengths > or the resnos don't match up. I haven't figured out how to do this > cleanly. Hmm. Not storing virtual columns looks like the correct concept to me instead of storing them as NULL. > So I'm thinking if we can get agreement on the stored columns, I can cut > out the virtual column stuff for PG12. That should be fairly easy. The shape of what is used for stored columns looks fine to me. + if (attgenerated) + { + /* + * Generated column: Dropping anything that the generation expression + * refers to automatically drops the generated column. + */ + recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel), + DEPENDENCY_AUTO, + DEPENDENCY_AUTO, false); + } A CCI is not necessary I think here, still the recent thread about automatic dependencies with identity columns had a much similar pattern... + else if (generated[0] == ATTRIBUTE_GENERATED_VIRTUAL) + default_str = psprintf("generated always as (%s)", PQgetvalue(res, i, attrdef_col)); Nit: I would add VIRTUAL instead of relying on the default option. Another thing I was thinking about: could it be possible to add a sanity check in sanity_check.sql so as a column more that one field in attidentity, attgenerated and atthasdef set at the same time? -- Michael
Attachment
On 2019-02-26 06:30, Michael Paquier wrote: > + if (attgenerated) > + { > + /* > + * Generated column: Dropping anything that the generation expression > + * refers to automatically drops the generated column. > + */ > + recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel), > + DEPENDENCY_AUTO, > + DEPENDENCY_AUTO, false); > + } > A CCI is not necessary I think here, still the recent thread about > automatic dependencies with identity columns had a much similar > pattern... Yeah, worth taking another look. > > + else if (generated[0] == ATTRIBUTE_GENERATED_VIRTUAL) > + default_str = psprintf("generated always as (%s)", PQgetvalue(res, i, attrdef_col)); > Nit: I would add VIRTUAL instead of relying on the default option. I suppose we'll decide that when the virtual columns are actually added. I see your point. > Another thing I was thinking about: could it be possible to add a > sanity check in sanity_check.sql so as a column more that one > field in attidentity, attgenerated and atthasdef set at the same time? There is something like that at the top of src/test/regress/sql/generated.sql. I can expand that. But it only covers the tests. For run time checks, you'd want something like pg_catcheck. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-02-26 06:12, Michael Paquier wrote: > Hm. Does the SQL standard mention more features which could be merged > with stored values, virtual values, default expressions and identity > columns? I don't know the last trends in this area but I am wondering > if there are any other column specific, expression-like features like > that associated to a column. That could give more strength with > having one column in pg_attribute to govern them all. Well, assuming > that something else is implemented of course. That's a lot of > assumptions, and it's not like the current implementation is wrong > either. The system-versioned tables feature might be the most adjacent one. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Feb 27, 2019 at 08:05:02AM +0100, Peter Eisentraut wrote: > There is something like that at the top of > src/test/regress/sql/generated.sql. I can expand that. I think you mean identity.sql. I would think that this would live better with some other sanity checks. I am fine with your final decision on the matter. > But it only covers the tests. For run time checks, you'd want > something like pg_catcheck. Sure. -- Michael
Attachment
On 2019-01-15 08:13, Michael Paquier wrote: > +/* > + * Thin wrapper around libpq to obtain server version. > + */ > +static int > +libpqrcv_server_version(WalReceiverConn *conn) > This should be introduced in separate patch in my opinion (needed > afterwards for logirep). I have committed this separately. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here is an updated patch with just the "stored" functionality, as discussed. The actual functionality is much smaller now, contained in the executor. Everything else is mostly DDL support, trigger handling, and some frontend stuff. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi
po 18. 3. 2019 v 8:35 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
Here is an updated patch with just the "stored" functionality, as discussed.
The actual functionality is much smaller now, contained in the executor.
Everything else is mostly DDL support, trigger handling, and some
frontend stuff.
probably I found a bug
create table foo(id int, name text);
insert into foo values(1, 'aaa');
alter table foo add column name_upper text generated always as (upper(name)) stored;
update foo set name = 'bbb' where id = 1; -- ok
alter table foo drop column name_upper;
update foo set name = 'bbbx' where id = 1; -- ok
alter table foo add column name_upper text generated always as (upper(name)) stored;
update foo set name = 'bbbxx' where id = 1; -- error
postgres=# update foo set name = 'bbbxx' where id = 1; -- error
ERROR: no generation expression found for column number 3 of table "foo"
ERROR: no generation expression found for column number 3 of table "foo"
Regards
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote: > postgres=# update foo set name = 'bbbxx' where id = 1; -- error > ERROR: no generation expression found for column number 3 of table > "foo" Yes I can see the problem after adding a generated column and dropping it on an INSERT query. I have read through the code once. + if (relid && attnum && get_attgenerated(relid, attnum)) Better to use OidIsValid here? + (walrcv_server_version(wrconn) >= 120000 ? "AND a.attgenerated = ''" : ""), I think that it is better to always have version-related references stored as defines. +CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED UNIQUE); +CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2) STORED, PRIMARY KEY (a, b)); Some tests for unique constraints with a generated column should be in place? It would be nice to have extra tests for forbidden expression types on generated columns especially SRF, subquery and aggregates/window functions. -- Michael
Attachment
On 2019-03-20 03:51, Michael Paquier wrote: > On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote: >> postgres=# update foo set name = 'bbbxx' where id = 1; -- error >> ERROR: no generation expression found for column number 3 of table >> "foo" > > Yes I can see the problem after adding a generated column and dropping > it on an INSERT query. fixed > + if (relid && attnum && get_attgenerated(relid, attnum)) > Better to use OidIsValid here? fixed > + (walrcv_server_version(wrconn) >= 120000 ? "AND a.attgenerated = ''" : ""), > I think that it is better to always have version-related references > stored as defines. A valid idea, but I don't see it widely done (see psql, pg_dump). > +CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED UNIQUE); > +CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2) STORED, PRIMARY KEY (a, b)); > Some tests for unique constraints with a generated column should be in > place? done > It would be nice to have extra tests for forbidden expression types > on generated columns especially SRF, subquery and aggregates/window > functions. done -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-03-26 20:50, Pavel Stehule wrote: > It is great feature and I'll mark this feature as ready for commit Committed, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
so 30. 3. 2019 v 9:03 odesílatel Peter Eisentraut <peter.eisentraut@2ndquadrant.com> napsal:
On 2019-03-26 20:50, Pavel Stehule wrote:
> It is great feature and I'll mark this feature as ready for commit
Committed, thanks.
great feature, it reduce some necessity of triggers
Regards
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Mar 30, 2019 at 09:03:03AM +0100, Peter Eisentraut wrote: > On 2019-03-26 20:50, Pavel Stehule wrote: > > It is great feature and I'll mark this feature as ready for commit > > Committed, thanks. create_table.sgml now has this: https://www.postgresql.org/docs/devel/sql-createtable.html#id-1.9.3.85.6.2.18.1.2 + <para> + The keyword <literal>STORED</literal> is required to signify that the + column will be computed on write and will be stored on disk. default. + </para> What does "default." mean ? Also, this is working but not documented as valid: postgres=# CREATE TABLE t (j int, i int GENERATED BY DEFAULT AS (j*j+1) STORED); Justin
On 2019-01-16 22:40, Erik Rijkers wrote: > > If you add a generated column to a file_fdw foreign table, it works OK > wih VIRTUAL (the default) but with STORED it adds an empty column, > silently. I would say it would make more sense to get an error. VIRTUAL is gone, but that other issue is still there: STORED in a file_fdw foreign table still silently creates the column which then turns out to be useless on SELECT, with an error like: "ERROR: column some_column_name is a generated column DETAIL: Generated columns cannot be used in COPY." Maybe it'd be possible to get an error earlier, i.e., while trying to create such a useless column? thanks, Erik Rijkers
On Sat, Mar 30, 2019 at 4:03 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-26 20:50, Pavel Stehule wrote:
> It is great feature and I'll mark this feature as ready for commit
Committed, thanks.
I can't do a same-major-version pg_upgrade across this commit, as the new pg_dump is trying to dump a nonexistent attgenerated column from the old database. Is that acceptable collateral damage? I thought we usually avoided that breakage within the dev branch, but I don't know how we do it.
Cheers,
Jeff
On 2019-03-30 10:24, Justin Pryzby wrote: > create_table.sgml now has this: > > https://www.postgresql.org/docs/devel/sql-createtable.html#id-1.9.3.85.6.2.18.1.2 > + <para> > + The keyword <literal>STORED</literal> is required to signify that the > + column will be computed on write and will be stored on disk. default. > + </para> > > What does "default." mean ? Typo, fixed. > Also, this is working but not documented as valid: > postgres=# CREATE TABLE t (j int, i int GENERATED BY DEFAULT AS (j*j+1) STORED); Fixed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-31 05:49, Erik Rijkers wrote: > STORED in a > file_fdw foreign table still silently creates the column which then > turns out to be useless on SELECT, with an error like: > > "ERROR: column some_column_name is a generated column > DETAIL: Generated columns cannot be used in COPY." > > Maybe it'd be possible to get an error earlier, i.e., while trying to > create such a useless column? I'll look into it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-31 15:22, Jeff Janes wrote: > I can't do a same-major-version pg_upgrade across this commit, as the > new pg_dump is trying to dump a nonexistent attgenerated column from the > old database. Is that acceptable collateral damage? I thought we > usually avoided that breakage within the dev branch, but I don't know > how we do it. We don't. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 01, 2019 at 10:53:27AM +0200, Peter Eisentraut wrote: > On 2019-03-31 15:22, Jeff Janes wrote: >> I can't do a same-major-version pg_upgrade across this commit, as the >> new pg_dump is trying to dump a nonexistent attgenerated column from the >> old database. Is that acceptable collateral damage? I thought we >> usually avoided that breakage within the dev branch, but I don't know >> how we do it. > > We don't. Really? I thought on the contrary that it should work. This reminds me of commit 59a884e9, which has been discussed here: https://www.postgresql.org/message-id/20160212001846.GB29511@momjian.us -- Michael
Attachment
On Mon, Apr 01, 2019 at 10:44:05PM +0900, Michael Paquier wrote: > Really? I thought on the contrary that it should work. This reminds > me of commit 59a884e9, which has been discussed here: > https://www.postgresql.org/message-id/20160212001846.GB29511@momjian.us And this remark makes little sense as we are taking about incompatible dump formats with pg_dump. My apologies for the useless noise. /me runs fast and hides, fast. -- Michael
Attachment
On 2019-04-01 10:52, Peter Eisentraut wrote: > On 2019-03-31 05:49, Erik Rijkers wrote: >> STORED in a >> file_fdw foreign table still silently creates the column which then >> turns out to be useless on SELECT, with an error like: >> >> "ERROR: column some_column_name is a generated column >> DETAIL: Generated columns cannot be used in COPY." >> >> Maybe it'd be possible to get an error earlier, i.e., while trying to >> create such a useless column? > > I'll look into it. I've been trying to create a test case for file_fdw for this, but I'm not getting your result. Can you send a complete test case? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-04-02 14:43, Peter Eisentraut wrote: > On 2019-04-01 10:52, Peter Eisentraut wrote: >> On 2019-03-31 05:49, Erik Rijkers wrote: >>> STORED in a >>> file_fdw foreign table still silently creates the column which then >>> turns out to be useless on SELECT, with an error like: >>> >>> "ERROR: column some_column_name is a generated column >>> DETAIL: Generated columns cannot be used in COPY." >>> >>> Maybe it'd be possible to get an error earlier, i.e., while trying to >>> create such a useless column? >> >> I'll look into it. > > I've been trying to create a test case for file_fdw for this, but I'm > not getting your result. Can you send a complete test case? Ah, I had not noticed before: with an asterisk ('select * from table' ) one gets no error, just empty values. An actual error seems to occur when one mentions the generated-column-name explicitly in the select-list. select "id", "Ratio Log2 GEN" from <file_fdw foreign table>; " ERROR: column "Ratio Log2 GEN" is a generated column DETAIL: Generated columns cannot be used in COPY. " That's from a quick test here at work; maybe that gives you enough info. If that doesn't make it repeatable (for you) I'll make a more complete example this evening (from home).
On 2019-04-02 15:36, Erik Rijkers wrote: > On 2019-04-02 14:43, Peter Eisentraut wrote: >> On 2019-04-01 10:52, Peter Eisentraut wrote: >>> On 2019-03-31 05:49, Erik Rijkers wrote: >>>> STORED in a >>>> file_fdw foreign table still silently creates the column which then >>>> turns out to be useless on SELECT, with an error like: >>>> >>>> "ERROR: column some_column_name is a generated column >>>> DETAIL: Generated columns cannot be used in COPY." >>>> >>>> Maybe it'd be possible to get an error earlier, i.e., while trying >>>> to >>>> create such a useless column? >>> >>> I'll look into it. >> >> I've been trying to create a test case for file_fdw for this, but I'm >> not getting your result. Can you send a complete test case? attached is run_ft.sh which creates a text file: /tmp/pg_head.txt then sets it up as a foreign table, and adds a generated column. Then selects a succesful select, followed by a error-producing select. Some selects are succesful but some fail. I'm not sure why it sometimes fails (it's not just the explicitness of the generated-column-name like I suggested earlier). My output of run_ft.sh is below. $ ./run_ft.sh create schema if not exists "tmp"; CREATE SCHEMA create server if not exists "tmpserver" foreign data wrapper file_fdw; CREATE SERVER drop foreign table if exists tmp.pg_head cascade; DROP FOREIGN TABLE create foreign table tmp.pg_head ( "Gene" text, "Ratio H/L normalized Exp1" numeric ) server tmpserver options ( delimiter E'\t' , format 'csv' , header 'TRUE' , filename '/tmp/pg_head.txt' ); CREATE FOREIGN TABLE alter foreign table tmp.pg_head add column "Ratio H/L normalized Exp1 Log2 (Generated column)" numeric generated always as (case when "Ratio H/L normalized Exp1" > 0 then log(2, "Ratio H/L normalized Exp1") else null end) stored ; ALTER FOREIGN TABLE -- this is OK (although the generated-column values are all empty/null) select "Gene" , "Ratio H/L normalized Exp1" , "Ratio H/L normalized Exp1 Log2 (Generated column)" from tmp.pg_head limit 3 ; Gene | Ratio H/L normalized Exp1 | Ratio H/L normalized Exp1 Log2 (Generated column) --------+---------------------------+--------------------------------------------------- Dhx9 | NaN | Gapdh | 0.42288 | Gm8797 | 0.81352 | (3 rows) -- but this fails select "Gene" , "Ratio H/L normalized Exp1 Log2 (Generated column)" from tmp.pg_head limit 3 ; ERROR: column "Ratio H/L normalized Exp1 Log2 (Generated column)" is a generated column DETAIL: Generated columns cannot be used in COPY.
Attachment
On 2019/04/03 3:54, Erik Rijkers wrote: > create schema if not exists "tmp"; > CREATE SCHEMA > create server if not exists "tmpserver" foreign data wrapper file_fdw; > CREATE SERVER > drop foreign table if exists tmp.pg_head cascade; > DROP FOREIGN TABLE > create foreign table tmp.pg_head ( > "Gene" text, > "Ratio H/L normalized Exp1" numeric > ) > server tmpserver > options ( > delimiter E'\t' > , format 'csv' > , header 'TRUE' > , filename '/tmp/pg_head.txt' > ); > CREATE FOREIGN TABLE > alter foreign table tmp.pg_head > add column "Ratio H/L normalized Exp1 Log2 (Generated column)" numeric > generated always as (case when "Ratio H/L normalized Exp1" > 0 then log(2, > "Ratio H/L normalized Exp1") else null end) stored > ; > ALTER FOREIGN TABLE > -- this is OK (although the generated-column values are all empty/null) > select > "Gene" > , "Ratio H/L normalized Exp1" > , "Ratio H/L normalized Exp1 Log2 (Generated column)" > from tmp.pg_head > limit 3 ; > Gene | Ratio H/L normalized Exp1 | Ratio H/L normalized Exp1 Log2 > (Generated column) > --------+---------------------------+--------------------------------------------------- > > Dhx9 | NaN | > Gapdh | 0.42288 | > Gm8797 | 0.81352 | > (3 rows) > > -- but this fails > select > "Gene" > , "Ratio H/L normalized Exp1 Log2 (Generated column)" > from tmp.pg_head > limit 3 ; > ERROR: column "Ratio H/L normalized Exp1 Log2 (Generated column)" is a > generated column > DETAIL: Generated columns cannot be used in COPY. Fwiw, here's a scenario that fails similarly when using copy, which is what file_fdw uses internally. $ cat foo.csv 1 create table foo (a int, b int generated always as (a+1) stored); copy foo (a, b) from '/tmp/foo.csv'; ERROR: column "b" is a generated column DETAIL: Generated columns cannot be used in COPY. whereas: copy foo from '/tmp/foo.csv'; COPY 1 select * from foo; a │ b ───┼─── 1 │ 2 (1 row) Erik said upthread that generated columns should really be disallowed for (at least) file_fdw foreign tables [1], because (?) they don't support inserts. Changing copy.c to "fix the above seems out of the question. Thanks, Amit [1] https://www.postgresql.org/message-id/e61c597ac4541b77750594eea73a774c%40xs4all.nl
On 2019-04-02 20:54, Erik Rijkers wrote: > attached is run_ft.sh which creates a text file: /tmp/pg_head.txt > then sets it up as a foreign table, and adds a generated column. > > Then selects a succesful select, followed by a error-producing select. I have committed a fix for this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019/04/04 16:52, Peter Eisentraut wrote: > On 2019-04-02 20:54, Erik Rijkers wrote: >> attached is run_ft.sh which creates a text file: /tmp/pg_head.txt >> then sets it up as a foreign table, and adds a generated column. >> >> Then selects a succesful select, followed by a error-producing select. > > I have committed a fix for this. +-- generated column tests +CREATE FOREIGN TABLE gft1 (a int, b text, c text GENERATED ALWAYS AS ('foo') STORED) SERVER file_server +OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ','); +SELECT a, c FROM gft1; + a | c +---+-------- + 1 | _null_ + 1 | _null_ Hmm, I'm afraid we might get bug reports if we go with this. Why is it OK to get null in this case when a user explicitly asked for 'foo'? Thanks, Amit
On 2019-04-04 11:42, Amit Langote wrote: > Hmm, I'm afraid we might get bug reports if we go with this. Why is it OK > to get null in this case when a user explicitly asked for 'foo'? The way stored generated columns work on foreign tables is that the to-be-stored value is computed, then given to the foreign table handler, which then has to store it, and then return it later when queried. However, since the backing store of a foreign table is typically modifiable directly by the user via other channels, it's possible to create situations where actually stored data does not satisfy the generation expression. That is the case here. I don't know of a principled way to prevent that. It's just one of the problems that can happen when you store data in a foreign table: You have very little control over what data actually ends up being stored. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 4, 2019 at 8:01 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-04-04 11:42, Amit Langote wrote: > > Hmm, I'm afraid we might get bug reports if we go with this. Why is it OK > > to get null in this case when a user explicitly asked for 'foo'? > > The way stored generated columns work on foreign tables is that the > to-be-stored value is computed, then given to the foreign table handler, > which then has to store it, and then return it later when queried. > However, since the backing store of a foreign table is typically > modifiable directly by the user via other channels, it's possible to > create situations where actually stored data does not satisfy the > generation expression. That is the case here. I don't know of a > principled way to prevent that. It's just one of the problems that can > happen when you store data in a foreign table: You have very little > control over what data actually ends up being stored. OK, thanks for explaining. We do allow DEFAULT to be specified on foreign tables, although locally-defined defaults have little meaning if the FDW doesn't allow inserts. Maybe same thing applies to GENERATED AS columns. Would it make sense to clarify this on CREATE FOREIGN TABLE page? Thanks, Amit
On 2019-04-04 16:37, Amit Langote wrote: > OK, thanks for explaining. We do allow DEFAULT to be specified on > foreign tables, although locally-defined defaults have little meaning > if the FDW doesn't allow inserts. Maybe same thing applies to > GENERATED AS columns. > > Would it make sense to clarify this on CREATE FOREIGN TABLE page? done -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019/04/08 20:50, Peter Eisentraut wrote: > On 2019-04-04 16:37, Amit Langote wrote: >> OK, thanks for explaining. We do allow DEFAULT to be specified on >> foreign tables, although locally-defined defaults have little meaning >> if the FDW doesn't allow inserts. Maybe same thing applies to >> GENERATED AS columns. >> >> Would it make sense to clarify this on CREATE FOREIGN TABLE page? > > done Thanks. Regards, Amit