Thread: Feature Freeze

Feature Freeze

From
Bruce Momjian
Date:
Now that it is August 1, we are in feature freeze.  During the next
month, all outstanding patches will be reviewed and hopefully applied.
After that, we will start preparing for beta, hopefully to start in
September or October.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Zoltan Boszormenyi
Date:
Hi,

I have progressed a bit with my pet project, a.k.a $SUBJECT.

Now GENERATED ALWAYS AS IDENTITY and
GENERATED ALWAYS AS ( expr ) work as
intended. Documentation was also extended.
Some test cases are also included, that shows
that ALTER TABLE ALTER TYPE keeps both
the sequence and the GENERATED ALWAYS
property. Gzipped patch is attached.

Next steps are:
- pg_dump support
- more ALTER TABLE support for adding and
  dropping IDENTITY and GENERATED ALWAYS
  features
- more testing

I still maintain that I don't see any standard
requirement between the GENERATED AS IDENTITY
and NEXT VALUE FOR but obviously both
require SEQUENCE as supported feature
in parallel. I can be proven wrong, though,
but please, quote section# and text where
it can be found in the standard.

As for why GENERATED ALWAYS AS IDENTITY
is useful? Consider someone who is coming from
another DBMS (Informix, Access, etc.) where
"INSERT INTO table (id, ...) VALUES (0, ...);"
inserts the next value for the autoincrementer field
instead of 0. Leaving out fields from INSERT is
not allowed in the source because of documentation
reasons and writing DEFAULT is not handy or not
found in that legacy DBMS' features.
Multiply it with N applications that was written
that way over the years of the lifespan of a large
project, count in the human resistance to learn
something new (say 2.5x multiplier, but that may be
under-estimated :-) ) and a feature that help porting
easier will be a cheered feature. IIRC Bruce Momjian
himself wrote in this list that ease-of-use features
can boost PostgreSQL userbase pretty quickly.

So, please, review my patch in it's current state
and decide whether it's a 8.2-worthy feature.

BTW, is there anyone working on COPY FROM ( select ) feature?

Thanks in advance and best regards,
Zoltán Böszörményi


Attachment

Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Alvaro Herrera
Date:
Zoltan Boszormenyi wrote:

> BTW, is there anyone working on COPY FROM ( select ) feature?

I am, but it's in a too early stage to be in 8.2.  Sorry :-(

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Rod Taylor
Date:
On Tue, 2006-08-01 at 18:10 +0200, Zoltan Boszormenyi wrote:
> Hi,
>
> I have progressed a bit with my pet project, a.k.a $SUBJECT.
>
> Now GENERATED ALWAYS AS IDENTITY and
> GENERATED ALWAYS AS ( expr ) work as
> intended. Documentation was also extended.

I'm only commenting because I debated trying to implement this feature a
couple of times. The ugliness required for pg_dump put me off of doing
it.

I did not see a test for enforcement during COPY. UPDATE restrictions
appear to have been missed as well:
       4) If <set clause> SC specifies an <object column> that       references a column of which some underlying
columnis either a       generated column or an identity column whose descriptor       indicates that values are always
generated,then the <update       source> specified in SC shall consist of a <default       specification>. 

<object column> is the <update target>, or the left hand side of the
equation. In short, if a column marked GENERATED ALWAYS is updated then
it must be to DEFAULT or not provided as an update target.
       CREATE TABLE tab (col integer GENERATED ALWAYS AS IDENTITY);       UPDATE tab SET col = DEFAULT; -- ACCEPTED
 UPDATE tab SET col = 1; -- ERROR 


For db restoration (pg_dump), how do you restore to the same values as
previously if it is always regenerated? By making ALWAYS a suggestion
for some users instead of always enforced and providing an override
mechanism for it. I assume it only works for relation owners but I've
not figured out how the spec does permissions.
       <override clause> ::=           OVERRIDING USER VALUE         | OVERRIDING SYSTEM VALUE

In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
copy for relations with an GENERATED ALWAYS identity column and the
backend will need to respect that.

ALWAYS is really only enforced for anyone who doesn't have permission to
specify otherwise.


Another one that got me is what do you do if you do this:
       CREATE TABLE tab (col integer);       INSERT INTO tab VALUES (10);       ALTER TABLE tab ALTER col GENERATED
ALWAYSAS IDENTITY; 

What is the value for "tab"."col"? It would seem that the table should
be rewritten with all values for "col" recalculated -- thus it would be
'1'. But wait! Can we add the <override clause> here too to keep the old
values and change the enforcement for new tuples only?


> Some test cases are also included, that shows
> that ALTER TABLE ALTER TYPE keeps both
> the sequence and the GENERATED ALWAYS
> property. Gzipped patch is attached.
>
> Next steps are:
> - pg_dump support
> - more ALTER TABLE support for adding and
>   dropping IDENTITY and GENERATED ALWAYS
>   features
> - more testing
>
> I still maintain that I don't see any standard
> requirement between the GENERATED AS IDENTITY
> and NEXT VALUE FOR but obviously both
> require SEQUENCE as supported feature
> in parallel. I can be proven wrong, though,
> but please, quote section# and text where
> it can be found in the standard.
>
> As for why GENERATED ALWAYS AS IDENTITY
> is useful? Consider someone who is coming from
> another DBMS (Informix, Access, etc.) where
> "INSERT INTO table (id, ...) VALUES (0, ...);"
> inserts the next value for the autoincrementer field
> instead of 0. Leaving out fields from INSERT is
> not allowed in the source because of documentation
> reasons and writing DEFAULT is not handy or not
> found in that legacy DBMS' features.
> Multiply it with N applications that was written
> that way over the years of the lifespan of a large
> project, count in the human resistance to learn
> something new (say 2.5x multiplier, but that may be
> under-estimated :-) ) and a feature that help porting
> easier will be a cheered feature. IIRC Bruce Momjian
> himself wrote in this list that ease-of-use features
> can boost PostgreSQL userbase pretty quickly.
>
> So, please, review my patch in it's current state
> and decide whether it's a 8.2-worthy feature.
>
> BTW, is there anyone working on COPY FROM ( select ) feature?
>
> Thanks in advance and best regards,
> Zoltán Böszörményi
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
--



Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Zoltan Boszormenyi
Date:
Rod Taylor írta:
> On Tue, 2006-08-01 at 18:10 +0200, Zoltan Boszormenyi wrote:
>   
>> Hi,
>>
>> I have progressed a bit with my pet project, a.k.a $SUBJECT.
>>
>> Now GENERATED ALWAYS AS IDENTITY and
>> GENERATED ALWAYS AS ( expr ) work as
>> intended. Documentation was also extended.
>>     
>
> I'm only commenting because I debated trying to implement this feature a
> couple of times.

Thanks for commenting it.

>  The ugliness required for pg_dump put me off of doing
> it.
>   

I haven't looked into it yet.

> I did not see a test for enforcement during COPY.

That was sort of intended, COPY is expected
to pull back the same record it wrote out.
But see below.

>  UPDATE restrictions
> appear to have been missed as well:
>
>         4) If <set clause> SC specifies an <object column> that
>         references a column of which some underlying column is either a
>         generated column or an identity column whose descriptor
>         indicates that values are always generated, then the <update
>         source> specified in SC shall consist of a <default
>         specification>.
>
> <object column> is the <update target>, or the left hand side of the
> equation. In short, if a column marked GENERATED ALWAYS is updated then
> it must be to DEFAULT or not provided as an update target.
>
>         CREATE TABLE tab (col integer GENERATED ALWAYS AS IDENTITY);
>         UPDATE tab SET col = DEFAULT; -- ACCEPTED
>         UPDATE tab SET col = 1; -- ERROR
>   

Yes, I have also read that detail but not yet implemented it.
I was too happy that I found a straightforward way to make
GENERATED ALWAYS work.

> For db restoration (pg_dump), how do you restore to the same values as
> previously if it is always regenerated? By making ALWAYS a suggestion
> for some users instead of always enforced and providing an override
> mechanism for it. I assume it only works for relation owners but I've
> not figured out how the spec does permissions.
>
>         <override clause> ::=
>             OVERRIDING USER VALUE
>           | OVERRIDING SYSTEM VALUE
>
> In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
> copy for relations with an GENERATED ALWAYS identity column and the
> backend will need to respect that.
>   

Aren't INSERT and COPY distinguished in code paths?
(I don't have too deep knowledge about PostgreSQL internals, yet.)
If they are, OVERRIDING SYSTEM VALUE will be
needed only when pg_dump produces INSERTs.

> ALWAYS is really only enforced for anyone who doesn't have permission to
> specify otherwise.
>
>
> Another one that got me is what do you do if you do this:
>
>         CREATE TABLE tab (col integer);
>         INSERT INTO tab VALUES (10);
>         ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;
>
> What is the value for "tab"."col"? It would seem that the table should
> be rewritten with all values for "col" recalculated -- thus it would be
> '1'. But wait! Can we add the <override clause> here too to keep the old
> values and change the enforcement for new tuples only?
>   

I don't think we should rewrite existing rows because
when it was inserted, the stored value was valid
according to the rules at that time. What if you
have more than one rows in that table?
Which record gets which value? You cannot know
which record was inserted first because subsequent
updates may ruin that order before the ALTER TABLE.
And recalculating the max value of col isn't too reliable
if another session is also inserting records.

And what about non-unique columns?
Plain SERIALs aren't declared unique automatically, either.
Consider the following:

CREATE TABLE tab (col integer);
INSERT INTO tab VALUES (10);
ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY( MINVALUE 1 MAXVALUE 12 CYCLE );


Here I expect equal values and I don't want
existing rows rewritten.

E.g. if you want a new start value, you will also need to issue
ALTER TABLE tab ALTER col RESTART WITH n;
which I started to implement.

Also, for a unique SERIAL column, you can still
insert a record with an out-of-order number and
one of the INSERTs that reach that number will
fail with unique violation. e.g. it's not a real
autoincrementer field. Or you can alter a
sequence that supports such a column.
PostgreSQL documents both behaviour and
I wanted to keep it.

Thanks for the comments,
Zoltán Böszörményi

>> Some test cases are also included, that shows
>> that ALTER TABLE ALTER TYPE keeps both
>> the sequence and the GENERATED ALWAYS
>> property. Gzipped patch is attached.
>>
>> Next steps are:
>> - pg_dump support
>> - more ALTER TABLE support for adding and
>>   dropping IDENTITY and GENERATED ALWAYS
>>   features
>> - more testing
>>
>> I still maintain that I don't see any standard
>> requirement between the GENERATED AS IDENTITY
>> and NEXT VALUE FOR but obviously both
>> require SEQUENCE as supported feature
>> in parallel. I can be proven wrong, though,
>> but please, quote section# and text where
>> it can be found in the standard.
>>
>> As for why GENERATED ALWAYS AS IDENTITY
>> is useful? Consider someone who is coming from
>> another DBMS (Informix, Access, etc.) where
>> "INSERT INTO table (id, ...) VALUES (0, ...);"
>> inserts the next value for the autoincrementer field
>> instead of 0. Leaving out fields from INSERT is
>> not allowed in the source because of documentation
>> reasons and writing DEFAULT is not handy or not
>> found in that legacy DBMS' features.
>> Multiply it with N applications that was written
>> that way over the years of the lifespan of a large
>> project, count in the human resistance to learn
>> something new (say 2.5x multiplier, but that may be
>> under-estimated :-) ) and a feature that help porting
>> easier will be a cheered feature. IIRC Bruce Momjian
>> himself wrote in this list that ease-of-use features
>> can boost PostgreSQL userbase pretty quickly.
>>
>> So, please, review my patch in it's current state
>> and decide whether it's a 8.2-worthy feature.
>>
>> BTW, is there anyone working on COPY FROM ( select ) feature?
>>
>> Thanks in advance and best regards,
>> Zoltán Böszörményi
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>     



Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Rod Taylor
Date:
> > For db restoration (pg_dump), how do you restore to the same values as
> > previously if it is always regenerated? By making ALWAYS a suggestion
> > for some users instead of always enforced and providing an override
> > mechanism for it. I assume it only works for relation owners but I've
> > not figured out how the spec does permissions.
> >
> >         <override clause> ::=
> >             OVERRIDING USER VALUE
> >           | OVERRIDING SYSTEM VALUE
> >
> > In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
> > copy for relations with an GENERATED ALWAYS identity column and the
> > backend will need to respect that.
> >   
> 
> Aren't INSERT and COPY distinguished in code paths?

Yes, they are separate but they also use the same permission set.

Any user can copy into a structure at any time and virtually every
restriction will be applied normally (CHECK, DEFAULT, etc.). Copy
bypasses Rules, significant parsing overhead since there is no need to
look for subselects, and possibly some triggers are bypassed. I'm fairly
sure that foreign key triggers fire.

In short, COPY doesn't bypass enforcement. GENERATED ALWAYS is part of
that enforcement and should be included in that.

If it is not included, we cannot recommend GENERATED ALWAYS for uses
like recording CURRENT_USER in an audit log since the data could be
fudged.

> > ALWAYS is really only enforced for anyone who doesn't have permission to
> > specify otherwise.
> >
> >
> > Another one that got me is what do you do if you do this:
> >
> >         CREATE TABLE tab (col integer);
> >         INSERT INTO tab VALUES (10);
> >         ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;
> >
> > What is the value for "tab"."col"? It would seem that the table should
> > be rewritten with all values for "col" recalculated -- thus it would be
> > '1'. But wait! Can we add the <override clause> here too to keep the old
> > values and change the enforcement for new tuples only?
> >   
> 
> I don't think we should rewrite existing rows because
> when it was inserted, the stored value was valid
> according to the rules at that time. What if you
> have more than one rows in that table?

SERIAL has, until recently, been described as a macro. A tool for
setting things up quickly but many parts of which can be changed by hand
after-ward. It's not exactly a good source for information on how this
structure should work. For one, you can easily override the suggested
default a serial gives at any time as any user. The intention of ALWAYS
is to prevent exactly that behaviour.

I don't have an opinion on ALTER TABLE changes for this one way or the
other. It was my intention to advise that a group decision is required
and some research into what other databases do in this case. I believe
MSSQL and DB2 both implement this functionality.

Oh, and one more item. These expressions have the same abilities as a
CHECK constraint for referencing other columns.

This example comes from an IBM Guide:
       CREATE TABLE T1(c1 INT, c2 DOUBLE,                        c3 DOUBLE GENERATED ALWAYS AS (c1 + c2),
       c4 SMALLINT GENERATED ALWAYS AS                          (CASE                            WHEN c1 > c2 THEN 1
                        ELSE NULL                          END)                      ); 
 

Here is what IBM has to say about ALTER TABLE and GENERATED ALWAYS:
       GENERATED                Specifies that DB2 generates values for the column.
ALWAYS                                Specifies that DB2 will always generate                               a value for
thecolumn when a row is                               inserted into the table, or whenever the
    result value of the                               generation-expression might change. The
   result of the expression is stored in                               the table. GENERATED ALWAYS is the
               recommended option unless data                               propagation or unload and reload
                  operations are being performed.                               GENERATED ALWAYS is the required option
                             for generated columns.                                                      BY DEFAULT
                          Specifies that DB2 will generate a value                               for the column when a
rowis inserted                               into the table, or updated, specifying
DEFAULTfor the column, unless an                               explicit value is specified. BY DEFAULT
            is the recommended option when using                               data propagation or performing unload
                          and reload operations.                      identity-options                This clause
cannotbe specified when adding a column to               an existing table.                      AS
(generation-expression)               Specifies that the definition of the column is based on               an
expression.Requires that the table be put in check               pending state, using the SET INTEGRITY statement.
After              the ALTER TABLE statement, the SET INTEGRITY statement               with FORCE GENERATED must be
usedto update and check               all the values in that column against the new               expression. For
detailson specifying a column with a               generation-expression, see "CREATE TABLE".
 

I'm not sure what they're describing, actually. It appears they don't
require it to be updated but that the option is available to do so when
the equation changes.

> Which record gets which value? You cannot know

Note that when you add a new column with a default, including nextval of
a sequence, it is expected that those values will be assigned by tuple
order in the table.
       CREATE SEQUENCE foo;       ALTER TABLE tab ADD col integer DEFAULT nextval('foo');

> which record was inserted first because subsequent
> updates may ruin that order before the ALTER TABLE.
> And recalculating the max value of col isn't too reliable
> if another session is also inserting records.
> 
> And what about non-unique columns?

I'm not sure what unique versus non-unique has to do with this. The
question I have is what is the meaning of ALWAYS.

There is never a guarantee that alter table will succeed in all cases.
       ALTER TABLE tab ALTER col TYPE bigint USING 2;

> Plain SERIALs aren't declared unique automatically, either.
> Consider the following:
> 
> CREATE TABLE tab (col integer);
> INSERT INTO tab VALUES (10);
> ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY( MINVALUE 1 MAXVALUE 12 CYCLE );

-- 



Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Zoltan Boszormenyi
Date:
Rod Taylor írta:
>>> For db restoration (pg_dump), how do you restore to the same values as
>>> previously if it is always regenerated? By making ALWAYS a suggestion
>>> for some users instead of always enforced and providing an override
>>> mechanism for it. I assume it only works for relation owners but I've
>>> not figured out how the spec does permissions.
>>>
>>>         <override clause> ::=
>>>             OVERRIDING USER VALUE
>>>           | OVERRIDING SYSTEM VALUE
>>>
>>> In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
>>> copy for relations with an GENERATED ALWAYS identity column and the
>>> backend will need to respect that.
>>>   
>>>       
>> Aren't INSERT and COPY distinguished in code paths?
>>     
>
> Yes, they are separate but they also use the same permission set.
>
> Any user can copy into a structure at any time and virtually every
> restriction will be applied normally (CHECK, DEFAULT, etc.). Copy
> bypasses Rules, significant parsing overhead since there is no need to
> look for subselects, and possibly some triggers are bypassed. I'm fairly
> sure that foreign key triggers fire.
>
> In short, COPY doesn't bypass enforcement. GENERATED ALWAYS is part of
> that enforcement and should be included in that.
>
> If it is not included, we cannot recommend GENERATED ALWAYS for uses
> like recording CURRENT_USER in an audit log since the data could be
> fudged.
>   

OK. So COPY needs an OVERRIDING close, too, not just INSERT.
In the meantime I implemented the discussed restrictions on
UPDATE for GENERATED ALWAYS columns, allowing
"UPDATE tab SET col = default" only for
GENERATED ALWAYS AS. I also implemented
INSERT ... OVERRIDING { SYSTEM | USER } VALUE.
If I got it correctly, OVERRIDING USER VALUE
seems to be the same as omitting the OVERRIDING clause...

>>> ALWAYS is really only enforced for anyone who doesn't have permission to
>>> specify otherwise.
>>>
>>>
>>> Another one that got me is what do you do if you do this:
>>>
>>>         CREATE TABLE tab (col integer);
>>>         INSERT INTO tab VALUES (10);
>>>         ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;
>>>
>>> What is the value for "tab"."col"? It would seem that the table should
>>> be rewritten with all values for "col" recalculated -- thus it would be
>>> '1'. But wait! Can we add the <override clause> here too to keep the old
>>> values and change the enforcement for new tuples only?
>>>   
>>>       
>> I don't think we should rewrite existing rows because
>> when it was inserted, the stored value was valid
>> according to the rules at that time. What if you
>> have more than one rows in that table?
>>     
>
> SERIAL has, until recently, been described as a macro. A tool for
> setting things up quickly but many parts of which can be changed by hand
> after-ward. It's not exactly a good source for information on how this
> structure should work. For one, you can easily override the suggested
> default a serial gives at any time as any user. The intention of ALWAYS
> is to prevent exactly that behaviour.
>
> I don't have an opinion on ALTER TABLE changes for this one way or the
> other. It was my intention to advise that a group decision is required
> and some research into what other databases do in this case. I believe
> MSSQL and DB2 both implement this functionality.
>   

I see.

I looked a bit into the TODO entry that's about
ALTER TABLE tab ALTER col RENAME newcol
should also rename the sequence. My question is:
is it legal to call pg_get_serial_sequence() from
src/backen/parser/analyze.c:transformAlterTableStmt()?
This would be the easiest way to issue an
ALTER TABLE oldseq RENAME newseq
command automatically.

And I think I found a bug in PostgreSQL.
If I do this:

create table tab1 (id serial, t text); -- creates tab1_id_seq
create table tab2 (id serial, t text) inherits (tab1); -- creates 
tab2_id_seq
drop table tab1 cascade;

then tab1_id_seq gets also dropped but tab2_id_seq doesn't.
Both 8.1.4 and current 8.2CVS do this.

> Oh, and one more item. These expressions have the same abilities as a
> CHECK constraint for referencing other columns.
>
> This example comes from an IBM Guide:
>
>         CREATE TABLE T1(c1 INT, c2 DOUBLE, 
>                         c3 DOUBLE GENERATED ALWAYS AS (c1 + c2),
>                         c4 SMALLINT GENERATED ALWAYS AS 
>                           (CASE
>                              WHEN c1 > c2 THEN 1 
>                              ELSE NULL
>                            END)
>                        ); 
>
>   

For this to work, we need to lift the restriction
on DEFAULT so cother columns can appear in the
expression. Dependencies must be tracked between
columns so GENERATED ALWAYS columns on UPDATE
and DEFAULT/ GENERATED ALWAYS columns on
INSERT get their computed values. Circular dependencies
must be avoided, etc. Hm.

> Here is what IBM has to say about ALTER TABLE and GENERATED ALWAYS:
>
>         GENERATED 
>                 Specifies that DB2 generates values for the column.
>                 
>                         ALWAYS 
>                                 Specifies that DB2 will always generate
>                                 a value for the column when a row is
>                                 inserted into the table, or whenever the
>                                 result value of the
>                                 generation-expression might change. The
>                                 result of the expression is stored in
>                                 the table. GENERATED ALWAYS is the
>                                 recommended option unless data
>                                 propagation or unload and reload
>                                 operations are being performed.
>                                 GENERATED ALWAYS is the required option
>                                 for generated columns.
>                                 
>                         BY DEFAULT 
>                                 Specifies that DB2 will generate a value
>                                 for the column when a row is inserted
>                                 into the table, or updated, specifying
>                                 DEFAULT for the column, unless an
>                                 explicit value is specified. BY DEFAULT
>                                 is the recommended option when using
>                                 data propagation or performing unload
>                                 and reload operations.
>                 
>         identity-options 
>                 This clause cannot be specified when adding a column to
>                 an existing table.
>   

This is even more strict then SQL2003 which allows
only one IDENTITY column at any time, so this is allowed:

CREATE TABLE tab (id serial, ...);
ALTER TABLE tab DROP id;
ALTER TABLE tab ADD id serial GENERATED AS IDENTITY (...);

I deliberately omitted the check to enforce it.

>                 
>         AS (generation-expression) 
>                 Specifies that the definition of the column is based on
>                 an expression. Requires that the table be put in check
>                 pending state, using the SET INTEGRITY statement. After
>                 the ALTER TABLE statement, the SET INTEGRITY statement
>                 with FORCE GENERATED must be used to update and check
>                 all the values in that column against the new
>                 expression. For details on specifying a column with a
>                 generation-expression, see "CREATE TABLE".
>
> I'm not sure what they're describing, actually. It appears they don't
> require it to be updated but that the option is available to do so when
> the equation changes.
>
>   
>> Which record gets which value? You cannot know
>>     
>
> Note that when you add a new column with a default, including nextval of
> a sequence, it is expected that those values will be assigned by tuple
> order in the table.
>
>         CREATE SEQUENCE foo;
>         ALTER TABLE tab ADD col integer DEFAULT nextval('foo');
>
>   
>> which record was inserted first because subsequent
>> updates may ruin that order before the ALTER TABLE.
>> And recalculating the max value of col isn't too reliable
>> if another session is also inserting records.
>>
>> And what about non-unique columns?
>>     
>
> I'm not sure what unique versus non-unique has to do with this. The
> question I have is what is the meaning of ALWAYS.
>
> There is never a guarantee that alter table will succeed in all cases.
>
>         ALTER TABLE tab ALTER col TYPE bigint USING 2;
>
>   
>> Plain SERIALs aren't declared unique automatically, either.
>> Consider the following:
>>
>> CREATE TABLE tab (col integer);
>> INSERT INTO tab VALUES (10);
>> ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY( MINVALUE 1 MAXVALUE 12 CYCLE );
>>     




Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze

From
Zoltan Boszormenyi
Date:
Hi,

next version follows. Changes:

- Supports OVERRIDING { USER | SYSTEM } VALUE syntax
  not yet documented, I have doubts about USER variant
- UPDATES is forbidden entirely on GENERATED ALWAYS
  AS IDENTITY columns, UPDATE tab SET col = DEFAULT is
  allowed on GENERATED ALWAYS AS ( expr ) columns
- ALTER TABLE tab ALTER col RESTART [WITH] N  and
  ALTER TABLE tab ALTER col SET identity_options are supported
  but not yet documented
- extended the test case but the expected .out wasn't updated
  so 1 out of 101 tests fail.

After exercising with the last one, ALTER tab RENAME to newtab
and ALTER tab RENAME col TO newcol should be easy.
With the introduced infrastructure to correctly support
the first two changes (new column attribute: attidentity)
it is be easy to implement checks to disallow
ALTER TABLE tab DROP DEFAULT on IDENTITY columns.

Best regards,
Zoltán Böszörményi

Zoltan Boszormenyi írta:
> Rod Taylor írta:
>>>> For db restoration (pg_dump), how do you restore to the same values as
>>>> previously if it is always regenerated? By making ALWAYS a suggestion
>>>> for some users instead of always enforced and providing an override
>>>> mechanism for it. I assume it only works for relation owners but I've
>>>> not figured out how the spec does permissions.
>>>>
>>>>         <override clause> ::=
>>>>             OVERRIDING USER VALUE
>>>>           | OVERRIDING SYSTEM VALUE
>>>>
>>>> In short, pg_dump should append OVERRIDING SYSTEM VALUE to any
>>>> insert or
>>>> copy for relations with an GENERATED ALWAYS identity column and the
>>>> backend will need to respect that.
>>>>
>>> Aren't INSERT and COPY distinguished in code paths?
>>>
>>
>> Yes, they are separate but they also use the same permission set.
>>
>> Any user can copy into a structure at any time and virtually every
>> restriction will be applied normally (CHECK, DEFAULT, etc.). Copy
>> bypasses Rules, significant parsing overhead since there is no need to
>> look for subselects, and possibly some triggers are bypassed. I'm fairly
>> sure that foreign key triggers fire.
>>
>> In short, COPY doesn't bypass enforcement. GENERATED ALWAYS is part of
>> that enforcement and should be included in that.
>>
>> If it is not included, we cannot recommend GENERATED ALWAYS for uses
>> like recording CURRENT_USER in an audit log since the data could be
>> fudged.
>>
>
> OK. So COPY needs an OVERRIDING close, too, not just INSERT.
> In the meantime I implemented the discussed restrictions on
> UPDATE for GENERATED ALWAYS columns, allowing
> "UPDATE tab SET col = default" only for
> GENERATED ALWAYS AS. I also implemented
> INSERT ... OVERRIDING { SYSTEM | USER } VALUE.
> If I got it correctly, OVERRIDING USER VALUE
> seems to be the same as omitting the OVERRIDING clause...
>
>>>> ALWAYS is really only enforced for anyone who doesn't have
>>>> permission to
>>>> specify otherwise.
>>>>
>>>>
>>>> Another one that got me is what do you do if you do this:
>>>>
>>>>         CREATE TABLE tab (col integer);
>>>>         INSERT INTO tab VALUES (10);
>>>>         ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;
>>>>
>>>> What is the value for "tab"."col"? It would seem that the table should
>>>> be rewritten with all values for "col" recalculated -- thus it
>>>> would be
>>>> '1'. But wait! Can we add the <override clause> here too to keep
>>>> the old
>>>> values and change the enforcement for new tuples only?
>>>>
>>> I don't think we should rewrite existing rows because
>>> when it was inserted, the stored value was valid
>>> according to the rules at that time. What if you
>>> have more than one rows in that table?
>>>
>>
>> SERIAL has, until recently, been described as a macro. A tool for
>> setting things up quickly but many parts of which can be changed by hand
>> after-ward. It's not exactly a good source for information on how this
>> structure should work. For one, you can easily override the suggested
>> default a serial gives at any time as any user. The intention of ALWAYS
>> is to prevent exactly that behaviour.
>>
>> I don't have an opinion on ALTER TABLE changes for this one way or the
>> other. It was my intention to advise that a group decision is required
>> and some research into what other databases do in this case. I believe
>> MSSQL and DB2 both implement this functionality.
>>
>
> I see.
>
> I looked a bit into the TODO entry that's about
> ALTER TABLE tab ALTER col RENAME newcol
> should also rename the sequence. My question is:
> is it legal to call pg_get_serial_sequence() from
> src/backen/parser/analyze.c:transformAlterTableStmt()?
> This would be the easiest way to issue an
> ALTER TABLE oldseq RENAME newseq
> command automatically.
>
> And I think I found a bug in PostgreSQL.
> If I do this:
>
> create table tab1 (id serial, t text); -- creates tab1_id_seq
> create table tab2 (id serial, t text) inherits (tab1); -- creates
> tab2_id_seq
> drop table tab1 cascade;
>
> then tab1_id_seq gets also dropped but tab2_id_seq doesn't.
> Both 8.1.4 and current 8.2CVS do this.
>
>> Oh, and one more item. These expressions have the same abilities as a
>> CHECK constraint for referencing other columns.
>>
>> This example comes from an IBM Guide:
>>
>>         CREATE TABLE T1(c1 INT, c2 DOUBLE,                         c3
>> DOUBLE GENERATED ALWAYS AS (c1 + c2),
>>                         c4 SMALLINT GENERATED ALWAYS AS
>>                           (CASE
>>                              WHEN c1 > c2 THEN 1
>>                              ELSE NULL
>>                            END)
>>                        );
>>
>
> For this to work, we need to lift the restriction
> on DEFAULT so cother columns can appear in the
> expression. Dependencies must be tracked between
> columns so GENERATED ALWAYS columns on UPDATE
> and DEFAULT/ GENERATED ALWAYS columns on
> INSERT get their computed values. Circular dependencies
> must be avoided, etc. Hm.
>
>> Here is what IBM has to say about ALTER TABLE and GENERATED ALWAYS:
>>
>>         GENERATED                 Specifies that DB2 generates values
>> for the column.
>>                                         ALWAYS
>>                                 Specifies that DB2 will always generate
>>                                 a value for the column when a row is
>>                                 inserted into the table, or whenever the
>>                                 result value of the
>>                                 generation-expression might change. The
>>                                 result of the expression is stored in
>>                                 the table. GENERATED ALWAYS is the
>>                                 recommended option unless data
>>                                 propagation or unload and reload
>>                                 operations are being performed.
>>                                 GENERATED ALWAYS is the required option
>>                                 for generated columns.
>>                                                         BY DEFAULT
>>                                 Specifies that DB2 will generate a value
>>                                 for the column when a row is inserted
>>                                 into the table, or updated, specifying
>>                                 DEFAULT for the column, unless an
>>                                 explicit value is specified. BY DEFAULT
>>                                 is the recommended option when using
>>                                 data propagation or performing unload
>>                                 and reload operations.
>>                         identity-options                 This clause
>> cannot be specified when adding a column to
>>                 an existing table.
>>
>
> This is even more strict then SQL2003 which allows
> only one IDENTITY column at any time, so this is allowed:
>
> CREATE TABLE tab (id serial, ...);
> ALTER TABLE tab DROP id;
> ALTER TABLE tab ADD id serial GENERATED AS IDENTITY (...);
>
> I deliberately omitted the check to enforce it.
>
>>                         AS (generation-expression)
>> Specifies that the definition of the column is based on
>>                 an expression. Requires that the table be put in check
>>                 pending state, using the SET INTEGRITY statement. After
>>                 the ALTER TABLE statement, the SET INTEGRITY statement
>>                 with FORCE GENERATED must be used to update and check
>>                 all the values in that column against the new
>>                 expression. For details on specifying a column with a
>>                 generation-expression, see "CREATE TABLE".
>>
>> I'm not sure what they're describing, actually. It appears they don't
>> require it to be updated but that the option is available to do so when
>> the equation changes.
>>
>>
>>> Which record gets which value? You cannot know
>>>
>>
>> Note that when you add a new column with a default, including nextval of
>> a sequence, it is expected that those values will be assigned by tuple
>> order in the table.
>>
>>         CREATE SEQUENCE foo;
>>         ALTER TABLE tab ADD col integer DEFAULT nextval('foo');
>>
>>
>>> which record was inserted first because subsequent
>>> updates may ruin that order before the ALTER TABLE.
>>> And recalculating the max value of col isn't too reliable
>>> if another session is also inserting records.
>>>
>>> And what about non-unique columns?
>>>
>>
>> I'm not sure what unique versus non-unique has to do with this. The
>> question I have is what is the meaning of ALWAYS.
>>
>> There is never a guarantee that alter table will succeed in all cases.
>>
>>         ALTER TABLE tab ALTER col TYPE bigint USING 2;
>>
>>
>>> Plain SERIALs aren't declared unique automatically, either.
>>> Consider the following:
>>>
>>> CREATE TABLE tab (col integer);
>>> INSERT INTO tab VALUES (10);
>>> ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY( MINVALUE 1
>>> MAXVALUE 12 CYCLE );
>>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


Attachment