Thread: IDENTITY/GENERATED columns

From
Zoltan Boszormenyi
Date:
Hi,

my last patch didn't make it to the -hackers list,
so here's a newer one. Let me list what this patch does now:

- CREATE TABLE/ALTER TABLE ADD syntax support for
    GENERATED { ALWAYS | BY DEFAULT } AS
     { ( expr ) | IDENTITY ( seq_opts ) }
- catalog indicators of the above properties
- INSERT|COPY syntax support for OVERRIDING { SYSTEM | USER } VALUE
- INSERT|COPY fails for a non-owner who uses OVERRIDING SYSTEM VALUE
    on tables with GENERATED ALWAYS columns
- UPDATE fails when anything other than the DEFAULT literal is
    assigned to a GENERATED ALWAYS column
- GRANT {INSERT|UPDATE|ALL} ON table automatically
    grants UPDATE permission on the supporting sequence
    (still missing: ALTER TABLE ADD should also grant permission
     on the new sequence)
- ALTER TABLE tab ALTER col { SET seq_opts | RESTART [WITH] N }
    syntax support to alter the supporting sequence
- ALTER TABLE tab RENAME also renames the supporting sequence
    on both table and column renaming
- ALTER TABLE SET/DROP DEFAULT is disallowed
- pg_dump support for exporting the above properties including
    sequence parameters. Data dump uses OVERRIDING SYSTEM VALUE
    when the table has GENERATED ALWAYS columns
- test cases for some operations
- documentation

With this version, I mostly covered these TODO entries:


     * Add support for SQL-standard GENERATED/IDENTITY columns (almost
       done :-) )
     * %Disallow changing DEFAULT expression of a SERIAL column? (done)
     * %Disallow ALTER SEQUENCE changes for SERIAL sequences because
       pg_dump does not dump the changes
       (pg_dump dumps the sequence options)
     * Add DEFAULT .. AS OWNER so permission checks are done as the table
       owner
       This would be useful for SERIAL nextval() calls and CHECK
       constraints.
       (GRANT TABLE grants UPDATE on the supporting sequence,
        ALTER TABLE ADD COLUMN will be implemented)
     * %Have ALTER TABLE RENAME rename SERIAL sequence names (done)
     * Allow SERIAL sequences to inherit permissions from the base table?
       (not needed because INSERT is a fastpath, permissions added
        with GRANT TABLE, and [will be] extended on ALTER TABLE ADD COLUMN)

I am considering using ALTER TABLE tab ALTER col TYPE
to add (and drop) GENERATED ALWAYS and GENERATED AS IDENTITY
properties to existing columns. As it stands now, ALTER TYPE
doesn't change these column attributes.

Please review.

Best regards,
Zoltán Böszörményi





Re: IDENTITY/GENERATED columns

From
Zoltan Boszormenyi
Date:
Hi,

here's the next version; the changes are listed below.


Changes from previous version:

- ALTER TABLE ADD col type GENERATED AS IDENTITY
  grants permission on the newly created sequence
  to those who already have INSERT or UPDATE on the table.
- ALTER TABLE SET/DROP DEFAULT is now also disallowed on
  GENERATED ALWAYS columns.
- Some codepaths were consolidated to avoid source bloat.

A larger remaining problem is that
GENERATED ALWAYS AS (expr) should allow references to
other columns of the same table, etc., just like CHECK does.
For the sake of generality, DEFAULT should have
the same feature, too. That is, a DEFAULT value that
depends on other columns should be evaluated in a second
phase, after the columns that are given as constants or
whose DEFAULT doesn't depend on other columns.
Circular dependencies must be rejected, etc.

I talked about implementing the setting and dropping of
GENERATED / IDENTITY properties. I cooked up this syntax:

ALTER TABLE tab ALTER col SET GENERATED ALWAYS AS ( expr )
    Behaves the same as "SET DEFAULT expr" but sets attforceddef.
ALTER TABLE tab ALTER col SET GENERATED ALWAYS AS IDENTITY [( seq_opts )]
    It should create supporting sequence iff the column
    isn't already an IDENTITY column.
ALTER TABLE tab ALTER col DROP IDENTITY
    This should be spelled out explicitly rather than reusing DROP DEFAULT.

That leaves ALTER TABLE ALTER TYPE alone.

Then a quick check showed that changing such attributes
on an existing column isn't defined by SQL2003. The nice thing about
not implementing the above is that it avoids the debate about
whether SET GENERATED ALWAYS AS [IDENTITY]
should also rewrite the existing records.

My answer in that debate would be that
GENERATED ALWAYS AS (expr)
should naturally do so, but it isn't clear at all for
GENERATED ALWAYS AS IDENTITY.

Closing words: I think it is now ready for acceptance.
It does everything I wanted it to do and it conforms to
SQL2003. I may have introduced some memory leaks,
but I tried to avoid them by carefully inspecting what other
functions do. Some documentation details can
also be improved. But I think the code is ready.


Please review.

Thanks and best regards,
Zoltán Böszörményi



Re: IDENTITY/GENERATED columns

From
Zoltan Boszormenyi
Date:
Hi,

here's the next version. Changes:
- Extended documentation.
- Extending permissions to new sequences on
  ALTER TABLE tab ADD col type GENERATED AS IDENTITY
  didn't work as advertised; now it seems to.
- The test case was also extended.
- Previously introduced memory leaks were plugged. Really.

Now the only feature missing is the previously discussed
GENERATED ALWAYS AS ( expr ) with column references,
so that it can be used like this:

CREATE TABLE tab (
    c1 double precision,
    c2 integer,
    c3 double precision GENERATED ALWAYS AS (c1 + c2),
    c4 SMALLINT GENERATED ALWAYS AS
                (CASE WHEN c1 > c2 THEN 1 ELSE NULL END)
);

What should the following statements produce?

INSERT INTO tab (c1, c2, c3, c4) VALUES (1.1, 2, 0, 0);

This should insert (1.1, 2, 3.1, NULL).

UPDATE tab SET c2 = 1;

Does only c2 change, giving (1.1, 1, 3.1, NULL)?
Or should the row become (1.1, 1, 2.1, 1),
i.e. should all columns that depend on
changed columns be recomputed?

UPDATE tab SET c4 = DEFAULT, c3 = DEFAULT, c2 = 2, c1 = 3.5;

Now what? It should be (3.5, 2, 5.5, 1).
But with the current UPDATE behaviour,
i.e. values being computed from the previous
row contents, it becomes (3.5, 2, 2.1, 1).
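
To make the arithmetic concrete, here is a minimal Python sketch (not PostgreSQL code) that models the current behaviour: every SET expression is evaluated against the old row. The dict-based row, the `update_old_row` helper and the lambdas standing in for the generation expressions are assumptions made purely for illustration. The starting row assumes the earlier "UPDATE tab SET c2 = 1" did recompute the dependent columns.

```python
# Illustrative model only: a row is a dict, and a SET clause maps a
# column name to a function of the *old* row (constants ignore it).

def update_old_row(old_row, assignments):
    """Apply SET clauses the way UPDATE works today: every expression
    sees the old row, never a value assigned in the same statement."""
    new_row = dict(old_row)
    for column, expr in assignments.items():
        new_row[column] = expr(old_row)
    return new_row

# Generation expressions from the example table; DEFAULT expands to these.
gen_c3 = lambda row: row["c1"] + row["c2"]
gen_c4 = lambda row: 1 if row["c1"] > row["c2"] else None

# Assumed starting row: (1.1, 1, 2.1, 1).
old = {"c1": 1.1, "c2": 1, "c3": 2.1, "c4": 1}

# UPDATE tab SET c4 = DEFAULT, c3 = DEFAULT, c2 = 2, c1 = 3.5;
new = update_old_row(old, {
    "c4": gen_c4,
    "c3": gen_c3,
    "c2": lambda row: 2,
    "c1": lambda row: 3.5,
})
print(new)
```

In this model c3 is computed as 1.1 + 1 from the old row rather than 3.5 + 2 from the new values, which corresponds to the (3.5, 2, 2.1, 1) outcome.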

Getting this right would really require changing the behaviour of UPDATE.
Currently, if I do an

UPDATE tab SET c1 = 3.5, c2 = 2, c3 = c1 + c2;

then c3 gets its value based on the previous content
of the record. For the above GENERATED ALWAYS
AS (expr) construct to work, UPDATE has to compute
the column values in multiple passes, something like this:

compute the constant values;
while (there are any non-computed columns)
{
    newly_computed = 0;
    foreach (column, non-computed-columns)
    {
       if (column value depends only on computed columns)
       {
          compute it;
          newly_computed++;
       }
    }
    if (newly_computed == 0)
       elog(ERROR, "circular dependency");
}
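
The loop above can be tried out with a small Python sketch. This is only a model of the idea, not backend code: the function name `multipass_update`, the dict-based row and the explicit `depends_on` map are all assumptions made for illustration. A real implementation would presumably derive the dependencies from the parsed expressions.

```python
def multipass_update(old_row, assignments, depends_on):
    """Compute a new row in several passes.

    assignments: column -> function of the partially built new row.
    depends_on:  column -> set of columns that expression reads.
    Columns without an assignment keep their old value and count as
    already computed. Constants have no dependencies, so they are
    computed in the first pass.
    """
    new_row = {c: v for c, v in old_row.items() if c not in assignments}
    pending = set(assignments)
    while pending:
        # Columns whose inputs are all available at the start of this pass.
        ready = [c for c in pending
                 if all(dep in new_row for dep in depends_on.get(c, set()))]
        if not ready:
            raise ValueError("circular dependency")
        for column in ready:
            new_row[column] = assignments[column](new_row)
            pending.discard(column)
    return new_row

# UPDATE tab SET c4 = DEFAULT, c3 = DEFAULT, c2 = 2, c1 = 3.5;
old = {"c1": 1.1, "c2": 1, "c3": 2.1, "c4": 1}
result = multipass_update(
    old,
    {
        "c1": lambda r: 3.5,
        "c2": lambda r: 2,
        "c3": lambda r: r["c1"] + r["c2"],
        "c4": lambda r: 1 if r["c1"] > r["c2"] else None,
    },
    {"c3": {"c1", "c2"}, "c4": {"c1", "c2"}},
)
print(result["c3"], result["c4"])  # 5.5 1
```

With these inputs the constants are assigned in the first pass and c3 and c4 in the second, giving (3.5, 2, 5.5, 1). Two columns that depend only on each other never become ready, so the sketch raises the "circular dependency" error.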

This behaviour change would enable something like this:
CREATE TABLE tab2 (c1 integer, c2 integer, c3 integer);
INSERT INTO tab2 (c1,c2,c3) VALUES (1, 2, c1 + c2);

Does the behaviour described here have any precedent,
or is it covered by the standard?

Best regards,
Zoltán Böszörményi



Re: IDENTITY/GENERATED columns

From
Bruce Momjian
Date:
This is being done for 8.3, right?

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: IDENTITY/GENERATED columns

From
Zoltan Boszormenyi
Date:
Yes, I am not finished with it yet.

Bruce Momjian wrote:
> This is being done for 8.3, right?