Thread: identity columns

identity columns

From

Peter Eisentraut

Date:

31 August 2016, 04:00:58

Here is another attempt to implement identity columns. This is a
standard-conforming variant of PostgreSQL's serial columns. It also
fixes a few usability issues that serial columns have:

- need to set permissions on sequence in addition to table (*)
- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- slight weirdnesses because serial is some kind of special macro

(*) Not actually implemented yet, because I wanted to make use of the
NEXT VALUE FOR stuff I had previously posted, but I have more work to do
there.

There have been previous attempts at this between 2003 and 2007. The
latest effort failed because it tried to implement identity columns and
generated columns in one go, but then discovered that they have wildly
different semantics. But AFAICT, there weren't any fundamental issues
with the identity parts of the patch.

Here some interesting threads of old:

https://www.postgresql.org/message-id/flat/xzp1xw2x5jo.fsf%40dwp.des.no#xzp1xw2x5jo.fsf@dwp.des.no
https://www.postgresql.org/message-id/flat/1146360362.839.104.camel%40home#1146360362.839.104.camel@home
https://www.postgresql.org/message-id/23436.1149629297%40sss.pgh.pa.us
https://www.postgresql.org/message-id/flat/448C9B7A.6010000%40dunaweb.hu#448C9B7A.6010000@dunaweb.hu
https://www.postgresql.org/message-id/flat/45DACD1E.2000207%40dunaweb.hu#45DACD1E.2000207@dunaweb.hu
https://www.postgresql.org/message-id/flat/18812.1178572575%40sss.pgh.pa.us

Some comments on the implementation, and where it differs from previous
patches:

- The new attidentity column stores whether a column is an identity
column and when it is generated (always/by default). I kept this
independent from atthasdef mainly for he benefit of existing (client)
code querying those catalogs, but I kept confusing myself by this, so
I'm not sure about that choice. Basically we need to store four
distinct states (nothing, default, identity always, identity by default)
somehow.

- The way attidentity is initialized in some places is a bit hackish. I
haven't focused on that so much without having a clear resolution to the
previous question.

- One previous patch managed to make GENERATED an unreserved key word.
I have it reserved right now, but perhaps with a bit more trying it can
be unreserved.

- I did not implement the restriction of one identity column per table.
That didn't seem necessary.

- I did not implement an override clause on COPY. If COPY supplies a
value, it is always used.

- pg_dump is as always a mess. Some of that is because of the way
pg_upgrade works: It only gives out one OID at a time, so you need to
create the table and sequences in separate entries.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

identity-columns.patch

Re: identity columns

From

Vitaly Burovoy

Date:

07 September 2016, 12:09:20

Hello,

The first look at the patch:

On 8/30/16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> Here is another attempt to implement identity columns.  This is a
> standard-conforming variant of PostgreSQL's serial columns.
>
> ...
>
> Some comments on the implementation, and where it differs from previous
> patches:
>
> - The new attidentity column stores whether a column is an identity
> column and when it is generated (always/by default).  I kept this
> independent from atthasdef mainly for he benefit of existing (client)
> code querying those catalogs, but I kept confusing myself by this, so
> I'm not sure about that choice.  Basically we need to store four
> distinct states (nothing, default, identity always, identity by default)
> somehow.

I don't have a string opinion which way is preferred. I think if the
community is not against it, it can be left as is.

> ...
> - I did not implement the restriction of one identity column per table.
> That didn't seem necessary.

I think it should be mentioned in docs' "Compatibility" part as a PG's
extension (similar to "Zero-column Tables").

> ...
>
> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Questions:
1. Is your patch implements T174 feature? Should a corresponding line
be changed in src/backend/catalog/sql_features.txt?
2. Initializing attidentity in most places is ' ' but makefuncs.c has
"n->identity = 0;". Is it correct?
3. doc/src/sgml/ref/create_table.sgml (5th chunk) has "TODO". Why?
4. There is "ADD GENERATED", but the standard says it should be "SET
GENERATED" (ISO/IEC 9075-2 Subcl.11.20)
5. In ATExecAddIdentity: is it a good idea to check whether
"attTup->attidentity" is the same as the given in "(ADD) SET
GENERATED" and do nothing (except changing sequence's options) in
addition to strict checking for "unset" (" ")?
6. In ATExecDropIdentity: is it a good idea to do nothing if the
column is already not a identity (the same behavior as DROP NOT
NULL/DROP DEFAULT)?
7. Is there any reason to insert CREATE_TABLE_LIKE_IDENTITY before
CREATE_TABLE_LIKE_INDEXES, not at the end?

Why do you change catversion.h? It can lead conflict when other
patches influence it are committed...

I'll have a closer look soon.

-- 
Best regards,
Vitaly Burovoy

Re: identity columns

From

Vitaly Burovoy

Date:

10 September 2016, 03:45:47

Hello Peter,

I have reviewed the patch.
Currently it does not applies at the top of master, the last commit
without a conflict is 975768f

It compiles and passes "make check" tests, but fails with "make check-world" at:
test foreign_data             ... FAILED

It tries to implement SQL:2011 feature T174 ("Identity columns"):
* column definition;
* column altering;
* inserting clauses "OVERRIDING {SYSTEM|USER} VALUE".

It has documentation changes.


===
The implementation has several distinctions from the standard:

1. The standard requires "... ALTER COLUMN ... SET GENERATED { ALWAYS
| BY DEFAULT }" (9075-2:2011 subcl 11.20), but the patch implements it
as "... ALTER COLUMN ... ADD GENERATED { ALWAYS | BY DEFAULT } AS
IDENTITY"

2. The standard requires not more than one identity column, the patch
does not follow that requirement, but it does not mentioned in the
doc.

3. Changes in the table "information_schema.columns" is not full. Some
required columns still empty for identity columns:
* identity_start
* identity_increment
* identity_maximum
* identity_minimum
* identity_cycle

4. "<alter identity column specification>" is not fully implemented
because "<set identity column generation clause>" is implemented
whereas "<alter identity column option>" is not.

5. According to 9075-2:2011 subcl 14.11 Syntax Rule 11)c) for a column
with an indication that values are generated by default the only
possible "<override clause>" is "OVERRIDING USER VALUE".
Implementation allows to use "OVERRIDING SYSTEM VALUE" (as "do
nothing"), but it should be mentioned in "Compatibility" part in the
doc.

postgres=# CREATE TABLE itest10 (a int generated BY DEFAULT as
identity PRIMARY KEY, b text);
CREATE TABLE
postgres=# INSERT INTO itest10 overriding SYSTEM value SELECT 10, 'a';
INSERT 0 1
postgres=# SELECT * FROM itest10;a  | b
---+---10 | a
(1 row)

===

6. "CREATE TABLE ... (LIKE ... INCLUDING ALL)" fails Assertion  at
src/backend/commands/tablecmds.c:631
postgres=# CREATE TABLE itest1 (a int generated by default as identity, b text);
CREATE TABLE
postgres=# CREATE TABLE x(LIKE itest1 INCLUDING ALL);
server closed the connection unexpectedly       This probably means the server terminated abnormally       before or
whileprocessing the request.
 
The connection to the server was lost. Attempting reset: Failed.


===

Also the implementation has several flaws in corner cases:


7. Changing default is allowed but a column is still "identity":

postgres=# CREATE TABLE itest4 (a int GENERATED ALWAYS AS IDENTITY, b text);
CREATE TABLE
postgres=# ALTER TABLE itest4 ALTER COLUMN a set DEFAULT 1;
ALTER TABLE
postgres=# \d itest4                       Table "public.itest4"Column |  Type   |                    Modifiers
--------+---------+--------------------------------------------------a      | integer | not null default 1  generated
alwaysas identityb      | text    |
 


---
8. Changing a column to be "identity" raises "duplicate key" exception:

postgres=# CREATE TABLE itest4 (a serial, b text);
CREATE TABLE
postgres=# ALTER TABLE itest4 ALTER COLUMN a ADD GENERATED ALWAYS AS IDENTITY;
ERROR:  duplicate key value violates unique constraint
"pg_attrdef_adrelid_adnum_index"


---
9. Changing type of a column deletes linked sequence but leaves
"default" and "identity" marks:

postgres=# CREATE TABLE itest4 (a int GENERATED ALWAYS AS IDENTITY, b int);
CREATE TABLE
postgres=# ALTER TABLE itest4 ALTER COLUMN a TYPE text;
ALTER TABLE
postgres=# \d itest4;                               Table "public.itest4"Column |  Type   |
Modifiers
--------+---------+------------------------------------------------------------------a      | text    | default
nextval('16445'::regclass) generated
 
always as identityb      | integer |

postgres=# insert into itest4(b) values(1);
ERROR:  could not open relation with OID 16445
postgres=# select * from itest4;a | b
---+---
(0 rows)


---
10. "identity" modifier is lost when the table inherits another one:

postgres=# CREATE TABLE itest_err_1 (a int);
CREATE TABLE
postgres=# CREATE TABLE x (a int GENERATED ALWAYS AS
IDENTITY)inherits(itest_err_1);
NOTICE:  merging column "a" with inherited definition
CREATE TABLE
postgres=# \d itest_err_1; \d x; Table "public.itest_err_1"Column |  Type   | Modifiers
--------+---------+-----------a      | integer |
Number of child tables: 1 (Use \d+ to list them.)
                        Table "public.x"Column |  Type   |                   Modifiers
--------+---------+-----------------------------------------------a      | integer | not null default
nextval('x_a_seq'::regclass)
Inherits: itest_err_1


---
11. The documentation says "OVERRIDING ... VALUE" can be placed even
before "DEFAULT VALUES", but it is against SQL spec and the
implementation:

postgres=# CREATE TABLE itest10 (a int GENERATED BY DEFAULT AS
IDENTITY, b text);
CREATE TABLE
postgres=# INSERT INTO itest10 DEFAULT VALUES;
INSERT 0 1
postgres=# INSERT INTO itest10 OVERRIDING USER VALUE VALUES(1,2);
INSERT 0 1
postgres=# INSERT INTO itest10 OVERRIDING USER VALUE DEFAULT VALUES;
ERROR:  syntax error at or near "DEFAULT" at character 43


---
12. Dump/restore is broken for some cases:

postgres=# CREATE SEQUENCE itest1_a_seq;
CREATE SEQUENCE
postgres=# CREATE TABLE itest1 (a int generated by default as identity, b text);
CREATE TABLE
postgres=# DROP SEQUENCE itest1_a_seq;
DROP SEQUENCE
postgres=# CREATE DATABASE a;
CREATE DATABASE
postgres=# \q

comp ~ $ pg_dump postgres | psql a
SET
SET
SET
SET
SET
SET
SET
SET
COMMENT
CREATE EXTENSION
COMMENT
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
COPY 0
ERROR:  relation "itest1_a_seq1" does not exist
LINE 1: SELECT pg_catalog.setval('itest1_a_seq1', 2, true);


---
13. doc/src/sgml/ref/create_table.sgml (5th chunk) has "TODO". Why?

---
14. It would be fine if psql has support of new clauses.


===
Also several notes:

15. Initializing attidentity in most places is ' ' but makefuncs.c has
"n->identity = 0;". Is it correct?

---
16. I think it is a good idea to not raise exceptions for "SET
GENERATED/DROP IDENTITY" if a column has the same type of identity/not
an identity. To be consistent with "SET/DROP NOT NULL".


-- 
Best regards,
Vitaly Burovoy

Re: identity columns

From

Peter Eisentraut

Date:

12 September 2016, 21:02:15

Thank you for this extensive testing.  I will work on getting the bugs
fixed.  Just a couple of comments on some of your points:

On 9/9/16 11:45 PM, Vitaly Burovoy wrote:
> It compiles and passes "make check" tests, but fails with "make check-world" at:
> test foreign_data             ... FAILED

I do not see that.  You can you show the diffs?

> 1. The standard requires "... ALTER COLUMN ... SET GENERATED { ALWAYS
> | BY DEFAULT }" (9075-2:2011 subcl 11.20), but the patch implements it
> as "... ALTER COLUMN ... ADD GENERATED { ALWAYS | BY DEFAULT } AS
> IDENTITY"

SET and ADD are two different things.  The SET command just changes the
parameters of the underlying sequence.  This can be implemented later
and doesn't seem so important now.  The ADD command is not in the
standard, but I needed it for pg_dump, mainly.  I will need to document
this.

> 14. It would be fine if psql has support of new clauses.

What do you mean by that?  Tab completion?

> 16. I think it is a good idea to not raise exceptions for "SET
> GENERATED/DROP IDENTITY" if a column has the same type of identity/not
> an identity. To be consistent with "SET/DROP NOT NULL".

These behaviors are per SQL standard.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: identity columns

From

Vitaly Burovoy

Date:

12 September 2016, 23:59:44

On 9/12/16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> Thank you for this extensive testing.  I will work on getting the bugs
> fixed.  Just a couple of comments on some of your points:
>
> On 9/9/16 11:45 PM, Vitaly Burovoy wrote:
>> It compiles and passes "make check" tests, but fails with "make
>> check-world" at:
>> test foreign_data             ... FAILED
>
> I do not see that.  You can you show the diffs?

I can't reproduce it, it is my fault, may be I did not clean build dir.

>> 1. The standard requires "... ALTER COLUMN ... SET GENERATED { ALWAYS
>> | BY DEFAULT }" (9075-2:2011 subcl 11.20), but the patch implements it
>> as "... ALTER COLUMN ... ADD GENERATED { ALWAYS | BY DEFAULT } AS
>> IDENTITY"
>
> SET and ADD are two different things.  The SET command just changes the
> parameters of the underlying sequence.

Well... As for me ADD is used when you can add more than one property
of the same kind to a relation (e.g. column or constraint), but SET is
used when you change something and it replaces previous state (e.g.
NOT NULL, DEFAULT, STORAGE, SCHEMA, TABLESPACE etc.)

You can't set ADD more than one IDENTITY to a column, so it should be "SET".

> This can be implemented later and doesn't seem so important now.

Hmm. Now you're passing params to CreateSeqStmt because they are the same.
Is it hard to pass them to AlterSeqStmt (if there is no SET GENERATED")?

> The ADD command is not in the standard, but I needed it for pg_dump, mainly.
> I will need to document this.

Firstly, why to introduce new grammar which differs from the standard
instead of follow the standard?
Secondly, I see no troubles to follow the standard:
src/bin/pg_dump/pg_dump.c:
- "ALTER COLUMN %s ADD GENERATED ",
+ "ALTER COLUMN %s SET GENERATED ",

src/backend/parser/gram.y:
- /* ALTER TABLE <name> ALTER [COLUMN] <colname> ADD GENERATED ... AS
IDENTITY ... */
- | ALTER opt_column ColId ADD_P GENERATED generated_when AS
IDENTITY_P OptParenthesizedSeqOptList
-     c->options = $9;
+ /* ALTER TABLE <name> ALTER [COLUMN] <colname> SET GENERATED ... */
+ | ALTER opt_column ColId SET GENERATED generated_when OptSeqOptList
-     c->options = $7;

I guess "ALTER opt_column ColId SET OptSeqOptList" is easy to be
implemented, after some research "ALTER opt_column ColId RESTART [WITH
...]" also can be added.

(and reflected in the docs)

>> 14. It would be fine if psql has support of new clauses.
>
> What do you mean by that?  Tab completion?

Yes, I'm about it. Or tab completion is usually developed later?

>> 16. I think it is a good idea to not raise exceptions for "SET
>> GENERATED/DROP IDENTITY" if a column has the same type of identity/not
>> an identity. To be consistent with "SET/DROP NOT NULL".
>
> These behaviors are per SQL standard.

Can you point to concrete rule(s) in the standard?

I could not find it in ISO/IEC 9075-2 subclauses 11.20 "<alter
identity column specification>" and 11.21 "<drop identity property
clause>".
Only subclause 4.15.11 "Identity columns" says "The columns of a base
table BT can optionally include not more than one identity column."
(which you don't follow).

For instance, subclause 11.42 <drop character set statement>, General
Rules p.1 says explicitly about exception.
Or (for columns): 11.4 <column definition>, General Rules p.3: "The
<column name> in the <column definition> SHALL NOT be equivalent to
the <column name> of any other column of T."

> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Several additional thoughts:
1. I think it is wise to add ability to set name of a sequence (as
PG's extension of the standard) to SET GENERATED or GENERATED in a
relation definition (something like CONSTRAINTs names), without it it
is hard to fix conflicts with other sequences (e.g. from serial pseudo
type) and manual changes of the sequence ("alter seq rename", "alter
seq set schema" etc.).
2. Is it useful to rename sequence linked with identity constraint
when table is renamed (similar way when sequence moves to another
schema following the linked table)?
3. You're setting OWNER to a sequence, but what about USAGE privilege
to roles have INSERT/UPDATE privileges to the table? For security
reasons table is owned by a role different from roles which are using
the table (to prevent changing its definition).

-- 
Best regards,
Vitaly Burovoy

Re: identity columns

From

Robert Haas

Date:

28 September 2016, 17:42:09

On Mon, Sep 12, 2016 at 5:02 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Thank you for this extensive testing.  I will work on getting the bugs
> fixed.

It looks like the patch has not been updated; since the CommitFest is
(hopefully) wrapping up, I am marking this "Returned with Feedback"
for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: identity columns

From

Peter Eisentraut

Date:

01 November 2016, 04:00:52

New patch.

On 9/9/16 11:45 PM, Vitaly Burovoy wrote:
> 1. The standard requires "... ALTER COLUMN ... SET GENERATED { ALWAYS
> | BY DEFAULT }" (9075-2:2011 subcl 11.20), but the patch implements it
> as "... ALTER COLUMN ... ADD GENERATED { ALWAYS | BY DEFAULT } AS
> IDENTITY"

Has both now.  They do different things, as documented.

> 2. The standard requires not more than one identity column, the patch
> does not follow that requirement, but it does not mentioned in the
> doc.

fixed

> 3. Changes in the table "information_schema.columns" is not full.

fixed

> 4. "<alter identity column specification>" is not fully implemented
> because "<set identity column generation clause>" is implemented
> whereas "<alter identity column option>" is not.

done

> 5. According to 9075-2:2011 subcl 14.11 Syntax Rule 11)c) for a column
> with an indication that values are generated by default the only
> possible "<override clause>" is "OVERRIDING USER VALUE".
> Implementation allows to use "OVERRIDING SYSTEM VALUE" (as "do
> nothing"), but it should be mentioned in "Compatibility" part in the
> doc.

done (documented)

> 6. "CREATE TABLE ... (LIKE ... INCLUDING ALL)" fails Assertion  at
> src/backend/commands/tablecmds.c:631

fixed

> 7. Changing default is allowed but a column is still "identity":

fixed

> 8. Changing a column to be "identity" raises "duplicate key" exception:

fixed

> 9. Changing type of a column deletes linked sequence but leaves
> "default" and "identity" marks:

fixed

> 10. "identity" modifier is lost when the table inherits another one:

fixed, but I invite more testing of inheritance-related things

> 11. The documentation says "OVERRIDING ... VALUE" can be placed even
> before "DEFAULT VALUES", but it is against SQL spec and the
> implementation:

fixed

> 12. Dump/restore is broken for some cases:

fixed

> 13. doc/src/sgml/ref/create_table.sgml (5th chunk) has "TODO". Why?

fixed

> 14. It would be fine if psql has support of new clauses.

done

> 15. Initializing attidentity in most places is ' ' but makefuncs.c has
> "n->identity = 0;". Is it correct?

fixed

> 16. I think it is a good idea to not raise exceptions for "SET
> GENERATED/DROP IDENTITY" if a column has the same type of identity/not
> an identity. To be consistent with "SET/DROP NOT NULL".

The present behavior is per SQL standard.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

v2-identity-columns.patch

Re: identity columns

From

Haribabu Kommi

Date:

22 November 2016, 11:51:50

Hi Vitaly,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.

But you haven't shared your review yet on the new patch in this commitfest.

Please share your review about the patch. This will help us in smoother

operation of commitfest.

Please Ignore if you already shared your review.

Regards,

Hari Babu

Fujitsu Australia

Re: identity columns

From

Haribabu Kommi

Date:

05 December 2016, 02:37:35

On Tue, Nov 22, 2016 at 10:51 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Hi Vitaly,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.
But you haven't shared your review yet on the new patch in this commitfest.
Please share your review about the patch. This will help us in smoother
operation of commitfest.

Please Ignore if you already shared your review.

Moved to next CF with "needs review" status.

Regards,

Hari Babu

Fujitsu Australia