Thread: future of serial and identity columns

future of serial and identity columns

From

Peter Eisentraut

Date:

04 October 2022, 07:41:19

In PostgreSQL 10, we added identity columns, as an alternative to serial 
columns (since 6.something).  They mostly work the same.  Identity 
columns are SQL-conforming, have some more features (e.g., overriding 
clause), and are a bit more robust in schema management.  Some of that 
was described in [0].  AFAICT, there have been no complaints since that 
identity columns lack features or are somehow a regression over serial 
columns.

But clearly, the syntax "serial" is more handy, and most casual examples 
use that syntax.  So it seems like we are stuck with maintaining these 
two variants in parallel forever.  I was thinking we could nudge this a 
little by remapping "serial" internally to create an identity column 
instead.  At least then over time, the use of the older serial 
mechanisms would go away.

Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE + 
ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT).  So if we 
did this, any existing databases would keep their old semantics, and 
those who really need it can manually create the old semantics as well.

Attached is a demo patch how the implementation of this change would 
look like.  This creates a bunch of regression test failures, but 
AFAICT, those are mainly display differences and some very peculiar test 
setups that are intentionally examining some edge cases.  These would 
need to be investigated in more detail, of course.


[0]: 
https://www.enterprisedb.com/blog/postgresql-10-identity-columns-explained

Attachment

0001-WIP-Change-serial-types-to-map-to-identity-columns.patch

Re: future of serial and identity columns

From

Laurenz Albe

Date:

04 October 2022, 09:33:15

On Tue, 2022-10-04 at 09:41 +0200, Peter Eisentraut wrote:
> In PostgreSQL 10, we added identity columns, as an alternative to serial 
> columns (since 6.something).  They mostly work the same.  Identity 
> columns are SQL-conforming, have some more features (e.g., overriding 
> clause), and are a bit more robust in schema management.  Some of that 
> was described in [0].  AFAICT, there have been no complaints since that 
> identity columns lack features or are somehow a regression over serial 
> columns.
> 
> But clearly, the syntax "serial" is more handy, and most casual examples 
> use that syntax.  So it seems like we are stuck with maintaining these 
> two variants in parallel forever.  I was thinking we could nudge this a 
> little by remapping "serial" internally to create an identity column 
> instead.  At least then over time, the use of the older serial 
> mechanisms would go away.

I think that would be great.
That might generate some confusion among users who follow old tutorials
and are surprised that the eventual table definition differs, but I'd say
that is a good thing.

Yours,
Laurenz Albe

Re: future of serial and identity columns

From

Bruce Momjian

Date:

05 October 2022, 21:26:52

On Tue, Oct  4, 2022 at 09:41:19AM +0200, Peter Eisentraut wrote:
> In PostgreSQL 10, we added identity columns, as an alternative to serial
> columns (since 6.something).  They mostly work the same.  Identity columns
> are SQL-conforming, have some more features (e.g., overriding clause), and
> are a bit more robust in schema management.  Some of that was described in
> [0].  AFAICT, there have been no complaints since that identity columns lack
> features or are somehow a regression over serial columns.

FYI, SERIAL came from Informix syntax, and it was already a macro, so
making it a different macro seems fine.  ;-)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

Re: future of serial and identity columns

From

Vik Fearing

Date:

07 October 2022, 12:02:52

On 10/4/22 09:41, Peter Eisentraut wrote:
> In PostgreSQL 10, we added identity columns, as an alternative to serial 
> columns (since 6.something).  They mostly work the same.  Identity 
> columns are SQL-conforming, have some more features (e.g., overriding 
> clause), and are a bit more robust in schema management.  Some of that 
> was described in [0].  AFAICT, there have been no complaints since that 
> identity columns lack features or are somehow a regression over serial 
> columns.
> 
> But clearly, the syntax "serial" is more handy, and most casual examples 
> use that syntax.  So it seems like we are stuck with maintaining these 
> two variants in parallel forever.  I was thinking we could nudge this a 
> little by remapping "serial" internally to create an identity column 
> instead.  At least then over time, the use of the older serial 
> mechanisms would go away.
> 
> Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE + 
> ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT).  So if we 
> did this, any existing databases would keep their old semantics, and 
> those who really need it can manually create the old semantics as well.
> 
> Attached is a demo patch how the implementation of this change would 
> look like.  This creates a bunch of regression test failures, but 
> AFAICT, those are mainly display differences and some very peculiar test 
> setups that are intentionally examining some edge cases.  These would 
> need to be investigated in more detail, of course.

I haven't tested the patch yet, just read it.

Is there any reason to use BY DEFAULT over ALWAYS?  I tend to prefer the 
latter.
-- 
Vik Fearing

Re: future of serial and identity columns

From

Magnus Hagander

Date:

07 October 2022, 12:07:34

On Fri, Oct 7, 2022 at 2:03 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 10/4/22 09:41, Peter Eisentraut wrote:
> In PostgreSQL 10, we added identity columns, as an alternative to serial
> columns (since 6.something). They mostly work the same. Identity
> columns are SQL-conforming, have some more features (e.g., overriding
> clause), and are a bit more robust in schema management. Some of that
> was described in [0]. AFAICT, there have been no complaints since that
> identity columns lack features or are somehow a regression over serial
> columns.
>
> But clearly, the syntax "serial" is more handy, and most casual examples
> use that syntax. So it seems like we are stuck with maintaining these
> two variants in parallel forever. I was thinking we could nudge this a
> little by remapping "serial" internally to create an identity column
> instead. At least then over time, the use of the older serial
> mechanisms would go away.
>
> Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE +
> ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we
> did this, any existing databases would keep their old semantics, and
> those who really need it can manually create the old semantics as well.
>
> Attached is a demo patch how the implementation of this change would
> look like. This creates a bunch of regression test failures, but
> AFAICT, those are mainly display differences and some very peculiar test
> setups that are intentionally examining some edge cases. These would
> need to be investigated in more detail, of course.

I haven't tested the patch yet, just read it.

Is there any reason to use BY DEFAULT over ALWAYS? I tend to prefer the
latter.

I would assume to maintain backwards compatibility with the semantics of SERIAL today?

I do also prefer ALWAYS, but that would make it a compatibility break.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: future of serial and identity columns

From

Peter Eisentraut

Date:

11 October 2022, 07:59:58

On 04.10.22 09:41, Peter Eisentraut wrote:
> Attached is a demo patch how the implementation of this change would 
> look like.  This creates a bunch of regression test failures, but 
> AFAICT, those are mainly display differences and some very peculiar test 
> setups that are intentionally examining some edge cases.  These would 
> need to be investigated in more detail, of course.

The feedback was pretty positive, so I dug through all the tests to at 
least get to the point where I could see the end of it.  The attached 
patch 0001 is the actual code and documentation changes.  The 0002 patch 
is just tests randomly updated or disabled to make the whole suite pass. 
  This reveals that there are a few things that would warrant further 
investigation, in particular around extensions and partitioning.  To be 
continued.

Attachment

Re: future of serial and identity columns

From

Corey Huinker

Date:

12 October 2022, 06:22:31

The feedback was pretty positive, so I dug through all the tests to at
least get to the point where I could see the end of it. The attached
patch 0001 is the actual code and documentation changes. The 0002 patch
is just tests randomly updated or disabled to make the whole suite pass.
This reveals that there are a few things that would warrant further
investigation, in particular around extensions and partitioning. To be
continued.

I like what I see so far!

Question: the xref refers the reader to sql-createtable, which is a pretty big page, which could leave the reader lost. Would it make sense to create a SQL-CREATETABLE-IDENTITY anchor and link to that instead?

Re: future of serial and identity columns

From

Peter Eisentraut

Date:

12 October 2022, 07:13:13

On 12.10.22 08:22, Corey Huinker wrote:
> Question: the xref  refers the reader to sql-createtable, which is a 
> pretty big page, which could leave the reader lost. Would it make sense 
> to create a SQL-CREATETABLE-IDENTITY anchor and link to that instead?

Yes, I think that would be good.

Re: future of serial and identity columns

From

Alvaro Herrera

Date:

12 October 2022, 08:05:05

On 2022-Oct-11, Peter Eisentraut wrote:

> diff --git a/src/test/modules/test_ddl_deparse/expected/alter_table.out
b/src/test/modules/test_ddl_deparse/expected/alter_table.out
> index 87a1ab7aabce..30e3dbb8d08a 100644
> --- a/src/test/modules/test_ddl_deparse/expected/alter_table.out
> +++ b/src/test/modules/test_ddl_deparse/expected/alter_table.out
> @@ -25,12 +25,9 @@ NOTICE:  DDL test: type simple, tag CREATE TABLE
>  CREATE TABLE grandchild () INHERITS (child);
>  NOTICE:  DDL test: type simple, tag CREATE TABLE
>  ALTER TABLE parent ADD COLUMN b serial;
> -NOTICE:  DDL test: type simple, tag CREATE SEQUENCE
> -NOTICE:  DDL test: type alter table, tag ALTER TABLE
> -NOTICE:    subcommand: type ADD COLUMN (and recurse) desc column b of table parent
> -NOTICE:  DDL test: type simple, tag ALTER SEQUENCE
> +ERROR:  cannot recursively add identity column to table that has child tables

I think this change merits some discussion.  Surely we cannot simply
disallow SERIAL from being used with inheritance.  Do we need to have
a way for identity columns to be used by children tables?

(My first thought was "let's keep SERIAL as the old code when used for
inheritance", but then I realized that the parent table starts as a
normal-looking table that only later acquires inheritors, so we wouldn't
know ahead of time that we need to treat that SERIAL column in a special
way.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La vida es para el que se aventura"