Thread: future of serial and identity columns
In PostgreSQL 10, we added identity columns, as an alternative to serial columns (since 6.something). They mostly work the same. Identity columns are SQL-conforming, have some more features (e.g., overriding clause), and are a bit more robust in schema management. Some of that was described in [0]. AFAICT, there have been no complaints since that identity columns lack features or are somehow a regression over serial columns. But clearly, the syntax "serial" is more handy, and most casual examples use that syntax. So it seems like we are stuck with maintaining these two variants in parallel forever. I was thinking we could nudge this a little by remapping "serial" internally to create an identity column instead. At least then over time, the use of the older serial mechanisms would go away. Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE + ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we did this, any existing databases would keep their old semantics, and those who really need it can manually create the old semantics as well. Attached is a demo patch how the implementation of this change would look like. This creates a bunch of regression test failures, but AFAICT, those are mainly display differences and some very peculiar test setups that are intentionally examining some edge cases. These would need to be investigated in more detail, of course. [0]: https://www.enterprisedb.com/blog/postgresql-10-identity-columns-explained
Attachment
On Tue, 2022-10-04 at 09:41 +0200, Peter Eisentraut wrote: > In PostgreSQL 10, we added identity columns, as an alternative to serial > columns (since 6.something). They mostly work the same. Identity > columns are SQL-conforming, have some more features (e.g., overriding > clause), and are a bit more robust in schema management. Some of that > was described in [0]. AFAICT, there have been no complaints since that > identity columns lack features or are somehow a regression over serial > columns. > > But clearly, the syntax "serial" is more handy, and most casual examples > use that syntax. So it seems like we are stuck with maintaining these > two variants in parallel forever. I was thinking we could nudge this a > little by remapping "serial" internally to create an identity column > instead. At least then over time, the use of the older serial > mechanisms would go away. I think that would be great. That might generate some confusion among users who follow old tutorials and are surprised that the eventual table definition differs, but I'd say that is a good thing. Yours, Laurenz Albe
On Tue, Oct 4, 2022 at 09:41:19AM +0200, Peter Eisentraut wrote: > In PostgreSQL 10, we added identity columns, as an alternative to serial > columns (since 6.something). They mostly work the same. Identity columns > are SQL-conforming, have some more features (e.g., overriding clause), and > are a bit more robust in schema management. Some of that was described in > [0]. AFAICT, there have been no complaints since that identity columns lack > features or are somehow a regression over serial columns. FYI, SERIAL came from Informix syntax, and it was already a macro, so making it a different macro seems fine. ;-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
On 10/4/22 09:41, Peter Eisentraut wrote: > In PostgreSQL 10, we added identity columns, as an alternative to serial > columns (since 6.something). They mostly work the same. Identity > columns are SQL-conforming, have some more features (e.g., overriding > clause), and are a bit more robust in schema management. Some of that > was described in [0]. AFAICT, there have been no complaints since that > identity columns lack features or are somehow a regression over serial > columns. > > But clearly, the syntax "serial" is more handy, and most casual examples > use that syntax. So it seems like we are stuck with maintaining these > two variants in parallel forever. I was thinking we could nudge this a > little by remapping "serial" internally to create an identity column > instead. At least then over time, the use of the older serial > mechanisms would go away. > > Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE + > ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we > did this, any existing databases would keep their old semantics, and > those who really need it can manually create the old semantics as well. > > Attached is a demo patch how the implementation of this change would > look like. This creates a bunch of regression test failures, but > AFAICT, those are mainly display differences and some very peculiar test > setups that are intentionally examining some edge cases. These would > need to be investigated in more detail, of course. I haven't tested the patch yet, just read it. Is there any reason to use BY DEFAULT over ALWAYS? I tend to prefer the latter. -- Vik Fearing
On Fri, Oct 7, 2022 at 2:03 PM Vik Fearing <vik@postgresfriends.org> wrote:
On 10/4/22 09:41, Peter Eisentraut wrote:
> In PostgreSQL 10, we added identity columns, as an alternative to serial
> columns (since 6.something). They mostly work the same. Identity
> columns are SQL-conforming, have some more features (e.g., overriding
> clause), and are a bit more robust in schema management. Some of that
> was described in [0]. AFAICT, there have been no complaints since that
> identity columns lack features or are somehow a regression over serial
> columns.
>
> But clearly, the syntax "serial" is more handy, and most casual examples
> use that syntax. So it seems like we are stuck with maintaining these
> two variants in parallel forever. I was thinking we could nudge this a
> little by remapping "serial" internally to create an identity column
> instead. At least then over time, the use of the older serial
> mechanisms would go away.
>
> Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE +
> ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we
> did this, any existing databases would keep their old semantics, and
> those who really need it can manually create the old semantics as well.
>
> Attached is a demo patch how the implementation of this change would
> look like. This creates a bunch of regression test failures, but
> AFAICT, those are mainly display differences and some very peculiar test
> setups that are intentionally examining some edge cases. These would
> need to be investigated in more detail, of course.
I haven't tested the patch yet, just read it.
Is there any reason to use BY DEFAULT over ALWAYS? I tend to prefer the
latter.
I would assume to maintain backwards compatibility with the semantics of SERIAL today?
I do also prefer ALWAYS, but that would make it a compatibility break.
On 04.10.22 09:41, Peter Eisentraut wrote: > Attached is a demo patch how the implementation of this change would > look like. This creates a bunch of regression test failures, but > AFAICT, those are mainly display differences and some very peculiar test > setups that are intentionally examining some edge cases. These would > need to be investigated in more detail, of course. The feedback was pretty positive, so I dug through all the tests to at least get to the point where I could see the end of it. The attached patch 0001 is the actual code and documentation changes. The 0002 patch is just tests randomly updated or disabled to make the whole suite pass. This reveals that there are a few things that would warrant further investigation, in particular around extensions and partitioning. To be continued.
Attachment
The feedback was pretty positive, so I dug through all the tests to at
least get to the point where I could see the end of it. The attached
patch 0001 is the actual code and documentation changes. The 0002 patch
is just tests randomly updated or disabled to make the whole suite pass.
This reveals that there are a few things that would warrant further
investigation, in particular around extensions and partitioning. To be
continued.
I like what I see so far!
Question: the xref refers the reader to sql-createtable, which is a pretty big page, which could leave the reader lost. Would it make sense to create a SQL-CREATETABLE-IDENTITY anchor and link to that instead?
Question: the xref refers the reader to sql-createtable, which is a pretty big page, which could leave the reader lost. Would it make sense to create a SQL-CREATETABLE-IDENTITY anchor and link to that instead?
On 12.10.22 08:22, Corey Huinker wrote: > Question: the xref refers the reader to sql-createtable, which is a > pretty big page, which could leave the reader lost. Would it make sense > to create a SQL-CREATETABLE-IDENTITY anchor and link to that instead? Yes, I think that would be good.
On 2022-Oct-11, Peter Eisentraut wrote: > diff --git a/src/test/modules/test_ddl_deparse/expected/alter_table.out b/src/test/modules/test_ddl_deparse/expected/alter_table.out > index 87a1ab7aabce..30e3dbb8d08a 100644 > --- a/src/test/modules/test_ddl_deparse/expected/alter_table.out > +++ b/src/test/modules/test_ddl_deparse/expected/alter_table.out > @@ -25,12 +25,9 @@ NOTICE: DDL test: type simple, tag CREATE TABLE > CREATE TABLE grandchild () INHERITS (child); > NOTICE: DDL test: type simple, tag CREATE TABLE > ALTER TABLE parent ADD COLUMN b serial; > -NOTICE: DDL test: type simple, tag CREATE SEQUENCE > -NOTICE: DDL test: type alter table, tag ALTER TABLE > -NOTICE: subcommand: type ADD COLUMN (and recurse) desc column b of table parent > -NOTICE: DDL test: type simple, tag ALTER SEQUENCE > +ERROR: cannot recursively add identity column to table that has child tables I think this change merits some discussion. Surely we cannot simply disallow SERIAL from being used with inheritance. Do we need to have a way for identity columns to be used by children tables? (My first thought was "let's keep SERIAL as the old code when used for inheritance", but then I realized that the parent table starts as a normal-looking table that only later acquires inheritors, so we wouldn't know ahead of time that we need to treat that SERIAL column in a special way.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La vida es para el que se aventura"