partitioning and identity column - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | partitioning and identity column |
Date | |
Msg-id | CAExHW5uOykuTC+C6R1yDSp=o8Q83jr8xJdZxgPkxfZ1Ue5RRGg@mail.gmail.com Whole thread Raw |
Responses |
Re: partitioning and identity column
|
List | pgsql-hackers |
Hi All, Reading [1] I have been experimenting with behaviour of identity columns and serial column in case of partitioned tables. My observations related to serial column can be found at [2]. This email is about identity column behaviour with partitioned tables. Serial and identity columns have sufficiently different behaviour and implementation to have separate discussions on their behaviour. I don't want to mix this with [1] since that thread is about replacing serial with identity. The discussion in this and [2] will be useful to drive [1] forward. Behaviour 1 ========= If a partitioned table has an identity column, the partitions do not inherit identity property. #create table tpart (a int generated always as identity primary key, src varchar) partition by range(a); #create table t_p1 partition of tpart for values from (1) to (3); #\d tpart Partitioned table "public.tpart" Column | Type | Collation | Nullable | Default --------+-------------------+-----------+----------+------------------------------ a | integer | | not null | generated always as identity src | character varying | | | Partition key: RANGE (a) Indexes: "tpart_pkey" PRIMARY KEY, btree (a) Number of partitions: 2 (Use \d+ to list them.) #\d t_p1 Table "public.t_p1" Column | Type | Collation | Nullable | Default --------+-------------------+-----------+----------+--------- a | integer | | not null | src | character varying | | | Partition of: tpart FOR VALUES FROM (1) TO (3) Indexes: "t_p1_pkey" PRIMARY KEY, btree (a) Notice that the default column of t_p1. This means that a direct INSERT into partition will fail if it does not specify value for the identity column. As a consequence such a value may conflict with an existing value or a future value of the identity column. In the example, the identity column is a primary key and also a partition key, thus the conflict would result in an error. But when it's not a partition key (and hence a primary key), it will just allow those conflicting values. Behaviour 2 ========= If a table being attached as a partition to a partitioned table and both of them have column with same name as identity column, the ATTACH succeeds and allow both tables to use different sequences. #create table t_p5 (a int primary key, b int generated always as identity, src varchar); #alter table tpart attach partition t_p5 for values from (7) to (9); #\d t_p5 Table "public.t_p5" Column | Type | Collation | Nullable | Default --------+-------------------+-----------+----------+------------------------------ a | integer | | not null | b | integer | | not null | generated always as identity src | character varying | | | Partition of: tpart FOR VALUES FROM (7) TO (9) Indexes: "t_p5_pkey" PRIMARY KEY, btree (a) As a consequence a direct INSERT into the partition will result in a value for identity column which conflicts with an existing value or a future value in the partitioned table. Again, if the identity column in a primary key (and partition key), the conflicting INSERT will fail. But otherwise, the conflicting values will go unnoticed. I consulted Vik Fearing, offline, about SQL standard's take on identity columns in partitioned table. SQL standard does not specify partitioned tables as a separate entity. Thus a partitioned table is at par with a regular table. Hence an identity column in partitioned table should share the same identity space across all the partitions. Behaviour 3 ========= We allow identity column to be added to a partitioned table which doesn't have partitions but we disallow adding identity column to a partitioned table which has partitions. #create table tpart (a int primary key, src varchar) partition by range(a); #create table t_p1 partition of tpart for values from (1) to (3); #alter table tpart add column b int generated always as identity; ERROR: cannot recursively add identity column to table that has child tables I don't understand why is that prohibited. If we allow partitions to be added to a partitioned table with identity column, we should allow an identity column to be added to a partitioned table with partitions. Behaviour 4 ========= Even though we don't allow an identity column to be added to a partitioned table with partitions, we allow an existing column to be converted to an identity column in such a table. #create table tpart (a int primary key, src varchar) partition by range(a); #create table t_p1 partition of tpart for values from (1) to (3); #create table t_p2 partition of tpart for values from (3) to (5); #alter table tpart alter column a add generated always as identity; #\d tpart Partitioned table "public.tpart" Column | Type | Collation | Nullable | Default --------+-------------------+-----------+----------+------------------------------ a | integer | | not null | generated always as identity src | character varying | | | Partition key: RANGE (a) Indexes: "tpart_pkey" PRIMARY KEY, btree (a) Number of partitions: 2 (Use \d+ to list them.) #\d t_p1 Table "public.t_p1" Column | Type | Collation | Nullable | Default --------+-------------------+-----------+----------+--------- a | integer | | not null | src | character varying | | | Partition of: tpart FOR VALUES FROM (1) TO (3) Indexes: "t_p1_pkey" PRIMARY KEY, btree (a) Behaviour 3 and 4 are conflicting with each other themselves. I think we should fix these anomalies as follows 1. Allow identity columns to be added to the partitioned table irrespective of whether they have partitions of not. 2. Propagate identity property to partitions. 3. Use the same underlying sequence for getting default value of an identity column when INSERTing directly in a partition. 4. Disallow attaching a partition with identity column. 1 will fix inconsistencies in Behaviour 3 and 4. 2 and 3 will fix anomalies in Behaviour 1. 4 will fix Behaviour 2. Note on point 3: The current implementation uses pg_depend to find the sequence associated with the identity column. We don't necessarily need to add dependencies for individual partitions though. Instead we could use the dependency on the partitioned table itself. I haven't checked feasibility of this option, but it makes things simpler esp. for DETACH and DROP of partition. Note on point 4: The proposal again simplifies DETACH and DROP. If we decide to somehow coalesce the identity columns of partition and partitioned table, it would make DETACH and DROP complex. Also if the identity column of the partition being attached is not identity column in partition table, INSERT on partition table would fail in case of ALWAYS. Of course the risk is we will break backward compatibility. But given that the current behaviour is quite erroneous, I doubt if there are users relying on this behaviour. Thoughts? [1] https://www.postgresql.org/message-id/flat/70be435b-05db-06f2-7c01-9bb8ee2fccce%40enterprisedb.com [2] https://www.postgresql.org/message-id/CAExHW5toAsjc7uwSeSzX6sgvktFxsv7pd606zP6DnTX7Y6O4jg@mail.gmail.com -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: