Thread: BUG #14952: COPY fails to fill in IDENTITY column default value

BUG #14952: COPY fails to fill in IDENTITY column default value

From
steven.winfield@cantabcapital.com
Date:
The following bug has been logged on the website:

Bug reference:      14952
Logged by:          Steven Winfield
Email address:      steven.winfield@cantabcapital.com
PostgreSQL version: 10.0
Operating system:   Linux
Description:

COPYing data to a table with an IDENTITY column, where the column's value
isn't specified in the copied input, fails because COPY attempts to insert a
NULL value for the column:

test=# CREATE TABLE identity_test (id bigint GENERATED ALWAYS AS IDENTITY,
name text);
test=# COPY identity_test (name) FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> foo
>> \.
ERROR:  null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, foo).
CONTEXT:  COPY identity_test, line 1: "foo"

Compare this with a serial column:

test=# CREATE TABLE serial_test (id bigserial, name text);
test=# COPY serial_test (name) FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> foo
>> \.
COPY 1


test=# \d identity_test
                     Table "public.identity_test"
 Column |  Type  | Collation | Nullable |           Default
--------+--------+-----------+----------+------------------------------
 id     | bigint |           | not null | generated always as identity
 name   | text   |           |          |

test=# \d serial_test
                            Table "public.serial_test"
 Column |  Type  | Collation | Nullable |                 Default
--------+--------+-----------+----------+-----------------------------------------
 id     | bigint |           | not null |
nextval('serial_test_id_seq'::regclass)
 name   | text   |           |          |



Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Peter Eisentraut
Date:
On 12/7/17 06:29, steven.winfield@cantabcapital.com wrote:
> COPYing data to a table with an IDENTITY column, where the column's value
> isn't specified in the copied input, fails because COPY attempts to insert a
> NULL value for the column:

That indeed appears to be a bug.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Michael Paquier
Date:
On Fri, Dec 8, 2017 at 2:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 12/7/17 06:29, steven.winfield@cantabcapital.com wrote:
>> COPYing data to a table with an IDENTITY column, where the column's value
>> isn't specified in the copied input, fails because COPY attempts to insert a
>> NULL value for the column:
>
> That indeed appears to be a bug.

That's a bug. When doing a COPY with or without a list of columns, and
that a column is not listed and has a default expression, then this
expression is used. This is a role filled in by build_column_default()
but identity columns need to create a NextValueExpr expression
instead. As this expression is missing, the backend complains about a
NULL input for this column, which is logic without an expression.
Attached is a patch with a regression test.

Peter, what do you think?

Thanks,
-- 
Michael

Attachment

Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Peter Eisentraut
Date:
On 12/7/17 20:37, Michael Paquier wrote:
> On Fri, Dec 8, 2017 at 2:13 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 12/7/17 06:29, steven.winfield@cantabcapital.com wrote:
>>> COPYing data to a table with an IDENTITY column, where the column's value
>>> isn't specified in the copied input, fails because COPY attempts to insert a
>>> NULL value for the column:
>>
>> That indeed appears to be a bug.
> 
> That's a bug. When doing a COPY with or without a list of columns, and
> that a column is not listed and has a default expression, then this
> expression is used. This is a role filled in by build_column_default()
> but identity columns need to create a NextValueExpr expression
> instead. As this expression is missing, the backend complains about a
> NULL input for this column, which is logic without an expression.
> Attached is a patch with a regression test.
> 
> Peter, what do you think?

Committed.  I moved the tests to identity.sql though.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Dec 8, 2017 at 2:13 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> That indeed appears to be a bug.

> That's a bug. When doing a COPY with or without a list of columns, and
> that a column is not listed and has a default expression, then this
> expression is used. This is a role filled in by build_column_default()
> but identity columns need to create a NextValueExpr expression
> instead. As this expression is missing, the backend complains about a
> NULL input for this column, which is logic without an expression.
> Attached is a patch with a regression test.

Now that I look more closely at this patch, isn't it fixing things
in the wrong place?  Why is it that COPY needs to know about this,
rather than build_column_default()?  Aren't we going to have to
fix every other caller of build_column_default()?

For that matter, should build_column_default() know about it explicitly
either?  Why aren't we implementing IDENTITY by storing an appropriate
default expression in the catalogs?

            regards, tom lane


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Michael Paquier
Date:
On Thu, Dec 14, 2017 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> For that matter, should build_column_default() know about it explicitly
> either?  Why aren't we implementing IDENTITY by storing an appropriate
> default expression in the catalogs?

Wait a minute... If I look at the callers of build_column_default(),
rewriteValuesRTE looks broken as well:
=# CREATE TABLE itest10 (a int GENERATED BY DEFAULT AS IDENTITY, b
text, c bigint);
CREATE TABLE
=# insert into itest10 values (default, 'foo', 0), (default, 'foo2', 1);
ERROR:  23502: null value in column "a" violates not-null constraint
DETAIL:  Failing row contains (null, foo, 0).
I would expect the second query to work properly.

It seems that slot_fill_defaults() would fail as well for a logical
worker, and so are ATExecAlterColumnType() and ATExecAddColumn()?
Peter, did you do the separation to help in handling better the cases
with INSERT OVERRIDE?
-- 
Michael


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Peter Eisentraut
Date:
On 12/13/17 22:39, Tom Lane wrote:
> Now that I look more closely at this patch, isn't it fixing things
> in the wrong place?  Why is it that COPY needs to know about this,
> rather than build_column_default()?  Aren't we going to have to
> fix every other caller of build_column_default()?

Yeah, that seems to be a problem.  Attached is a patch that puts the
logic into build_column_default() and adds a few more tests covering
other omissions.

One problem is that this breaks the case ALTER TABLE ... ADD COLUMN ...
IDENTITY, because the sequence ownership isn't registered until after
the ALTER TABLE command.  (This only worked for empty tables.  For
pre-populated tables, it failed because of the above omission.)  The
second patch tries to fix that.  Needs more thought and documentation,
but it's an idea.

> For that matter, should build_column_default() know about it explicitly
> either?  Why aren't we implementing IDENTITY by storing an appropriate
> default expression in the catalogs?

Initial versions were coded that way, but it was pretty messy and buggy
because you have to touch a lot of places to teach them the difference
between a real default expression and a synthetic one.  Another problem
is that the NextValueExpr has special permission behavior that is not
really representable by a normal expression, thus introducing more
special cases, and possibly permission or security issues.

Obviously, it would have helped in the above case.  But it was really messy.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Michael Paquier
Date:
On Sat, Dec 16, 2017 at 2:37 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 12/13/17 22:39, Tom Lane wrote:
>> For that matter, should build_column_default() know about it explicitly
>> either?  Why aren't we implementing IDENTITY by storing an appropriate
>> default expression in the catalogs?
>
> Initial versions were coded that way, but it was pretty messy and buggy
> because you have to touch a lot of places to teach them the difference
> between a real default expression and a synthetic one.  Another problem
> is that the NextValueExpr has special permission behavior that is not
> really representable by a normal expression, thus introducing more
> special cases, and possibly permission or security issues.

I have no major objections about 0001. Here are some comments.

@@ -1123,6 +1113,16 @@ build_column_default(Relation rel, int attrno)
    Node       *expr = NULL;
    Oid         exprtype;

+   if (att_tup->attidentity)
+   {
I would add a comment explaining why this should come first.

+INSERT INTO itest3 VALUES (DEFAULT, 'a');
+INSERT INTO itest3 VALUES (DEFAULT, 'b'), (DEFAULT, 'c');
+SELECT * FROM itest3;
Thanks for adding a test with INSERT RTEs.

+-- ADD COLUMN
+CREATE TABLE itest13 (a int);
+-- add column to empty table
+ALTER TABLE itest13 ADD COLUMN b int GENERATED BY DEFAULT AS IDENTITY;
+ERROR:  no owned sequence found
Those are hitting an elog(), so I think that it would be better to
issue a proper ereport() instead with ERRCODE_FEATURE_NOT_SUPPORTED as
errcode. It is easier to understand the point you are going to
though...

Now coming to 0002...

-       defval = (Expr *) build_column_default(rel, attribute.attnum);
+       /*
+        * For an identity column, we can't use build_column_default(),
+        * because the sequence ownership isn't set yet.  So do it manually.
+        */
+       if (colDef->identity)
+       {
+           NextValueExpr *nve = makeNode(NextValueExpr);
+
+           nve->seqid = RangeVarGetRelid(colDef->identitySequence,
NoLock, false);
+           nve->typeId = typeOid;
+
+           defval = (Expr *) nve;
+       }
+       else
+           defval = (Expr *) build_column_default(rel, attribute.attnum);
I have a problem with this approach which is basically something
similar to what I complained in the thread about typed and partition
tables for identity columns
(https://www.postgresql.org/message-id/20171023074458.1473.25799@wrigleys.postgresql.org).
In my opinion, this ALTER TABLE handling should be done by treating
identity columns a way similar to default expressions in
transformColumnDefinition(), by storing the FuncExpr node at parsing
time instead of storing the information needed to rebuild it when
executing the query. In short the mapping should get closer to what
default does with nextval or serial.
-- 
Michael


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Peter Eisentraut
Date:
On 12/18/17 00:49, Michael Paquier wrote:
> I have a problem with this approach which is basically something
> similar to what I complained in the thread about typed and partition
> tables for identity columns
> (https://www.postgresql.org/message-id/20171023074458.1473.25799@wrigleys.postgresql.org).
> In my opinion, this ALTER TABLE handling should be done by treating
> identity columns a way similar to default expressions in
> transformColumnDefinition(), by storing the FuncExpr node at parsing
> time instead of storing the information needed to rebuild it when
> executing the query. In short the mapping should get closer to what
> default does with nextval or serial.

The serial case works because it stores the sequence *name* in the
default value in the catalog.  That doesn't work because for the
identity case we don't store the expression in the catalog.  The
proposed patch works by storing the sequence *name* in the internal
structures so that it can be used in place of the stored default value.
So I think this approach is pretty consistent.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 12/18/17 00:49, Michael Paquier wrote:
>> In my opinion, this ALTER TABLE handling should be done by treating
>> identity columns a way similar to default expressions in
>> transformColumnDefinition(), by storing the FuncExpr node at parsing
>> time instead of storing the information needed to rebuild it when
>> executing the query. In short the mapping should get closer to what
>> default does with nextval or serial.

> The serial case works because it stores the sequence *name* in the
> default value in the catalog.

Just to clarify: what's *actually* stored for a regular serial column
is the sequence OID (as a regclass constant), not the name.  If it
were a name then ALTER SEQUENCE RENAME would break it.

> That doesn't work because for the
> identity case we don't store the expression in the catalog.  The
> proposed patch works by storing the sequence *name* in the internal
> structures so that it can be used in place of the stored default value.

I trust you really mean you're storing an OID ...

            regards, tom lane


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Michael Paquier
Date:
On Wed, Dec 27, 2017 at 03:43:05PM -0500, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> That doesn't work because for the
>> identity case we don't store the expression in the catalog.  The
>> proposed patch works by storing the sequence *name* in the internal
>> structures so that it can be used in place of the stored default value.
>
> I trust you really mean you're storing an OID ...

A RangeVar is used for the proposed patch 0002. Still does it matter? In
the scenario proposed here, which is to fix ALTER TABLE ADD COLUMN
GENERATED, then the sequence gets created and visible only once the
transaction adding the column has been created.

     char        identity;        /* attidentity setting */
+    RangeVar   *identitySequence;
     CollateClause *collClause;    /* untransformed COLLATE spec, if any */
It mat be better to tell that this is used to create a sequence in the
same transaction as the one doing the parsing.
--
Michael

Attachment

Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Peter Eisentraut
Date:
On 12/30/17 18:26, Michael Paquier wrote:
> A RangeVar is used for the proposed patch 0002. Still does it matter? In
> the scenario proposed here, which is to fix ALTER TABLE ADD COLUMN
> GENERATED, then the sequence gets created and visible only once the
> transaction adding the column has been created.
> 
>      char        identity;        /* attidentity setting */
> +    RangeVar   *identitySequence;
>      CollateClause *collClause;    /* untransformed COLLATE spec, if any */
> It mat be better to tell that this is used to create a sequence in the
> same transaction as the one doing the parsing.

I would like to get this into next week's minor release as a bug fix.
Other than tweaking some of the comments, is there any more feedback on
this?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Michael Paquier
Date:
On Thu, Feb 01, 2018 at 10:24:48AM -0500, Peter Eisentraut wrote:
> On 12/30/17 18:26, Michael Paquier wrote:
>> A RangeVar is used for the proposed patch 0002. Still does it matter? In
>> the scenario proposed here, which is to fix ALTER TABLE ADD COLUMN
>> GENERATED, then the sequence gets created and visible only once the
>> transaction adding the column has been created.
>>
>>      char        identity;        /* attidentity setting */
>> +    RangeVar   *identitySequence;
>>      CollateClause *collClause;    /* untransformed COLLATE spec, if any */
>> It mat be better to tell that this is used to create a sequence in the
>> same transaction as the one doing the parsing.
>
> I would like to get this into next week's minor release as a bug fix.
> Other than tweaking some of the comments, is there any more feedback on
> this?

Wrapping again my mind on this one...  On top of the comment for
identitySequence, I think that it is important to mention that the
sequence name and a RangeVar are basically safe as they get created
hence they are not visible to other sessions yet.  0001 and 0002 should
be merged.

By the way, on HEAD with two sessions it is easy to bump into sequence
name conflicts with identity columns:
* Session 1:
begin;
create table itest13 (a int);
* Session 2:
create sequence itest13_b_seq;
* Session 1:
alter table itest13 add columns b int generated by default as identity; --blocks
* Session 2:
commit;

And then session 1 reports that:
ERROR:  23505: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(itest13_b_seq, 2200) already exists.
SCHEMA NAME:  pg_catalog
TABLE NAME:  pg_type
CONSTRAINT NAME:  pg_type_typname_nsp_index
LOCATION:  _bt_check_unique, nbtinsert.c:434

We surely want to be smarter with the choice of the sequence name for
identity columns.  Index names for primary keys are if I recall
correctly indexcmds.c and index.c.
--
Michael

Attachment

Re: BUG #14952: COPY fails to fill in IDENTITY column default value

From
Peter Eisentraut
Date:
On 2/1/18 20:49, Michael Paquier wrote:
> Wrapping again my mind on this one...  On top of the comment for
> identitySequence, I think that it is important to mention that the
> sequence name and a RangeVar are basically safe as they get created
> hence they are not visible to other sessions yet.  0001 and 0002 should
> be merged.

Committed with more comments and as one patch.

> By the way, on HEAD with two sessions it is easy to bump into sequence
> name conflicts with identity columns:
> * Session 1:
> begin;
> create table itest13 (a int);
> * Session 2:
> create sequence itest13_b_seq;
> * Session 1:
> alter table itest13 add columns b int generated by default as identity; --blocks
> * Session 2:
> commit;
> 
> And then session 1 reports that:
> ERROR:  23505: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
> DETAIL:  Key (typname, typnamespace)=(itest13_b_seq, 2200) already exists.
> SCHEMA NAME:  pg_catalog
> TABLE NAME:  pg_type
> CONSTRAINT NAME:  pg_type_typname_nsp_index
> LOCATION:  _bt_check_unique, nbtinsert.c:434

I think this is equivalent to pre-existing behavior with serial:

S1: begin;
S2: begin;
S2: create sequence t1_a_seq;
S1: create table t1 (a serial, b text); --blocks
S2: commit;
S1: ERROR

I suspect you can also construct situations like this with other
implicitly created objects such as triggers.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services