Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date
Msg-id 4c540fe3-495c-4bbf-8dcf-2c1e2b88bc3d@gmail.com
Whole thread Raw
In response to Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
On 22/01/26 11:45, jian he wrote:
>> Should FORCE_NOT_NULL be allowed to be used with ON_ERROR set_null? It
>> seems to me that ON_ERROR set_null overwrite the FORCE_NOT_NULL
>> behaviour:
>>
>> postgres=# create table t4(a int, b varchar(5));
>> CREATE TABLE
>>
>> postgres=# copy t4 from 'data.csv' with (FORCE_NOT_NULL(b), format csv, delimiter ',', NULL 'NULL', ON_ERROR
set_null);
>> NOTICE:  invalid values in 2 rows were replaced with null due to data type incompatibility
>> COPY 5
>>
>> postgres=# \pset null 'NULL'
>> Null display is "NULL".
>> postgres=# select * from t4;
>>   a |  b
>> ---+------
>>   1 | aaaa
>>   2 | bbbb
>>   2 | NULL
>>   2 | NULL
>>   5 | NULL
>> (5 rows)
>>
>> Note that only the ccccc rows on .csv file was inserted with a NULL
>> value on b column. The 5,NULL row was inserted with a "NULL" string as a
>> value:
>>
>> postgres=# select * from t4 where b is null;
>>   a |  b
>> ---+------
>>   2 | NULL
>>   2 | NULL
>> (2 rows)
>>
>> The contents of data.csv:
>>      1,aaaa
>>      2,bbbb
>>      2,ccccc
>>      2,ccccc
>>      5,NULL
>>
>> Perhaps we should block the usage of FORCE_NOT_NULL with ON_ERROR
>> SET_NULL?
>>
> FORCE_NOT_NULL is related to how we handle NULL string in column value.
> 
> We first process cstate->opts.force_notnull_flags, cstate->opts.force_null_flags
> then InputFunctionCallSafe.
> see copyfromparse.c, CopyFromTextLikeOneRow ``if (is_csv)``loop.
> 
> I think these two are unrelated things, FORCE_NOT_NULL should be fine with
> ON_ERROR SET_NULL.
> you can see related tests in
> https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/copy2.sql#n330
> 
> Am I missing something? 

Yeah, after some more thinking it seems ok to use both options 
together. I just found a bit strange when using integer columns. 
Consider this example:

cat data.csv
1,11
2,22
3,
4,44

postgres=# create table t(a int not null, b int);
CREATE TABLE

postgres=# copy t from 
'/Users/matheus/dev/pgdev/copy-on-error-set-null/data.csv' with 
(FORCE_NOT_NULL(b), format csv, delimiter ',',  ON_ERROR set_null);
NOTICE:  1 row was replaced with null due to data type incompatibility
COPY 4

postgres=# select * from t where b is null;
  a | b
---+---
  3 |
(1 row)

We are requiring a not null value on column b but we are still 
generating rows with null values on b.

The reasoning on this is that the row 3 would generate a "invalid 
input syntax for type integer" error and the ON_ERROR set_null fix 
this by inserting a NULL value. It make sense I think but I'm 
wondering if it could cause any confusion?


>> On monitoring.sgml we have the following for pg_stat_progress_copy
>> tuples_skipped:
>>         Number of tuples skipped because they contain malformed data.
>>         This counter only advances when a value other than
>>         <literal>stop</literal> is specified to the <literal>ON_ERROR</literal>
>>
>> IIUC we are not updating this view if we set a column to NULL due to an
>> error, perhaps this documentation should be updated to mention that it
>> will not be updated with ON_ERROR set_null?
>>
> 
> IMHO, we don't need to mention ON_ERROR set_null, since we do not support it.
> change to the following should be ok, i think.
> 
>        <para>
>         Number of tuples skipped because they contain malformed data.
>         This counter only advances when
>         <literal>ignore</literal> is specified to the <literal>ON_ERROR</literal>
>         option.
>        </para></entry> 

It looks good, I was thinking in something like this.

>>
>> I may have missing something, but we are still considering implementing
>> the REJECT_LIMIT + ON_ERROR set_null?
> Possibly as a separate patch later. 
Ok, good, thanks.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Haritabh Gupta
Date:
Subject: Re: [PATCH] pg_get_domain_ddl: DDL reconstruction function for CREATE DOMAIN statement
Next
From: Laurenz Albe
Date:
Subject: Re: Add SECURITY_INVOKER_VIEWS option to CREATE DATABASE