Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers
| From | Matheus Alcantara |
|---|---|
| Subject | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
| Date | |
| Msg-id | 4c540fe3-495c-4bbf-8dcf-2c1e2b88bc3d@gmail.com Whole thread Raw |
| In response to | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row (jian he <jian.universality@gmail.com>) |
| List | pgsql-hackers |
On 22/01/26 11:45, jian he wrote: >> Should FORCE_NOT_NULL be allowed to be used with ON_ERROR set_null? It >> seems to me that ON_ERROR set_null overwrite the FORCE_NOT_NULL >> behaviour: >> >> postgres=# create table t4(a int, b varchar(5)); >> CREATE TABLE >> >> postgres=# copy t4 from 'data.csv' with (FORCE_NOT_NULL(b), format csv, delimiter ',', NULL 'NULL', ON_ERROR set_null); >> NOTICE: invalid values in 2 rows were replaced with null due to data type incompatibility >> COPY 5 >> >> postgres=# \pset null 'NULL' >> Null display is "NULL". >> postgres=# select * from t4; >> a | b >> ---+------ >> 1 | aaaa >> 2 | bbbb >> 2 | NULL >> 2 | NULL >> 5 | NULL >> (5 rows) >> >> Note that only the ccccc rows on .csv file was inserted with a NULL >> value on b column. The 5,NULL row was inserted with a "NULL" string as a >> value: >> >> postgres=# select * from t4 where b is null; >> a | b >> ---+------ >> 2 | NULL >> 2 | NULL >> (2 rows) >> >> The contents of data.csv: >> 1,aaaa >> 2,bbbb >> 2,ccccc >> 2,ccccc >> 5,NULL >> >> Perhaps we should block the usage of FORCE_NOT_NULL with ON_ERROR >> SET_NULL? >> > FORCE_NOT_NULL is related to how we handle NULL string in column value. > > We first process cstate->opts.force_notnull_flags, cstate->opts.force_null_flags > then InputFunctionCallSafe. > see copyfromparse.c, CopyFromTextLikeOneRow ``if (is_csv)``loop. > > I think these two are unrelated things, FORCE_NOT_NULL should be fine with > ON_ERROR SET_NULL. > you can see related tests in > https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/copy2.sql#n330 > > Am I missing something? Yeah, after some more thinking it seems ok to use both options together. I just found a bit strange when using integer columns. Consider this example: cat data.csv 1,11 2,22 3, 4,44 postgres=# create table t(a int not null, b int); CREATE TABLE postgres=# copy t from '/Users/matheus/dev/pgdev/copy-on-error-set-null/data.csv' with (FORCE_NOT_NULL(b), format csv, delimiter ',', ON_ERROR set_null); NOTICE: 1 row was replaced with null due to data type incompatibility COPY 4 postgres=# select * from t where b is null; a | b ---+--- 3 | (1 row) We are requiring a not null value on column b but we are still generating rows with null values on b. The reasoning on this is that the row 3 would generate a "invalid input syntax for type integer" error and the ON_ERROR set_null fix this by inserting a NULL value. It make sense I think but I'm wondering if it could cause any confusion? >> On monitoring.sgml we have the following for pg_stat_progress_copy >> tuples_skipped: >> Number of tuples skipped because they contain malformed data. >> This counter only advances when a value other than >> <literal>stop</literal> is specified to the <literal>ON_ERROR</literal> >> >> IIUC we are not updating this view if we set a column to NULL due to an >> error, perhaps this documentation should be updated to mention that it >> will not be updated with ON_ERROR set_null? >> > > IMHO, we don't need to mention ON_ERROR set_null, since we do not support it. > change to the following should be ok, i think. > > <para> > Number of tuples skipped because they contain malformed data. > This counter only advances when > <literal>ignore</literal> is specified to the <literal>ON_ERROR</literal> > option. > </para></entry> It looks good, I was thinking in something like this. >> >> I may have missing something, but we are still considering implementing >> the REJECT_LIMIT + ON_ERROR set_null? > Possibly as a separate patch later. Ok, good, thanks. -- Matheus Alcantara EDB: https://www.enterprisedb.com
pgsql-hackers by date: