Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date
Msg-id CALdSSPgLVBHNr97pYgmVfi+YdH4bqwk9YVvbuvhJ4sCr7eEVOA@mail.gmail.com
Whole thread Raw
In response to Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
On Tue, 19 Nov 2024, 13:52 jian he, <jian.universality@gmail.com> wrote:
>
> On Sat, Nov 16, 2024 at 5:55 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > I am attaching my v8 for reference.
> >
>
> in your v8.
>
>    <varlistentry>
>     <term><literal>REJECT_LIMIT</literal></term>
>     <listitem>
>      <para>
>       Specifies the maximum number of errors tolerated while converting a
>       column's input value to its data type, when <literal>ON_ERROR</literal> is
>       set to <literal>ignore</literal>.
>       If the input contains more erroneous rows than the specified
> value, the <command>COPY</command>
>       command fails, even with <literal>ON_ERROR</literal> set to
> <literal>ignore</literal>.
>      </para>
>     </listitem>
>    </varlistentry>
>
> then above description not meet with following example, (i think)
>
> create table t(a int not null);
> COPY t FROM STDIN WITH (on_error set_to_null, reject_limit 2);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> a
> >> \.
> ERROR:  null value in column "a" of relation "t" violates not-null constraint
> DETAIL:  Failing row contains (null).
> CONTEXT:  COPY t, line 1, column a: "a"


Sure, my v8 does not helps with column level NOT NULL constraint (or
other constraint)

> Overall, I think
> making the domain not-null align with column level not-null would be a
> good thing.

While this looks sane, it's actually a separate topic. Even on current
HEAD we have domain not-null vs column level not-null unalignment.

consider this example


```
reshke=# create table ftt2 (i int not null);
CREATE TABLE
reshke=# copy ftt2 from stdin with (reject_limit 1000, on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> \N
>> \.
ERROR:  null value in column "i" of relation "ftt2" violates not-null constraint
DETAIL:  Failing row contains (null).
CONTEXT:  COPY ftt2, line 1: "\N"
reshke=# create domain dd as int not null ;
CREATE DOMAIN
reshke=# create table ftt3(i dd);
CREATE TABLE
reshke=# copy ftt3 from stdin with (reject_limit 1000, on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> \N
>> \.
NOTICE:  1 row was skipped due to data type incompatibility
COPY 0
reshke=#
```
So, if we want this, we need to start another thread and deal with
REJECT_LIMIT + on_error ignore first.

The ExecConstraints function is the source of the error in scenario 1.
Therefore, we require something like "ExecConstraintsSafe" to
accommodate the aforementioned. That is a significant change. Not sure
it will be accepted by the community.

>
>      <para>
>       Specifies how to behave when encountering an error converting a column's
>       input value into its data type.
>       An <replaceable class="parameter">error_action</replaceable> value of
>       <literal>stop</literal> means fail the command,
>       <literal>ignore</literal> means discard the input row and
> continue with the next one, and
>       <literal>set_to_null</literal> means replace columns containing
> erroneous input values with <literal>null</literal> and move to the
> next row.
>
> "and move to the next row" is wrong?
> I think it should be " and move to the next field".

Yes, "and move to the next field" is correct.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Next
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export