Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers

From jian he
Subject Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date
Msg-id CACJufxGEHmijmP-QqvrmqU6cxmhgpdjY7ewQBQ=E9NmdyEcqmw@mail.gmail.com
Whole thread Raw
In response to Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> BTW have you measured the overheads of calling InputFunctionCallSafe
> twice? If it's significant, we might want to find other ways to
> achieve it as it would not be good to incur overhead just for
> relatively rare cases.
>

Please check the attached two patches
v17-0001-COPY-on_error-set_null.original,
v17-0001-COPY-on_error-set_null.patch

for non-domain types, (on_error set_null), the performance of these
two are the same.
for domain type with or without constraint,
(on_error set_null): v17.original is slower than v17.patch.


test script:

create unlogged table t2(a text);
insert into t2 select 'a' from generate_Series(1, 10_000_000) g;
copy t2 to '/tmp/2.txt';
CREATE DOMAIN d1 AS INT ;
CREATE DOMAIN d2 AS INT check (value > 0);
create unlogged table t3(a int);
create unlogged table t4(a d1);
create unlogged table t5(a d2);


performance result:
v17-0001-COPY-on_error-set_null.patch
-- 764.903 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 779.253 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- Time: 750.390 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1

v17-0001-COPY-on_error-set_null.original
-- 774.943 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 867.671 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 927.685 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1


> Here are some comments:
>
> +               if (InputFunctionCallSafe(&in_functions[m],
> +                                         NULL,
> +                                         typioparams[m],
> +                                         att->atttypmod,
> +                                         NULL,
> +                                         &values[m]))
>
> Given that we pass NULL to escontext, does this function return false
> in an error case? Or can we use InputFunctionCall instead?
>
> I think we should mention that SET_NULL still could fail if the data
> type of the column doesn't accept NULL.
>
> How about restructuring the codes around handling data incompatibility
> errors like:
>
> else if (!InputFunctionCallSafe(...))
> {
>     if (cstate->opts.on_error == IGNORE)
>     {
>         cstate->num_errors++;
>         if (cstate->opts.log_verbosity == VERBOSE)
>             write a NOTICE message;
>         return true; // ignore whole row.
>     }
>     else if (cstate->opts.on_error == SET_NULL)
>     {
>         current_row_erroneous = true;
>         set NULL to the column;
>         if (cstate->opts.log_verbosity == VERBOSE)
>             write a NOTICE message;
>         continue; // go to the next column.
> }
>
> That way, we have similar structures for both on_error handling and
> don't need to reset cstate->cur_attname at the end of SET_NULL
> handling.
>

I think we still need to reset cstate->cur_attname.
the current code structure is
``
foreach(cur, cstate->attnumlist)
{
       if (condition x)
            continue;
        cstate->cur_attname = NULL;
        cstate->cur_attval = NULL;
}
``
In some cases (last column , condition x is satisfied), once we reach
the ``continue``, then we cannot reach.
``
        cstate->cur_attname = NULL;
        cstate->cur_attval = NULL;
``



> ---
> From the regression tests:
>
> --fail, column a is domain with not-null constraint
> COPY t_on_error_null FROM STDIN WITH (on_error set_null);
> a       11      14
> \.
> ERROR:  domain d_int_not_null does not allow null values
> CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
>
> I guess that the log messages could confuse users since while the
> actual error was caused by setting NULL to the non-NULL domain type
> column, the context message says the data 'a' was erroneous.
>

if the second function is InputFunctionCall, then we cannot customize
the error message.
we can't have both.
I guess we need a second InputFunctionCallSafe with escontext NOT NULL.

now i change it to
                if (!cstate->domain_with_constraint[m] ||
                    InputFunctionCallSafe(&in_functions[m],
                                          NULL,
                                          typioparams[m],
                                          att->atttypmod,
                                          (Node *) cstate->escontext,
                                          &values[m]))
                else if (string == NULL)
                    ereport(ERROR,
                            errcode(ERRCODE_NOT_NULL_VIOLATION),
                            errmsg("domain %s does not allow null
values", format_type_be(typioparams[m])),
                            errdatatype(typioparams[m]));
                else
                    ereport(ERROR,
                            errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                            errmsg("invalid input value for domain %s: \"%s\"",
                                   format_type_be(typioparams[m]), string));


do these ``ELSE IF``, ``ELSE`` error report messages make sense to you?

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Draft for basic NUMA observability
Next
From: Amit Kapila
Date:
Subject: Re: Large expressions in indexes can't be stored (non-TOASTable)