Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers

From jian he
Subject Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date
Msg-id CACJufxF6_YwAboiCaVYLRtNpO4kGbXqkXzH_7W=pUvrNXK8WuQ@mail.gmail.com
Whole thread Raw
In response to Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Sat, Apr 5, 2025 at 5:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Apr 4, 2025 at 4:55 AM jian he <jian.universality@gmail.com> wrote:
> >
> > On Tue, Mar 25, 2025 at 2:31 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > 2) Here in error we say column c1 violates not-null constraint and in
> > > the context we show column c2, should the context also display c2
> > > column:
> > > postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
> > > CREATE TABLE
> > > postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
> > > Enter data to be copied followed by a newline.
> > > End with a backslash and a period on a line by itself, or an EOF signal.
> > > >> a  b
> > > >> \.
> > > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > > DETAIL:  Failing row contains (null, null).
> > > CONTEXT:  COPY t3, line 1, column c2: "b"
> > >
> >
> > It took me a while to figure out why.
> > with the attached, now the error message becomes:
> >
> > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > DETAIL:  Failing row contains (null, null).
> > CONTEXT:  COPY t3, line 1: "a,b"
> >
> > while at it,
> > (on_error set_to_null, log_verbosity verbose)
> > error message CONTEXT will only emit out relation name,
> > this aligns with (on_error ignore, log_verbosity verbose).
> >
> > one of the message out example:
> > +NOTICE:  column "b" was set to null due to data type incompatibility at line 2
> > +CONTEXT:  COPY t_on_error_null
> >
> >
> >
> > > 3) typo becomen should be become:
> > > null will becomen reserved to non-reserved
> > fixed.
> >
> > > 4) There is a whitespace error while applying patch
> > > Applying: COPY (on_error set_to_null)
> > > .git/rebase-apply/patch:39: trailing whitespace.
> > >       a <literal>NOTICE</literal> message indicating the number of rows
> > > warning: 1 line adds whitespace errors.
> > fixed.
>
> I've reviewed the v15 patch and here are some comments:
>
> How about renaming the new option value to 'set_null"? The 'to' in the
> value name seems redundant to me.
>
> ---
> +        COPY_ON_ERROR_NULL,                    /* set error field to null */
>
> I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
> COPY_ON_ERROR_SET_NULL if we change the option value name) for
> consistency with the value name.
>
> ---
> +                else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
> +                        ereport(NOTICE,
> +                                        errmsg_plural("invalid values
> in %" PRIu64 " row was replaced with null",
> +
> "invalid values in %" PRIu64 " rows were replaced with null",
> +
> cstate->num_errors,
> +
> cstate->num_errors));
>
> How about adding "due to data type incompatibility" at the end of the message?
>
> ---
> +                                    ereport(NOTICE,
> +                                                    errmsg("column
> \"%s\" was set to null due to data type incompatibility at line %"
> PRIu64 "",
> +
> cstate->cur_attname,
> +
> cstate->cur_lineno));
>
> Similar to the IGNORE case, we can show the data in question in the message.
>
> ---
> +                    else
> +                            ereport(ERROR,
> +
> errcode(ERRCODE_NOT_NULL_VIOLATION),
> +                                            errmsg("domain %s does
> not allow null values", format_type_be(typioparams[m])),
> +                                            errdatatype(typioparams[m]));
>
> If domain data type is the sole case where not to accept NULL, can we
> check it beforehand to avoid calling the second
> InputFunctionCallSafe() for non-domain data types? Also, if we want to
> end up with an error when setting NULL to a domain type with NOT NULL,
> I think we don't need to try to handle a soft error by passing
> econtext to InputFunctionCallSafe().
>

please check attached, hope i have addressed all the points you've mentioned.


> If domain data type is the sole case where not to accept NULL, can we
> check it beforehand to avoid calling the second
> InputFunctionCallSafe() for non-domain data types?

I doubt it.

we have
InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
                      Oid typioparam, int32 typmod,
                      fmNodePtr escontext,
                      Datum *result)
{
    LOCAL_FCINFO(fcinfo, 3);
    if (str == NULL && flinfo->fn_strict)
    {
        *result = (Datum) 0;    /* just return null result */
        return true;
    }
}

Most of the non-domain type input functions are strict.
see query result:

select proname, pt.typname, proisstrict,pt.typtype
from pg_type pt
join pg_proc pp on pp.oid = pt.typinput
where pt.typtype <> 'd'
and pt.typtype <> 'p'
and proisstrict is false;

so the second InputFunctionCallSafe will be faster for non-domain types.

before CopyFromTextLikeOneRow we don't know if this type is
domain_with_constraint or not.
Beforehand, we can conditionally call DomainHasConstraints to find out.
but DomainHasConstraints is expensive, which may carry extra
performance issues for non-domain types.

but the second InputFunctionCallSafe call will not be a big issue for
domain_with_constraint,
because the first time domain_in call already cached related structs.

Attachment

pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
Next
From: Bertrand Drouvot
Date:
Subject: Re: Draft for basic NUMA observability