Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id 20230322175000.qbdctk7bnmifh5an@awork3.anarazel.de
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
List pgsql-hackers
Hi,

Tom, see below - I wonder if should provide one more piece of infrastructure
around the saved error stuff...


Have you measured whether this has negative performance effects when *NOT*
using the new option?


As-is this does not work with FORMAT BINARY - and converting the binary input
functions to support soft errors won't happen for 16. So I think you need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.


On 2023-03-22 22:34:20 +0900, torikoshia wrote:
> @@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
>  
>          ExecClearTuple(myslot);
>  
> +        if (cstate->opts.ignore_datatype_errors)
> +        {
> +            escontext.details_wanted = true;
> +            cstate->escontext = escontext;
> +        }

I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.


> @@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
>                  values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
>              }
>              else
> -                values[m] = InputFunctionCall(&in_functions[m],
> -                                              string,
> -                                              typioparams[m],
> -                                              att->atttypmod);
> +                /* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
> +                if (!InputFunctionCallSafe(&in_functions[m],
> +                                           string,
> +                                           typioparams[m],
> +                                           att->atttypmod,
> +                                           (Node *) &cstate->escontext,
> +                                           &values[m]))
> +                {
> +                    cstate->ignored_errors++;
> +
> +                    ereport(WARNING,
> +                            errmsg("%s", cstate->escontext.error_data->message));

That isn't right - you loose all the details of the message. As is you'd also
leak the error context.

I think the best bet for now is to do something like
    /* adjust elevel so we don't jump out */
    cstate->escontext.error_data->elevel = WARNING;
    /* despite the name, this won't raise an error if elevel < ERROR */
    ThrowErrorData(cstate->escontext.error_data);

I wonder if we ought to provide a wrapper for this? It could e.g. know to
mention the original elevel and such?


I don't think NextCopyFrom() is the right place to emit this warning - it
e.g. is also called from file_fdw.c, which might want to do something else
with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.



> @@ -3378,6 +3378,10 @@ copy_opt_item:
>                  {
>                      $$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
>                  }
> +            | IGNORE_DATATYPE_ERRORS
> +                {
> +                    $$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
> +                }
>              | DELIMITER opt_as Sconst
>                  {
>                      $$ = makeDefElem("delimiter", (Node *) makeString($3), @1);


I think we shouldn't add a new keyword for this, but only support this via
/* new COPY option syntax */
copy_generic_opt_list:
            copy_generic_opt_elem

Further increasing the size of the grammar with random keywords when we have
more generic ways to represent them seems unnecessary.


> +-- tests for IGNORE_DATATYPE_ERRORS option
> +CREATE TABLE check_ign_err (n int, m int[], k int);
> +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
> +1    {1}    1
> +a    {2}    2
> +3    {3}    3333333333
> +4    {a, 4}    4
> +
> +5    {5}    5
> +\.
> +SELECT * FROM check_ign_err;
> +

I suggest adding a few more tests:

- COPY with a datatype error that can't be handled as a soft error
- test documenting that COPY FORMAT BINARY is incompatible with IGNORE_DATATYPE_ERRORS
- a soft error showing the error context - although that will require some
  care to avoid the function name + line in the output

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Add pg_walinspect function with block info columns
Next
From: Robert Haas
Date:
Subject: Re: On login trigger: take three