Home > mailing lists

Re: Add new error_action COPY ON_ERROR "log" - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Add new error_action COPY ON_ERROR "log"
Date	March 11, 2024 05:46:17
Msg-id	Ze6aqaGJlxr9gH-2@paquier.xyz Whole thread Raw
In response to	Re: Add new error_action COPY ON_ERROR "log" (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses	Re: Add new error_action COPY ON_ERROR "log"
List	pgsql-hackers

Tree view

On Sat, Mar 09, 2024 at 12:01:49AM +0530, Bharath Rupireddy wrote:
> If NOTICE per attribute and incrementing num_errors per row is
> implemented, it ends up erroring out with ERROR:  missing data for
> column "m"  for all-column-empty-row. Shall we treat this ERROR softly
> too if on_error ignore is specified? Or shall we discuss this idea
> separately?
>
> ERROR:  missing data for column "m"
> CONTEXT:  COPY check_ign_err, line 5: ""

Hmm.  I have spent some time looking at the bevahior of ON_ERROR, and
there are two tests in copy2.sql, one for the case where there is more
data than attributes and a second where there is not enough data in a
row that checks for errors.

For example, take this table:
=# create table tab (a int, b int, c int);
CREATE TABLE

This case works, even if the row has clearly not enough attributes.
The first attribute can be parsed, not the second one and this causes
the remaining data of the row to be skipped:
=# copy tab from stdin (delimiter ',', on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,
>> \.
NOTICE:  00000: 1 row was skipped due to data type incompatibility
LOCATION:  CopyFrom, copyfrom.c:1314
COPY 0

This case fails.  The first and the second attributes can be parsed,
and the line fails because we are missing the last attribute as of a
lack of delimiter:
=# copy tab from stdin (delimiter ',', on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,1
>> \.
ERROR:  22P04: missing data for column "c"
CONTEXT:  COPY tab, line 1: "1,1"

This brings a weird edge case for the all-column-empty-row case you
are mentioning once if we try to get information about all the rows we
should expect, but this has as side effect to break a case that's
intended ro work with ON_ERROR, as far as I understand, which is to
skip entirely a raw on the first conversion error we find, even if
there is no data afterwards.  I was a bit confused by that first, but
I can also see why it is useful as-is on HEAD.

At the end of the day, this comes down to what is more helpful to the
user.  And I'd agree on the side what ON_ERROR does currently, which
is what your patch relies on: on the first conversion failure, give up
and skip the rest of the row because we cannot trust its contents.
That's my way of saying that I am fine with the proposal of your
patch, and that we cannot provide the full state of a row without
making the error stack of COPY more invasive.

Perhaps we could discuss this idea of ensuring that all the attributes
are on a row in a different thread, as you say, but I am not really
convinced that there's a strong need for it at this stage as ON_ERROR
is new to v17.  So it does not sound like a bad thing to let it brew
more before implementing more options and make the COPY paths more
complicated than they already are.  I suspect that this may trigger
some discussion during the beta/stability phase depending on the
initial feedback.  Perhaps I'm wrong, though.

+      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      containing the line number and column name for each discarded row is
+      emitted.

This should clarify that the column name refers to the attribute where
the input conversion has failed, I guess.  Specifying only "column
name" without more context is a bit confusing.
--
Michael

Attachment

signature.asc

pgsql-hackers by date:

From: Tender Wang
Date: 11 March 2024, 05:31:26
Subject: Re: "type with xxxx does not exist" when doing ExecMemoize()

From: Amit Kapila
Date: 11 March 2024, 05:55:57
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation

Re: Add new error_action COPY ON_ERROR "log" - Mailing list pgsql-hackers

Attachment

Previous

Next