Re: Add new error_action COPY ON_ERROR "log" - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Add new error_action COPY ON_ERROR "log" |
Date | |
Msg-id | Ze6aqaGJlxr9gH-2@paquier.xyz Whole thread Raw |
In response to | Re: Add new error_action COPY ON_ERROR "log" (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Add new error_action COPY ON_ERROR "log"
|
List | pgsql-hackers |
On Sat, Mar 09, 2024 at 12:01:49AM +0530, Bharath Rupireddy wrote: > If NOTICE per attribute and incrementing num_errors per row is > implemented, it ends up erroring out with ERROR: missing data for > column "m" for all-column-empty-row. Shall we treat this ERROR softly > too if on_error ignore is specified? Or shall we discuss this idea > separately? > > ERROR: missing data for column "m" > CONTEXT: COPY check_ign_err, line 5: "" Hmm. I have spent some time looking at the bevahior of ON_ERROR, and there are two tests in copy2.sql, one for the case where there is more data than attributes and a second where there is not enough data in a row that checks for errors. For example, take this table: =# create table tab (a int, b int, c int); CREATE TABLE This case works, even if the row has clearly not enough attributes. The first attribute can be parsed, not the second one and this causes the remaining data of the row to be skipped: =# copy tab from stdin (delimiter ',', on_error ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1, >> \. NOTICE: 00000: 1 row was skipped due to data type incompatibility LOCATION: CopyFrom, copyfrom.c:1314 COPY 0 This case fails. The first and the second attributes can be parsed, and the line fails because we are missing the last attribute as of a lack of delimiter: =# copy tab from stdin (delimiter ',', on_error ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,1 >> \. ERROR: 22P04: missing data for column "c" CONTEXT: COPY tab, line 1: "1,1" This brings a weird edge case for the all-column-empty-row case you are mentioning once if we try to get information about all the rows we should expect, but this has as side effect to break a case that's intended ro work with ON_ERROR, as far as I understand, which is to skip entirely a raw on the first conversion error we find, even if there is no data afterwards. I was a bit confused by that first, but I can also see why it is useful as-is on HEAD. At the end of the day, this comes down to what is more helpful to the user. And I'd agree on the side what ON_ERROR does currently, which is what your patch relies on: on the first conversion failure, give up and skip the rest of the row because we cannot trust its contents. That's my way of saying that I am fine with the proposal of your patch, and that we cannot provide the full state of a row without making the error stack of COPY more invasive. Perhaps we could discuss this idea of ensuring that all the attributes are on a row in a different thread, as you say, but I am not really convinced that there's a strong need for it at this stage as ON_ERROR is new to v17. So it does not sound like a bad thing to let it brew more before implementing more options and make the COPY paths more complicated than they already are. I suspect that this may trigger some discussion during the beta/stability phase depending on the initial feedback. Perhaps I'm wrong, though. + <literal>verbose</literal>, a <literal>NOTICE</literal> message + containing the line number and column name for each discarded row is + emitted. This should clarify that the column name refers to the attribute where the input conversion has failed, I guess. Specifying only "column name" without more context is a bit confusing. -- Michael
Attachment
pgsql-hackers by date: