Re: Add new error_action COPY ON_ERROR "log" - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Add new error_action COPY ON_ERROR "log" |
Date | |
Msg-id | CALj2ACUk700cYhx1ATRQyRw-fBM+aRo6auRAitKGff7XNmYfqQ@mail.gmail.com Whole thread Raw |
In response to | Re: Add new error_action COPY ON_ERROR "log" (torikoshia <torikoshia@oss.nttdata.com>) |
Responses |
Re: Add new error_action COPY ON_ERROR "log"
|
List | pgsql-hackers |
On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2024-01-27 00:43, David G. Johnston wrote: > > >> I'd like to have a new option "log", which skips soft errors and > >> logs > >> information that should have resulted in errors to PostgreSQL log. > > > > user-specified tables in the same database. Setting up temporary > > tables or unlogged tables probably is going to be a more acceptable > > methodology than trying to get to the log files. > > OTOH not everyone thinks saving table information is the best idea[2]. The added NOTICE message gives a summary of how many rows were skipped, but not the line numbers. It's hard for the users to find the malformed data, especially when they are bulk-inserting from data files of multiple GBs in size (hard to open such files in any editor too). I understand the value of writing the info to server log or tables of users' choice as being discussed in a couple of other threads. However, I'd prefer we do the simplest thing first before we go debate server log or table - let the users know what line numbers are containing the errors that COPY ignored something like [1] with a simple change like [2]. It not only helps users figure out which rows and attributes were malformed, but also helps them redirect them to server logs with setting log_min_messages = notice [3]. In the worst case scenario, a problem with this one NOTICE per malformed row is that it can overload the psql session if all the rows are malformed. I'm not sure if this is a big problem, but IMO better than a single summary NOTICE message and simpler than writing to tables of users' choice. Thoughts? FWIW, I presented the new COPY ... ON_ERROR option feature at Hyderabad PostgreSQL User Group meetup and it was well-received by the audience. The audience felt it's going to be extremely helpful for bulk-loading tasks. Thank you all for working on this feature. [1] postgres=# CREATE TABLE check_ign_err (n int, m int[], k int); CREATE TABLE postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1 {1} 1 a {2} 2 3 {3} 3333333333 4 {a, 4} 4 5 {5}>> >> >> >> >> 5 \.>> NOTICE: detected data type incompatibility at line number 2 for column check_ign_err, COPY n NOTICE: detected data type incompatibility at line number 3 for column check_ign_err, COPY k NOTICE: detected data type incompatibility at line number 4 for column check_ign_err, COPY m NOTICE: detected data type incompatibility at line number 5 for column check_ign_err, COPY n NOTICE: 4 rows were skipped due to data type incompatibility COPY 2 [2] diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 906756362e..93ab5d4ebb 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, &values[m])) { cstate->num_errors++; - return true; + + if (cstate->opts.on_error != COPY_ON_ERROR_STOP) + { + ereport(NOTICE, + errmsg("detected data type incompatibility at line number %llu for column %s, COPY %s", + (unsigned long long) cstate->cur_lineno, + cstate->cur_relname, + cstate->cur_attname)); + return true; + } } cstate->cur_attname = NULL; [3] 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type incompatibility at line number 2 for column check_ign_err, COPY n 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err, line 2, column n: "a" 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type incompatibility at line number 3 for column check_ign_err, COPY k 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err, line 3, column k: "3333333333" 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type incompatibility at line number 4 for column check_ign_err, COPY m 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err, line 4, column m: "{a, 4}" 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type incompatibility at line number 5 for column check_ign_err, COPY n 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err, line 5, column n: "" 2024-02-12 06:20:29.363 UTC [427892] NOTICE: 4 rows were skipped due to data type incompatibility -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: