Re: Add new error_action COPY ON_ERROR "log" - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Add new error_action COPY ON_ERROR "log"
Date
Msg-id CALj2ACUk700cYhx1ATRQyRw-fBM+aRo6auRAitKGff7XNmYfqQ@mail.gmail.com
Whole thread Raw
In response to Re: Add new error_action COPY ON_ERROR "log"  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: Add new error_action COPY ON_ERROR "log"
List pgsql-hackers
On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2024-01-27 00:43, David G. Johnston wrote:
>
> >> I'd like to have a new option "log", which skips soft errors and
> >> logs
> >> information that should have resulted in errors to PostgreSQL log.
> >
> > user-specified tables in the same database.  Setting up temporary
> > tables or unlogged tables probably is going to be a more acceptable
> > methodology than trying to get to the log files.
>
> OTOH not everyone thinks saving table information is the best idea[2].

The added NOTICE message gives a summary of how many rows were
skipped, but not the line numbers. It's hard for the users to find the
malformed data, especially when they are bulk-inserting from data
files of multiple GBs in size (hard to open such files in any editor
too).

I understand the value of writing the info to server log or tables of
users' choice as being discussed in a couple of other threads.
However, I'd prefer we do the simplest thing first before we go debate
server log or table - let the users know what line numbers are
containing the errors that COPY ignored something like [1] with a
simple change like [2]. It not only helps users figure out which rows
and attributes were malformed, but also helps them redirect them to
server logs with setting log_min_messages = notice [3]. In the worst
case scenario, a problem with this one NOTICE per malformed row is
that it can overload the psql session if all the rows are malformed.
I'm not sure if this is a big problem, but IMO better than a single
summary NOTICE message and simpler than writing to tables of users'
choice.

Thoughts?

FWIW, I presented the new COPY ... ON_ERROR option feature at
Hyderabad PostgreSQL User Group meetup and it was well-received by the
audience. The audience felt it's going to be extremely helpful for
bulk-loading tasks. Thank you all for working on this feature.

[1]
postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
CREATE TABLE
postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1    {1}     1
a       {2}     2
3       {3}     3333333333
4       {a, 4}  4

5       {5}>> >> >> >> >>       5
\.>>
NOTICE:  detected data type incompatibility at line number 2 for
column check_ign_err, COPY n
NOTICE:  detected data type incompatibility at line number 3 for
column check_ign_err, COPY k
NOTICE:  detected data type incompatibility at line number 4 for
column check_ign_err, COPY m
NOTICE:  detected data type incompatibility at line number 5 for
column check_ign_err, COPY n
NOTICE:  4 rows were skipped due to data type incompatibility
COPY 2

[2]
diff --git a/src/backend/commands/copyfromparse.c
b/src/backend/commands/copyfromparse.c
index 906756362e..93ab5d4ebb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

                 &values[m]))
                        {
                                cstate->num_errors++;
-                               return true;
+
+                               if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
+                               {
+                                       ereport(NOTICE,
+
errmsg("detected data type incompatibility at line number %llu for
column %s, COPY %s",
+
(unsigned long long) cstate->cur_lineno,
+
cstate->cur_relname,
+
cstate->cur_attname));
+                                       return true;
+                               }
                        }

                        cstate->cur_attname = NULL;

[3]
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 2 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 2, column n: "a"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 3 for column check_ign_err, COPY k
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 3, column k: "3333333333"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 4 for column check_ign_err, COPY m
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 4, column m: "{a, 4}"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 5 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 5, column n: ""
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  4 rows were skipped due
to data type incompatibility

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: jian he
Date:
Subject: Re: RFC: Logging plan of the running query