Re: Add new error_action COPY ON_ERROR "log" - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Add new error_action COPY ON_ERROR "log"
Msg-id CALj2ACXNA0focNeriYRvQQaCGc4CsTuOnFbzF9LqTKNWxuJdhA@mail.gmail.com
In response to Re: Add new error_action COPY ON_ERROR "log"  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add new error_action COPY ON_ERROR "log"
List pgsql-hackers
On Fri, Mar 1, 2024 at 10:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > Nice catch. When COPY_ON_ERROR_STOP is specified, we use ereport's
> > soft error mechanism. An assertion seems a good choice to validate the
> > state is what we expect. Done that way.
>
> Hmm.  I am not really on board with this patch, that would generate
> one NOTICE message each time a row is incompatible in the soft error
> mode.  If you have a couple of billion rows to bulk-load into the
> backend and even 0.01% of them are corrupted, you could finish with a
> more than 100k log entries, and all systems should be careful about
> the log quantity generated, especially if we use the syslogger which
> could become easily a bottleneck.

Hm. I had some concerns about this myself, as mentioned upthread, but
thanks a lot for illustrating it.
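
To put a rough number on that example, here is a back-of-the-envelope
calculation. It assumes "a couple of billion rows" means 2 billion (my
assumption, not a figure from the quoted mail) and one NOTICE per
malformed row:

```python
# Rough estimate of NOTICE volume with one message per malformed row.
# Assumption: "a couple of billion rows" is taken to mean 2 billion.
total_rows = 2_000_000_000

# 0.01% is 1 row in 10,000; integer division keeps the arithmetic exact.
bad_rows = total_rows // 10_000

print(f"{bad_rows:,} NOTICE messages")  # prints "200,000 NOTICE messages"
```

That is consistent with the "more than 100k log entries" estimate.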

> The existing ON_ERROR controls what to do on error.  I think that we'd
> better control the amount of information reported with a completely
> separate option, an option even different than where to redirect
> errors (if required, which would be either the logs, the client, a
> pipe, a combination of these or even all of them).

How about adding an extra error_action value, ignore-with-verbose,
which behaves like ignore but additionally emits one NOTICE per
malformed row? With this, one can say COPY x FROM stdin (ON_ERROR
ignore-with-verbose);.

Alternatively, we can think of adding a new, separate option, verbose,
which can be used not only for this but also for reporting other
COPY-related info/errors etc. With this, one can say COPY x FROM stdin
(VERBOSE on, ON_ERROR ignore);.
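
To make the intended semantics of the separate-option variant concrete,
here is a toy model in Python. It is only a sketch of the behaviour
being discussed, not PostgreSQL code: copy_from, its parameters and the
NOTICE wording are all hypothetical names made up for illustration.

```python
def copy_from(lines, on_error="stop", verbose=False):
    """Toy model of the proposal: load integer rows from text lines.

    on_error="stop"   -> fail on the first malformed row
    on_error="ignore" -> skip malformed rows
    verbose=True      -> additionally record one NOTICE per skipped row
    """
    if on_error not in ("stop", "ignore"):
        raise ValueError(f"unknown on_error value: {on_error!r}")

    loaded, notices = [], []
    for lineno, text in enumerate(lines, start=1):
        try:
            loaded.append(int(text))
        except ValueError:
            if on_error == "stop":
                raise  # hard error: abort the whole load
            if verbose:
                # Hypothetical message text, for illustration only.
                notices.append(f"NOTICE: skipping malformed row at line {lineno}")
    return loaded, notices


data = ["1", "two", "3", "4x"]

# Plain ignore: bad rows are dropped silently.
rows, notices = copy_from(data, on_error="ignore")

# Ignore plus verbose: same rows loaded, one NOTICE per bad row.
rows_v, notices_v = copy_from(data, on_error="ignore", verbose=True)
```

The point of the split is that what to do on error and how much to
report are chosen independently, so the per-row messages stay opt-in.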

Another way would be to have a separate GUC, but -100 from me for
that, because it not only increases the total number of GUCs by 1 but
also might set a wrong precedent of adding a new GUC to control
command-level output.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


