Thread: Please implement a catch-all error handler per row, for COPY

Please implement a catch-all error handler per row, for COPY

From

me nefcanto

Date:

01 March, 18:18:41

Hello

Please consider these scenarios:

- I want to create a million fake products, sometimes even 100 million (we're on MariaDB now and we plan to migrate to Postgres). My team uses fake data for performance tests and other use cases.

- Another scenario is translations. Even in production, we have translation files for more than 20 languages, and for more than 2 thousand keys. That means we need to insert 40 thousand translation records in the production.

- Another scenario is updating nested model values for a large hierarchical table. For example, the categories table. Anytime the user changes a record in that table we need to recalculate the nested model for the entire categories and bulk update the results.

All of these scenarios are such that data sanitation is difficult if not possible before doing the bulk operation (copy).

I realized that when we specify `on_error ignore` it just handles a handful of errors. I thought this was a bug and sent an email to the pgsql-bugs maling list. But they said it's the intended behavior.

Can you please provide a row-level catch-all handler for the copy command?

Regards
Saeed

Re: Please implement a catch-all error handler per row, for COPY

From

Tom Lane

Date:

01 March, 19:20:12

me nefcanto <sn.1361@gmail.com> writes:
> Can you please provide a row-level catch-all handler for the copy command?

Very unlikely to happen.  COPY is not intended as a general purpose
ETL mechanism, and we don't want to load it down with features that
would create substantial performance penalties.  Which that would.

Even ignoring the performance angle, this request seems remarkably
ill-specified.  What is a "row-level handler" for errors that have
to do with identifying row boundaries?

            regards, tom lane

Re: Please implement a catch-all error handler per row, for COPY

From

Adrian Klaver

Date:

01 March, 19:31:08

On 3/1/25 07:18, me nefcanto wrote:
> Hello
> 
>
> All of these scenarios are such that data sanitation is difficult if not 
> possible before doing the bulk operation (copy).
> 
> I realized that when we specify `on_error ignore` it just handles a 
> handful of errors. I thought this was a bug and sent an email to the 
> pgsql-bugs maling list. But they said it's the intended behavior.
> 

Take a look at:

http://ossc-db.github.io/pg_bulkload/index.html

and

https://pgloader.io/

They may be able to meet your needs.

> 
> Regards
> Saeed

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Please implement a catch-all error handler per row, for COPY

From

"David G. Johnston"

Date:

01 March, 19:35:54

On Sat, Mar 1, 2025 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

me nefcanto <sn.1361@gmail.com> writes:
> Can you please provide a row-level catch-all handler for the copy command?

Very unlikely to happen. COPY is not intended as a general purpose
ETL mechanism, and we don't want to load it down with features that
would create substantial performance penalties. Which that would.

Maybe it isn't a general purpose ETL tool but there is no reasonable way to do some things unless COPY can be put into a mode that doesn't have the same performance requirements it needs to serve as our dump/restore tool of choice.

I have to imagine such a mode, if not enabled, would have little to no impact on how COPY behaves compared to today. It's kinda like VACUUM FULL existing doesn't impact how VACUUM behaves. Seems more desirable than inventing a whole new SQL Command to do this and copy-paste all of the COPY code since it does mostly the same thing.

In short, it's probably worth giving it a try if someone wants to. Rejecting it without seeing the proposal seems premature.

Now, it's probably challenging enough that if the person requesting the feature isn't driving its development the odds of it getting worked on is fairly low.

David J.

Re: Please implement a catch-all error handler per row, for COPY

From

Greg Sabino Mullane

Date:

01 March, 20:14:37

FYI the -bugs thread in question:

https://www.postgresql.org/message-id/flat/CAEHBEOBCweDWGNHDaUk4%3D10HG0QXXJJAGXbEnFLMB30M%2BQw%2Bdg%40mail.gmail.com

seems to imply the primary blocker was a unique constraint.

Cheers,

Greg

Crunchy Data - https://www.crunchydata.com

Enterprise Postgres Software Products & Tech Support

Re: Please implement a catch-all error handler per row, for COPY

From

me nefcanto

Date:

03 March, 06:44:52

Thank you all for responding. With respect, I don't understand why COPY gets related to ETL. All of the requirements I mentioned above have nothing to do with ETL. We have a table of categories. A hierarchical table. This is a multitenant app. Hundreds of thousands of records are in it. We want to calculate the hierarchical properties (nested set models, node depth, determining leaves, materialized paths, etc.) and then update the entire table. What does this have to do with ETL? Or as I said we create millions of records for products, attributes of products, and pricing and media of products to load test our system. Again, what does that have to do with ETL?

The point is, that there is already an `on_error ignore` clause there. This means that somewhere there is a try/catch per row. If I'm wrong, please let me know. But when the `on_error ignore` catches problem x for each row, then it can catch all problems for each row without any performance problem.

Let me give you an example in C#:

try

{

}

catch (SomeException ex)

{

}

becomes:

try

{

}

catch (Exception ex)

{

if (ex is SomeException)

{

}

The last catch clause catches everything. How does it affect performance? Running a simple if for hundreds of millions of iterations is literally nothing in time complexity.

As I have specified in the bug thread, from 11 RDBMSs, 7 support this. Thus it's not an uncommon weird request.

Regards

Saeed

On Sat, Mar 1, 2025 at 8:45 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:

FYI the -bugs thread in question:

https://www.postgresql.org/message-id/flat/CAEHBEOBCweDWGNHDaUk4%3D10HG0QXXJJAGXbEnFLMB30M%2BQw%2Bdg%40mail.gmail.com

seems to imply the primary blocker was a unique constraint.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

Re: Please implement a catch-all error handler per row, for COPY

From

Christophe Pettus

Date:

03 March, 06:54:20

> On Mar 2, 2025, at 19:44, me nefcanto <sn.1361@gmail.com> wrote:
>
> As I have specified in the bug thread, from 11 RDBMSs, 7 support this. Thus it's not an uncommon weird request.

If your organization is interested in producing a design and a patch, or paying a developer or organization to do so,
thatwould be the best way forward.  Everyone who works on the core PostgreSQL code is either a volunteer or extremely
busy. Those that are paid to work on PostgreSQL usually have their priorities mapped out for months (years?) already.
Thebest way to convince the community to adopt a feature to come bearing a high-quality patch.  That's no guarantee, of
course,but the probability is much higher that way.

Re: Please implement a catch-all error handler per row, for COPY

From

Tom Lane

Date:

03 March, 07:07:27

me nefcanto <sn.1361@gmail.com> writes:
> The point is, that there is already an `on_error ignore` clause there. This
> means that somewhere there is a try/catch per row. If I'm wrong, please let
> me know.

You are wrong.  See

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=d9f7f5d32

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=9e2d87011

and a ton of related commits.

It's possible that the specific case of unique-index violations
could be handled in a similar style.  But "catch any error whatever"
is simply not going to happen here, because a subtransaction per
row is not practical.

            regards, tom lane

Re: Please implement a catch-all error handler per row, for COPY

From

Adrian Klaver

Date:

03 March, 08:33:07

On 3/2/25 19:44, me nefcanto wrote:
> Thank you all for responding. With respect, I don't understand why COPY 
> gets related to ETL. All of the requirements I mentioned above have 
> nothing to do with ETL. We have a table of categories. A hierarchical 
> table. This is a multitenant app. Hundreds of thousands of records are 
> in it. We want to calculate the hierarchical properties (nested set 
> models, node depth, determining leaves, materialized paths, etc.) and 
> then update the entire table. What does this have to do with ETL? Or as

Pretty sure it defines ETL, extract data from a source, transform it to 
meet the properties and load it.

At any rate:

1) ON_ERROR IGNORE only just appeared in Postgres 17.

2) Any further changes to it would constitute a feature change that can 
only happen in major version upgrade. The next major version is due this 
Fall and development is already well under way. The chances of it making 
it in given the time frame is slim, so that pushes a practicable 
deadline until Fall of 2026.

3) As I mentioned in a previous post there are packages already out 
there that may get you want you want now.

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Please implement a catch-all error handler per row, for COPY

From

Laurenz Albe

Date:

03 March, 11:55:03

On Mon, 2025-03-03 at 07:14 +0330, me nefcanto wrote:
> The point is, that there is already an `on_error ignore` clause there. This means that
> somewhere there is a try/catch per row. If I'm wrong, please let me know.

The crucial point that Tom referred to is the "per row".  What is a row?

Imagine you have a COPY FROM statement for a table with three columns,
and the data look like this:

1,Smith,John
2,Lewis,Jerry
Lee
3,Prince
4,Albe,Laurenz

We may be able to guess what is meant, but how shall the machine know
where the line boundaries are, which data to ignore and which to process?

Currently, that is no problem, because errors are only identified
after the data have been successfully parsed.

Yours,
Laurenz Albe