Re: Bug in copy - Mailing list pgsql-bugs

From me nefcanto
Subject Re: Bug in copy
Date
Msg-id CAEHBEOC_xWMiq03LWm_WmCgJRAko-uzKK5cPRHK8CCMuSKJ7Rg@mail.gmail.com
Whole thread Raw
In response to Re: Bug in copy  (Greg Sabino Mullane <htamfids@gmail.com>)
Responses Re: Bug in copy
List pgsql-bugs
I didn't understand how putting a catch-all handler turns the Postgres into an ETL application. And about all those numerous Postgres-specific solutions, can you please enumerate them? Because there are not numerous options, just a couple of options, and none of them work. Inserting the data into a table without constraint and then performing a merge won't work. Because merge also does not have a catch-all handler. What other solutions have been proposed? pg_bulkload?

So basically two solutions have been suggested, Merge and pg_bulkload. One does not work (just handles more error and fails again on a single error) and the other is a command-line utility.

I also did a simple research using AI and came up with this table:


RDBMS System - Row-Level Error Handling for Bulk Copy
Oracle Database - Yes (e.g., SQL*Loader can direct bad records)
Microsoft SQL Server - No (BULK INSERT generally aborts on row errors)
MySQL - Yes (using LOAD DATA with IGNORE options)
PostgreSQL - No (COPY stops on error without workarounds)
IBM DB2 - Yes (LOAD utility supports error logging)
Teradata - Yes (has options for capturing reject rows)
SAP HANA - No (bulk load typically aborts on errors)
MariaDB - Yes (similar to MySQL with error-handling options)
SQLite - No (no dedicated bulk load command with error handling)
Amazon Redshift - Yes (COPY command allows error tolerance settings)
Google BigQuery - Yes (load jobs can be configured to skip bad rows)
Netezza - Yes (supports reject files for bulk loads)
Snowflake - Yes (COPY INTO offers error handling parameters)
Vertica - Yes (COPY command logs errors and continues)

10 RDBMS systems support catch-all problems, and only 4 do not. This means that the majority have found it to be a useful feature.

Greg, may I ask what's your argument against having a catch-all error in bulk operations? Because the ETL argument is not a valid one. As I mentioned previously, none of my real-world requirements were related to the ETL.

Regards
Saeed

On Mon, Feb 24, 2025 at 7:25 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Mon, Feb 24, 2025 at 9:09 AM me nefcanto <sn.1361@gmail.com> wrote:
The point is, the database schema is not in our hands. We don't know what rules exist on each table and what rules change. And it's not practical and feasible to spend resources on keeping our bulk insertion logic with the database changes.

That's a company problem, and not one that can be solved by changing the way COPY works.
 
Is there a problem in implementing this? After all one expects the most advanced open source database to support this real-world requirement.

We're not going to change Postgres into an ETL application. This thread has given you numerous Postgres-specific solutions, but there is also no shortage of ETL applications you can try out.

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #18824: Inconsistent results for isolation level settings
Next
From: Kritika Agarwal
Date:
Subject: Re: error -10825