Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id B082C6BE-D2F2-4DD4-8649-E81E24A6840E@yesql.se
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> On 8 Nov 2023, at 19:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think an actually usable feature of this sort would involve
> copying all the failed lines to some alternate output medium,
> perhaps a second table with a TEXT column to receive the original
> data line.  (Or maybe an array of text that could receive the
> broken-down field values?)  Maybe we could dump the message info,
> line number, field name etc into additional columns.

I agree that the errors should be easily visible to the user in some way.  The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.

As a data point, Greenplum has this feature with additional SQL syntax to
control it:

    COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;

LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
operation errors out.  I'm not at all advocating that we should mimic this; I
just wanted to add a reference to a Postgres derivative where this has been
implemented.
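
To make the reject-limit semantics described above concrete, here is a minimal
sketch in Python.  It is not Greenplum's implementation: the two-column row
format, the names parse_row and copy_with_reject_limit, and the choice to abort
as soon as the number of rejected rows reaches the limit are all assumptions
made for illustration, and the per-segment aspect is ignored entirely.

```python
from dataclasses import dataclass, field


class RejectLimitReached(Exception):
    """Raised when the number of rejected rows reaches the limit."""


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)
    # Each entry is (line number, original data line, error message),
    # roughly the information an error log or side table might hold.
    errors: list = field(default_factory=list)


def parse_row(raw):
    # Hypothetical input format: "<integer id>,<name>".
    fields = raw.split(",")
    if len(fields) != 2:
        raise ValueError(f"expected 2 fields, got {len(fields)}")
    return int(fields[0]), fields[1]


def copy_with_reject_limit(lines, reject_limit):
    """Load rows, logging faulty ones, until reject_limit rows have failed."""
    result = LoadResult()
    for lineno, raw in enumerate(lines, start=1):
        try:
            result.loaded.append(parse_row(raw))
        except ValueError as exc:
            result.errors.append((lineno, raw, str(exc)))
            # Assumption: the whole operation errors out once the limit is hit.
            if len(result.errors) >= reject_limit:
                raise RejectLimitReached(
                    f"{len(result.errors)} rows rejected, limit is {reject_limit}"
                )
    return result
```

With a limit of 2, an input containing a single bad line loads the good rows
and records the bad one, while an input with two bad lines raises
RejectLimitReached and nothing is returned.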

--
Daniel Gustafsson



