Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id 739953.1699467519@sss.pgh.pa.us
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Damir <dam.bel07@gmail.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Daniel Gustafsson <daniel@yesql.se>)
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Damir <dam.bel07@gmail.com> writes:
> [ v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch ]

Sorry for being so late to the party, but ... I don't think this
is a well-designed feature as it stands.  Simply dropping failed rows
seems like an unusable definition for any application that has
pretensions of robustness.  "But", you say, "we're emitting WARNING
messages about it".  That's *useless*.  For most applications WARNING
messages just go into the bit bucket, or worse they cause memory leaks
(because the app never reads them).  An app that tried to read them
would have to cope with all sorts of fun such as translated messages.
Furthermore, as best I can tell from the provided test cases, the
messages completely lack basic context such as which field or line
the problem occurred in.  An app trying to use this to understand
which input lines had failed would not get far.

I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line.  (Or maybe an array of text that could receive the
broken-down field values?)  Maybe we could dump the message info,
line number, field name etc into additional columns.

Also it'd be a good idea to have a vision of how the feature
could be extended to cope with lower-level errors, such as
lines that have the wrong number of columns or other problems
with line-level syntax.  I don't say we need to cope with that
immediately, but it's going to be something people will want
to add, I think.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
Next
From: Tomas Vondra
Date:
Subject: Re: Syncrep and improving latency due to WAL throttling