On 2023-12-15 05:48, Masahiko Sawada wrote:
Thanks for joining this discussion!
> I've read this thread and the latest patch. IIUC with SAVE_ERROR
> option, COPY FROM creates an error table for the target table and
> writes error information there.
>
> While I agree that the final shape of this feature would be something
> like that design, I'm concerned some features are missing in order to
> make this feature useful in practice. For instance, error logs are
> inserted to error tables without bounds, meaning that users who want
> to tolerate errors during COPY FROM will have to truncate or drop the
> error tables periodically, or the database will grow with error logs
> without limit. Ideally such maintenance work should be done by the
> database. There might be some users who want to log such conversion
> errors in server logs to avoid such maintenance work. I think we
> should provide an option for where to write, at least. Also, since the
> error tables are normal user tables internally, error logs are also
> replicated to subscribers if there is a publication FOR ALL TABLES,
> unlike system catalogs. I think some users would not like such
> behavior.
>
> Looking at SAVE_ERROR feature closely, I think it consists of two
> separate features. That is, it enables COPY FROM to load data while
> (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> error table). If we implement only (1), it would be like COPY FROM
> tolerate errors infinitely and log errors to /dev/null. The user
> cannot see the error details but I guess it could still help some
> cases as Andres mentioned[1] (it might be a good idea to send the
> number of rows successfully loaded in a NOTICE message if some rows
> could not be loaded). Then with (2), COPY FROM can log error
> information to somewhere such as tables and server logs and the user
> can select it.
+1.
I may be biased since I wrote some ~v6 patches which just output the
soft errors and number of skipped rows to log, but I think just (1)
would be worth implementing as you pointed out and I like if users could
choose where to log output.
I think there would be situations where it is preferable to save errors
to server log even considering problems which were pointed out in [1],
i.e. manually loading data.
[1]
https://www.postgresql.org/message-id/739953.1699467519%40sss.pgh.pa.us
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.
>
> Regards,
>
> [1]
> https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation