Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From torikoshia
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id 51328f620d58dcb5e554dbe32d1c0533@oss.nttdata.com
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On 2023-12-15 05:48, Masahiko Sawada wrote:

Thanks for joining this discussion!

> I've read this thread and the latest patch. IIUC with SAVE_ERROR
> option, COPY FROM creates an error table for the target table and
> writes error information there.
> 
> While I agree that the final shape of this feature would be something
> like that design, I'm concerned some features are missing in order to
> make this feature useful in practice. For instance, error logs are
> inserted to error tables without bounds, meaning that users who want
> to tolerate errors during COPY FROM  will have to truncate or drop the
> error tables periodically, or the database will grow with error logs
> without limit. Ideally such maintenance work should be done by the
> database. There might be some users who want to log such conversion
> errors in server logs to avoid such maintenance work. I think we
> should provide an option for where to write, at least. Also, since the
> error tables are normal user tables internally, error logs are also
> replicated to subscribers if there is a publication FOR ALL TABLES,
> unlike system catalogs. I think some users would not like such
> behavior.
> 
> Looking at SAVE_ERROR feature closely, I think it consists of two
> separate features. That is, it enables COPY FROM to load data while
> (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> error table). If we implement only (1), it would be like COPY FROM
> tolerate errors infinitely and log errors to /dev/null. The user
> cannot see the error details but I guess it could still help some
> cases as Andres mentioned[1] (it might be a good idea to send the
> number of rows successfully loaded in a NOTICE message if some rows
> could not be loaded). Then with (2), COPY FROM can log error
> information to somewhere such as tables and server logs and the user
> can select it.

+1.
I may be biased since I wrote some ~v6 patches which just output the 
soft errors and number of skipped rows to log, but I think just (1) 
would be worth implementing as you pointed out and I like if users could 
choose where to log output.

I think there would be situations where it is preferable to save errors 
to server log even considering problems which were pointed out in [1], 
i.e. manually loading data.

[1] 
https://www.postgresql.org/message-id/739953.1699467519%40sss.pgh.pa.us

> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.
> 
> Regards,
> 
> [1] 
> https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: "pgoutput" options missing on documentation
Next
From: Amit Kapila
Date:
Subject: Re: "pgoutput" options missing on documentation