Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) |
Date | |
Msg-id | 752672.1699474336@sss.pgh.pa.us Whole thread Raw |
In response to | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) |
List | pgsql-hackers |
Daniel Gustafsson <daniel@yesql.se> writes: >> On 8 Nov 2023, at 19:18, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think an actually usable feature of this sort would involve >> copying all the failed lines to some alternate output medium, >> perhaps a second table with a TEXT column to receive the original >> data line. (Or maybe an array of text that could receive the >> broken-down field values?) Maybe we could dump the message info, >> line number, field name etc into additional columns. > I agree that the errors should be easily visible to the user in some way. The > feature is for sure interesting, especially in data warehouse type jobs where > dirty data is often ingested. I agree it's interesting, but we need to get it right the first time. Here is a very straw-man-level sketch of what I think might work. The option to COPY FROM looks something like ERRORS TO other_table_name (item [, item [, ...]]) where the "items" are keywords identifying the information item we will insert into each successive column of the target table. This design allows the user to decide which items are of use to them. I envision items like LINENO bigint COPY line number, counting from 1 LINE text raw text of line (after encoding conversion) FIELDS text[] separated, de-escaped string fields (the data that was or would be fed to input functions) FIELD text name of troublesome field, if field-specific MESSAGE text error message text DETAIL text error message detail, if any SQLSTATE text error SQLSTATE code Some of these would have to be populated as NULL if we didn't get that far in processing the line. In the worst case, which is encoding conversion failure, I think we couldn't populate any of the data items except LINENO. Not sure if we need to insist that the target table columns be exactly the data types I show above. It'd be nice to allow the LINENO target to be plain int, perhaps. OTOH, do we really want to have to deal with issues like conversion failures while trying to report an error? > As a data point, Greenplum has this feature with additional SQL syntax to > control it: > COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS; > LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT > LIMIT xyz ROWS sets the limit of how many rows can be faulty before the > operation errors out. I'm not at all advocating that we should mimic this, > just wanted to add a reference to postgres derivative where this has been > implemented. Hm. A "reject limit" might be a useful add-on, but I wouldn't advocate including it in the initial patch. regards, tom lane
pgsql-hackers by date: