Re: Practical error logging for very large COPY - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Practical error logging for very large COPY
Date
Msg-id 87zmnwmuxk.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Practical error logging for very large COPY  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > It would be nice to be able to have the former loaded into an actual table
> > where it can be queried and perhaps fixed and reloaded.
> 
> > The latter clearly cannot.
> 
> Sure it can --- you just have to dump it as raw text (or perhaps bytea,
> as someone suggested upthread).

I didn't just say "loaded into an actual table" I said "loaded into an actual
table where it can be queried and perhaps fixed and reloaded". From a
practical point of view having the data in the already parsed format is a
whole lot more useful. You can then do a query to look up the record it
conflicted with or look up possible foreign key values that would work instead
of the failed reference. You can also insert it directly into the table
instead of having to dump it out to a text file and load it with COPY again.

Actually I think it would be useful to be able to do this to constraints
generally, not just during COPY. If I update or insert a record and it fails
due to a constraint violation it would be handy to be able to view the failed
record.

Perhaps what's really needed is something like CREATE TRIGGER AFTER CONSTRAINT
VIOLATION which can then go ahead and insert the record into some other table
if it feels like.

COPY then would just need an option to proceed even after an error. Presumably
only to be used if you're inserting into a clean ETL table, not directly into
production tables.

> I think the distinction you are proposing between constraint errors
> and datatype errors is entirely artificial.  Who's to say what is a
> constraint error and what is a datatype error, especially when you
> start thinking about cases like varchar length constraints or
> domain-type constraints?  If we create a mechanism that behaves
> differently depending on whether the error is detected before or after
> we try to form a tuple containing the data, we're going to have
> something that is exceedingly awkward to use, because the behavior will
> be nearly arbitrary from the user's viewpoint.

Well sure from a theoretical point of view. However from a practical point of
view there's a whole lot more that can be done with the data once it's in a
meaningful format. There's not much you can do with text other than stare at
it (and you can't even necessarily do that with bytea).

-- 
greg



pgsql-hackers by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: server closed connection on a select query
Next
From: Tom Lane
Date:
Subject: Re: server closed connection on a select query