Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From Damir Belyalov
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id CALH1LguAEsoTYJTCsXNB-7z2Hu9UGEpsXA4kj0FOTmoP=6Wp3Q@mail.gmail.com
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
List pgsql-hackers
> FWIW, Greenplum has a similar construct (but which also logs the errors in
> the db) where data type errors are skipped as long as the number of errors
> doesn't exceed a reject limit.  If the reject limit is reached then the COPY
> fails:
>
>       LOG ERRORS [ SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
>
> IIRC the gist of this was to catch when the user copies the wrong input data
> or plain has a broken file.  Rather than finding out after copying n rows
> which are likely to be garbage the process can be restarted.
 
I think this is a matter for discussion. A related question is where to log the errors: to separate files or to the system log file.
IMO it is better for users to log a short but detailed error message to the system log file and not to output errors to the terminal.
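
To make the reject-limit behaviour quoted above concrete, here is a minimal sketch in plain C. It is not code from the patch or from Greenplum: `LoadResult`, `parse_int_field` and `load_rows` are hypothetical names, each row is assumed to be a single integer field, and the load is assumed to fail as soon as the error count reaches the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result of a load; not a PostgreSQL or Greenplum structure. */
typedef struct LoadResult
{
    int64_t processed;      /* rows converted successfully */
    int64_t ignored_errors; /* rows skipped because of a data type error */
    bool    failed;         /* true if the reject limit was reached */
} LoadResult;

/* Try to parse one field as a base-10 integer; false on a data type error. */
static bool
parse_int_field(const char *field, long *out)
{
    char *end;

    errno = 0;
    *out = strtol(field, &end, 10);
    return errno == 0 && end != field && *end == '\0';
}

/*
 * Load nrows single-field rows, skipping malformed ones. Stop and mark the
 * load as failed as soon as the number of skipped rows reaches reject_limit.
 */
LoadResult
load_rows(const char *const *rows, size_t nrows, int64_t reject_limit)
{
    LoadResult res = {0, 0, false};

    assert(rows != NULL || nrows == 0);

    for (size_t i = 0; i < nrows; i++)
    {
        long value;

        if (parse_int_field(rows[i], &value))
        {
            res.processed++;
            continue;
        }

        /* Data type error: skip the row, but give up at the limit. */
        if (++res.ignored_errors >= reject_limit)
        {
            res.failed = true;
            break;
        }
    }

    return res;
}
```

With the rows "1", "2", "x", "4", "y", "6", a limit of 3 lets the load finish with two rows skipped, while a limit of 2 stops it at "y". Where the skipped rows get reported (separate file, system log file, or terminal) is left out on purpose, since that is the open question here.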
 

> This version of the patch has a compiler error in the error message:
Yes, corrected it. Changed "ignored_errors" to int64 because "processed" (used for counting copy rows) is int64.
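
As an illustration of why the counter type and the format specifier have to agree, here is a small sketch in standard C. It is an assumption-laden example, not the patch code: `format_error_count` is a hypothetical helper, the message text is only illustrative, and PostgreSQL has its own conventions for printing 64-bit integers (for example the `INT64_FORMAT` macro) that are not used here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an error summary for an int64_t counter. PRId64 expands to the
 * matching conversion specifier on every platform, whereas a hard-coded
 * "%ld" is only correct where long is 64 bits wide.
 * Returns the value of snprintf (length of the full message).
 */
int
format_error_count(char *buf, size_t buflen, int64_t ignored_errors)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen,
                    "%" PRId64 " data type errors were found",
                    ignored_errors);
}
```

A value that does not fit in 32 bits, such as 5000000000, is formatted correctly this way regardless of the width of long.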


> I felt just logging "Error: %ld" would make people wonder about the meaning
> of the %ld. Logging something like "Error: %ld data type errors were found"
> might be clearer.
 
Thanks. For more clarity, I changed the message to: "Errors were found: %".

Regards, Damir Belyalov
Postgres Professional
