Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From torikoshia
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id 8ad8492ff9fae3481d87c7aab4e0aed0@oss.nttdata.com
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Damir Belyalov <dam.bel07@gmail.com>)
List pgsql-hackers
On 2023-03-06 23:03, Daniel Gustafsson wrote:
>> On 28 Feb 2023, at 15:28, Damir Belyalov <dam.bel07@gmail.com> wrote:
> 
>> Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. 
>> As expected it works.
>> Also added a description to copy.sgml and made a review on patch.
Thanks for your tests and improvements!

>> I added 'ignored_errors' integer parameter that should be output after 
>> the option is finished.
>> All errors were added to the system logfile with full detailed 
>> context. Maybe it's better to log only error message.
Certainly.

> FWIW, Greenplum has a similar construct (but which also logs the errors 
> in the
> db) where data type errors are skipped as long as the number of errors 
> don't
> exceed a reject limit.  If the reject limit is reached then the COPY 
> fails:
> 
>     LOG ERRORS [ SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
> 
> IIRC the gist of this was to catch then the user copies the wrong input 
> data or
> plain has a broken file.  Rather than finding out after copying n rows 
> which
> are likely to be garbage the process can be restarted.
> 
> This version of the patch has a compiler error in the error message:
> 
> copyfrom.c: In function ‘CopyFrom’:
> copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long
> int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’}
> [-Werror=format=]
> 1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
>      |                          ^~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
>      |                                              |
>      |                                              uint64 {aka long
> long unsigned int}
> 
> 
> On that note though, it seems to me that this error message leaves a 
> bit to be
> desired with regards to the level of detail.
+1.
I felt just logging "Error: %ld" would make people wonder the meaning of 
the %ld. Logging something like ""Error: %ld data type errors were 
found" might be clearer.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Katsuragi Yuta
Date:
Subject: Re: [Proposal] Add foreign-server health checks infrastructure
Next
From: Alexander Korotkov
Date:
Subject: Re: POC: Lock updated tuples in tuple_update() and tuple_delete()