Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
Date | |
Msg-id | 20241112140350.e6599cece83d59387d28b5c0@sraoss.co.jp Whole thread Raw |
In response to | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row (Kirill Reshke <reshkekirill@gmail.com>) |
Responses |
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
|
List | pgsql-hackers |
On Tue, 12 Nov 2024 01:27:53 +0500 Kirill Reshke <reshkekirill@gmail.com> wrote: > On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote: > > > > On 2024-11-09 21:55, Kirill Reshke wrote: > > > > Thanks for working on this! > > Thanks for reviewing the v7 patch series! > > > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com> > > > wrote: > > >> > > >> > > >> > > >> On 2024/10/26 6:03, Kirill Reshke wrote: > > >> > when the REJECT LIMIT is set to some non-zero number and the number of > > >> > row NULL replacements exceeds the limit, is it OK to fail. Because > > >> > there WAS errors, and we should not tolerate more than $limit errors . > > >> > I do find this behavior to be consistent. > > >> > > >> +1 > > >> > > >> > > >> > But what if we don't set a REJECT LIMIT, it is sane to do all > > >> > replacements, as if REJECT LIMIT is inf. > > >> > > >> +1 > > > > > > After thinking for a while, I'm now more opposed to this approach. I > > > think we should count rows with erroneous data as errors only if > > > null substitution for these rows failed, not the total number of rows > > > which were modified. > > > Then, to respect the REJECT LIMIT option, we compare this number with > > > the limit. This is actually simpler approach IMHO. What do You think? > > > > IMHO I prefer the previous interpretation. > > I'm not sure this is what people expect, but I assume that REJECT_LIMIT > > is used to specify how many malformed rows are acceptable in the > > "original" data source. I also prefer the previous version. > I do like the first version of interpretation, but I have a struggle > with it. According to this interpretation, we will fail COPY command > if the number > of malformed data rows exceeds the limit, not the number of rejected > rows (some percentage of malformed rows are accepted with null > substitution) > So, a proper name for the limit will be MALFORMED_LIMIT, or something. > However, we are unable to change the name since the REJECT_LIMIT > option has already been committed. > I guess I'll just have to put up with this contradiction. I will send > an updated patch shortly... I think we can rename the REJECT_LIMIT option because it is not yet released. The documentation says that REJECT_LIMIT "Specifies the maximum number of errors", and there are no wording "reject" in the description, so I wonder it is unclear what means in "REJECT" in REJECT_LIMIT. It may be proper to use ERROR_LIMIT since it is supposed to be used with ON_ERROR. Alternatively, if we emphasize that errors are handled other than terminating the command,perhaps MALFORMED_LIMIT as proposed above or TOLERANCE_LIMIT may be good, for example. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
pgsql-hackers by date: