Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers
From | Yugo NAGATA |
---|---|
Subject | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
Date | |
Msg-id | 20241113220235.85b223c663c98da88ae91b24@sraoss.co.jp Whole thread Raw |
In response to | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row (torikoshia <torikoshia@oss.nttdata.com>) |
Responses |
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
|
List | pgsql-hackers |
On Tue, 12 Nov 2024 17:38:25 +0900 torikoshia <torikoshia@oss.nttdata.com> wrote: > On 2024-11-12 14:17, Yugo Nagata wrote: > > On Tue, 12 Nov 2024 14:03:50 +0900 > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > >> On Tue, 12 Nov 2024 01:27:53 +0500 > >> Kirill Reshke <reshkekirill@gmail.com> wrote: > >> > >> > On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote: > >> > > > >> > > On 2024-11-09 21:55, Kirill Reshke wrote: > >> > > > >> > > Thanks for working on this! > >> > > >> > Thanks for reviewing the v7 patch series! > >> > > >> > > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com> > >> > > > wrote: > >> > > >> > >> > > >> > >> > > >> > >> > > >> On 2024/10/26 6:03, Kirill Reshke wrote: > >> > > >> > when the REJECT LIMIT is set to some non-zero number and the number of > >> > > >> > row NULL replacements exceeds the limit, is it OK to fail. Because > >> > > >> > there WAS errors, and we should not tolerate more than $limit errors . > >> > > >> > I do find this behavior to be consistent. > >> > > >> > >> > > >> +1 > >> > > >> > >> > > >> > >> > > >> > But what if we don't set a REJECT LIMIT, it is sane to do all > >> > > >> > replacements, as if REJECT LIMIT is inf. > >> > > >> > >> > > >> +1 > >> > > > > >> > > > After thinking for a while, I'm now more opposed to this approach. I > >> > > > think we should count rows with erroneous data as errors only if > >> > > > null substitution for these rows failed, not the total number of rows > >> > > > which were modified. > >> > > > Then, to respect the REJECT LIMIT option, we compare this number with > >> > > > the limit. This is actually simpler approach IMHO. What do You think? > >> > > > >> > > IMHO I prefer the previous interpretation. > >> > > I'm not sure this is what people expect, but I assume that REJECT_LIMIT > >> > > is used to specify how many malformed rows are acceptable in the > >> > > "original" data source. > >> > >> I also prefer the previous version. > >> > >> > I do like the first version of interpretation, but I have a struggle > >> > with it. According to this interpretation, we will fail COPY command > >> > if the number > >> > of malformed data rows exceeds the limit, not the number of rejected > >> > rows (some percentage of malformed rows are accepted with nulnot-null constraintl > >> > substitution) > > I feel your concern is valid. > Currently 'reject' can occur only when converting a column's input value > to its data type, but if we introduce set_to_null option 'reject' also > occurs when inserting null, i.e. not null constraint. I can suppose "reject" means failure of COPY command here, that is, a reject of executing the command, not an error of data input. If so, we can interpret REJECT_LIMIT as "the number of malformed rows allowed before the COPY command is REJECTed" (not the number of rejected rows). In this case, I think we don't have to rename the option name. Of course, if there is more proper name that makes it easy for users to understand the behaviour of the option, renaming should be nice. > >> The documentation says that REJECT_LIMIT "Specifies the maximum number > >> of errors", > >> and there are no wording "reject" in the description, so I wonder it > >> is unclear > >> what means in "REJECT" in REJECT_LIMIT. It may be proper to use > >> ERROR_LIMIT > >> since it is supposed to be used with ON_ERROR. > >> > >> Alternatively, if we emphasize that errors are handled other than > >> terminating > >> the command,perhaps MALFORMED_LIMIT as proposed above or > >> TOLERANCE_LIMIT may be > >> good, for example. > > > > I might misunderstand the meaning of the name. If REJECT_LIMIT means "a > > limit on > > the number of rows with any malformed value allowed before the COPY > > command is > > rejected", we would not have to rename it. > > The meaning of REJECT_LIMIT is what you described, and I think Kirill > worries about cases when malformed rows are accepted(=not REJECTed) with > null substitution. REJECT_LIMIT counts this case as REJECTed. I am a bit confused. You mean "REJECT" is raising a soft error of data input here instead of terminating COPY? Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
pgsql-hackers by date: