Re: Add new COPY option REJECT_LIMIT - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: Add new COPY option REJECT_LIMIT
Date
Msg-id CAEG8a3JPdGkYKXadAEOphAoiUkY3hHZagk9=UPnyOq386e6gYg@mail.gmail.com
Whole thread Raw
In response to Re: Add new COPY option REJECT_LIMIT  (torikoshia <torikoshia@oss.nttdata.com>)
List pgsql-hackers
Hi Torikoshia,

On Wed, Jul 17, 2024 at 9:21 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2024-07-03 02:07, Fujii Masao wrote:
> > However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
> > option is still necessary.
>
> I remembered another reason for the necessity of ON_ERROR.
>
> ON_ERROR defines how to behave when encountering an error and it just
> accepts 'ignore' and 'stop' currently, but is expected to support other
> options such as saving details of errors to a table[1].
> ON_ERROR=stop is a synonym for REJECT_LIMIT=infinity, but I imagine
> REJECT_LIMIT would not replace future options of ON_ERROR.
>
> Considering this and the option we want to add this time is to specify
> an upper limit on the number or ratio of errors, the name of this option
> like "reject_limit" seems better than "ignore_errors".
>
> On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
> > On 2024-07-05 12:59, Fujii Masao wrote:
> >> On 2024/07/04 12:05, torikoshia wrote:
> >>> I'm going to update it after discussing the option format as
> >>> described
> >>> below.
>
> Updated the patch.
> 0001 sets limit by the absolute number of error rows and 0002 sets limit
> by ratio of the error.

In patch 0002, the ratio is calculated by the already skipped/processed
rows, but what if a user wants to copy 1000 rows, and he/she can tolerate
10 error rows, so he/she might set *reject_limit 0.01*, but one bad row in the
first 100 rows will fail the entire command, this might surprise the user.

This case can be resolved by 0001 *reject_limit 10*, so I think the *by ratio*
is less useful.

>
> >> If we choose "all" as the keyword, renaming the option to
> >> IGNORE_ERRORS
> >> might be more intuitive and easier to understand than REJECT_LIMIT.
>
> > I feel that 'infinite' and 'unlimited' are unfamiliar values for
> > PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
> > be a better parameter name as your suggestion.
>
> As described above, attached patch adopts REJECT_LIMIT, so it uses
> "infinity".
>
> >> This makes me think it might be better to treat REJECT_LIMIT as
> >> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
> >> if we adopt your patch. Since ON_ERROR=stop is the default,
> >> users could set the maximum number of allowed errors by specifying
> >> only REJECT_LIMIT. Otherwise, they would need to specify both
> >> ON_ERROR=ignore and REJECT_LIMIT.
>
> > That makes sense.
>
> On my second thought, whatever value ON_ERROR is specified(e.g. ignore,
> stop, table), it seems fine to use REJECT_LIMIT.
> I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
> meaning it ignores errors until it reaches REJECT_LIMIT and stops when
> it exceeds the REJECT_LIMIT.
> And REJECT_LIMIT seems orthogonal to 'table',  which specifies where to
> save error details.
>
> Attached patch allows using REJECT_LIMIT regardless of the ON_ERROR
> option value.
>
>
> [1]
> https://www.postgresql.org/message-id/flat/CACJufxH_OJpVra=0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q@mail.gmail.com
>
> --
> Regards,
>
> --
> Atsushi Torikoshi
> NTT DATA Group Corporation



--
Regards
Junwang Zhao



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: behavior of GROUP BY with VOLATILE expressions
Next
From: Robert Haas
Date:
Subject: Re: Incremental backup from a streaming replication standby fails