Re: Add new COPY option REJECT_LIMIT - Mailing list pgsql-hackers
From | Junwang Zhao |
---|---|
Subject | Re: Add new COPY option REJECT_LIMIT |
Date | |
Msg-id | CAEG8a3JPdGkYKXadAEOphAoiUkY3hHZagk9=UPnyOq386e6gYg@mail.gmail.com Whole thread Raw |
In response to | Re: Add new COPY option REJECT_LIMIT (torikoshia <torikoshia@oss.nttdata.com>) |
List | pgsql-hackers |
Hi Torikoshia, On Wed, Jul 17, 2024 at 9:21 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2024-07-03 02:07, Fujii Masao wrote: > > However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR > > option is still necessary. > > I remembered another reason for the necessity of ON_ERROR. > > ON_ERROR defines how to behave when encountering an error and it just > accepts 'ignore' and 'stop' currently, but is expected to support other > options such as saving details of errors to a table[1]. > ON_ERROR=stop is a synonym for REJECT_LIMIT=infinity, but I imagine > REJECT_LIMIT would not replace future options of ON_ERROR. > > Considering this and the option we want to add this time is to specify > an upper limit on the number or ratio of errors, the name of this option > like "reject_limit" seems better than "ignore_errors". > > On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: > > On 2024-07-05 12:59, Fujii Masao wrote: > >> On 2024/07/04 12:05, torikoshia wrote: > >>> I'm going to update it after discussing the option format as > >>> described > >>> below. > > Updated the patch. > 0001 sets limit by the absolute number of error rows and 0002 sets limit > by ratio of the error. In patch 0002, the ratio is calculated by the already skipped/processed rows, but what if a user wants to copy 1000 rows, and he/she can tolerate 10 error rows, so he/she might set *reject_limit 0.01*, but one bad row in the first 100 rows will fail the entire command, this might surprise the user. This case can be resolved by 0001 *reject_limit 10*, so I think the *by ratio* is less useful. > > >> If we choose "all" as the keyword, renaming the option to > >> IGNORE_ERRORS > >> might be more intuitive and easier to understand than REJECT_LIMIT. > > > I feel that 'infinite' and 'unlimited' are unfamiliar values for > > PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would > > be a better parameter name as your suggestion. > > As described above, attached patch adopts REJECT_LIMIT, so it uses > "infinity". > > >> This makes me think it might be better to treat REJECT_LIMIT as > >> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore > >> if we adopt your patch. Since ON_ERROR=stop is the default, > >> users could set the maximum number of allowed errors by specifying > >> only REJECT_LIMIT. Otherwise, they would need to specify both > >> ON_ERROR=ignore and REJECT_LIMIT. > > > That makes sense. > > On my second thought, whatever value ON_ERROR is specified(e.g. ignore, > stop, table), it seems fine to use REJECT_LIMIT. > I feel REJECT_LIMIT has both "ignore" and "stop" characteristics, > meaning it ignores errors until it reaches REJECT_LIMIT and stops when > it exceeds the REJECT_LIMIT. > And REJECT_LIMIT seems orthogonal to 'table', which specifies where to > save error details. > > Attached patch allows using REJECT_LIMIT regardless of the ON_ERROR > option value. > > > [1] > https://www.postgresql.org/message-id/flat/CACJufxH_OJpVra=0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q@mail.gmail.com > > -- > Regards, > > -- > Atsushi Torikoshi > NTT DATA Group Corporation -- Regards Junwang Zhao
pgsql-hackers by date: