Re: Add new COPY option REJECT_LIMIT - Mailing list pgsql-hackers
From: torikoshia
Subject: Re: Add new COPY option REJECT_LIMIT
Date:
Msg-id: f9feda0ecd4e2e09620a6f9ca8f4aac1@oss.nttdata.com
In response to: Re: Add new COPY option REJECT_LIMIT (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses: Re: Add new COPY option REJECT_LIMIT; Re: Add new COPY option REJECT_LIMIT
List: pgsql-hackers
On Fri, Jul 19, 2024 at 11:48 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Thanks for the comment.

> In patch 0002, the ratio is calculated by the already skipped/processed
> rows, but what if a user wants to copy 1000 rows, and he/she can
> tolerate 10 error rows, so he/she might set *reject_limit 0.01*, but
> one bad row in the first 100 rows will fail the entire command, this
> might surprise the user.

Since the ratio is calculated after all data is processed, the case
"one bad row in the first 100 rows will fail the entire command"
doesn't happen:

=# \! wc -l 1000rows-with-10err.data
1000 1000rows-with-10err.data

=# COPY t1 from '1000rows-with-10err.data' with (log_verbosity verbose, reject_limit 0.01);
NOTICE: skipping row due to data type incompatibility at line 10 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 11 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 12 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 13 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 14 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 15 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 16 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 17 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 18 for column i: "a"
NOTICE: skipping row due to data type incompatibility at line 19 for column i: "a"
NOTICE: 10 rows were skipped due to data type incompatibility
COPY 990

The final ratio here is 10/1000 = 0.01, which does not exceed the
limit, so the command succeeds and loads everything except the 10 bad
rows.

On 2024-07-20 02:08, Fujii Masao wrote:
> On 2024/07/19 22:03, Fujii Masao wrote:
>> On 2024/07/17 22:21, torikoshia wrote:
>>> On 2024-07-03 02:07, Fujii Masao wrote:
>>>> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
>>>> option is still necessary.
>>>
>>> I remembered another reason for the necessity of ON_ERROR.
>>>
>>> ON_ERROR defines how to behave when encountering an error and it just
>>> accepts 'ignore' and 'stop' currently, but is expected to support
>>> other options such as saving details of errors to a table[1].
>>
>> Wouldn't it be better to separate the option specifying where
>> error details are output from the ON_ERROR option
>> (which determines behavior when encountering errors)?
>> "table" seems valid for both ON_ERROR=ignore and ON_ERROR=stop.
>
> I still find it odd to accept "table" as a value for ON_ERROR. However,
> "set_to_null" or "replace-column" proposed in [1] seem valid for
> ON_ERROR. So, I'm okay with keeping the ON_ERROR option.

Agreed.

>> On my second thought, whatever value ON_ERROR is specified (e.g.
>> ignore, stop, table), it seems fine to use REJECT_LIMIT.
>> I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
>> meaning it ignores errors until it reaches REJECT_LIMIT and stops when
>> it exceeds the REJECT_LIMIT.
>
> ON_ERROR specifies how to handle errors, and "stop" means to fail
> the command. So, if ON_ERROR=stop, REJECT_LIMIT should have no effect,
> and the command should fail immediately upon encountering an error.
>
> As in your original proposal, I now think REJECT_LIMIT should only
> apply when ON_ERROR=ignore. The command would ignore errors and
> continue processing, but if the number of errors exceeds REJECT_LIMIT,
> the command should fail. Thoughts?

Makes sense. Updated the patch.
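To make the agreed behavior concrete, here is a sketch of how
REJECT_LIMIT is intended to interact with ON_ERROR under the updated
patch, using the count form of REJECT_LIMIT against the same
10-bad-row file as above (a hypothetical session; the exact error
message is illustrative, not taken from the patch):

=# COPY t1 from '1000rows-with-10err.data' with (on_error ignore, reject_limit 5);
ERROR: skipped more than REJECT_LIMIT (5) rows due to data type incompatibility

=# COPY t1 from '1000rows-with-10err.data' with (on_error ignore, reject_limit 10);
NOTICE: 10 rows were skipped due to data type incompatibility
COPY 990

With reject_limit 5 the sixth bad row pushes the error count over the
limit and the whole command fails, while reject_limit 10 tolerates all
10 bad rows and loads the remaining 990.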
> BTW if "set_to_null" is supported someday, REJECT_LIMIT can also
> apply. The command would insert NULL into the target table upon
> encountering errors and continue, but fail if the number of errors
> exceeds REJECT_LIMIT.

Agreed.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation