Re: Add new COPY option REJECT_LIMIT - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: Add new COPY option REJECT_LIMIT |
Date | |
Msg-id | 979c6d306343f3d665d14eab2cd1b7bc@oss.nttdata.com Whole thread Raw |
In response to | Re: Add new COPY option REJECT_LIMIT (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add new COPY option REJECT_LIMIT
|
List | pgsql-hackers |
On 2024-07-05 12:59, Fujii Masao wrote: > On 2024/07/04 12:05, torikoshia wrote: >> I'm going to update it after discussing the option format as described >> below. > > Thanks! > >> I agree that it's possible to use only REJECT_LIMIT without ON_ERROR. >> I also think it's easy to understand that REJECT_LIMIT=0 is >> ON_ERROR=stop. >> However, expressing REJECT_LIMIT='infinity' needs some definition like >> "setting REJECT_LIMIT to -1 means 'infinity'", doesn't it? If so, I >> think this might not so intuitive. > > How about allowing REJECT_LIMIT to accept the keywords "infinity", > "unlimited", > or "all" in addition to a number? This way, users can specify one of > these > keywords instead of -1 to ignore all errors. The server code would then > internally set the REJECT_LIMIT to -1 or another appropriate value when > these keywords are used, but users wouldn't need to worry about this > detail. Agreed. > If we choose "all" as the keyword, renaming the option to IGNORE_ERRORS > might be more intuitive and easier to understand than REJECT_LIMIT. I feel that 'infinite' and 'unlimited' are unfamiliar values for PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would be a better parameter name as your suggestion. >> Also, since it seems Snowflake and Redshift have both options >> equivalent to REJECT_LIMIT and ON_ERROR, having both of them in >> PostgreSQL COPY might not be surprising: >> - Snowflake's ON_ERROR accepts "CONTINUE | SKIP_FILE | SKIP_FILE_num | >> 'SKIP_FILE_num%' | ABORT_STATEMENT"[1] >> - Redshift has MAXERROR and IGNOREALLERRORS options[2] > > Ok, so here's a summary of the options and their behaviors: > > To ignore all errors and continue to the end: > > - Snowflake: ON_ERROR=CONTINUE > - Redshift: IGNOREALLERRORS > - Postgres (with your patch): ON_ERROR=ignore > - Postgres (with my idea): IGNORE_ERRORS=all > > To fail as soon as an error is found: > > - Snowflake: ON_ERROR=ABORT_STATEMENT (default) / SKIP_FILE > - Redshift: MAXERROR=0 (default) > - Postgres (with your patch): ON_ERROR=stop (default) > - Postgres (with my idea): IGNORE_ERRORS=0 (default) > > To fail when NNN or more errors are found: > > - Snowflake: ON_ERROR=SKIP_FILE_NNN > - Redshift: MAXERROR=NNN > - Postgres (with your patch): REJECT_LIMIT=NNN-1 and ON_ERROR=ignore > - Postgres (with my idea): IGNORE_ERRORS=NNN Thanks for the summary. > This makes me think it might be better to treat REJECT_LIMIT as > an additional option for ON_ERROR=stop instead of ON_ERROR=ignore > if we adopt your patch. Since ON_ERROR=stop is the default, > users could set the maximum number of allowed errors by specifying > only REJECT_LIMIT. Otherwise, they would need to specify both > ON_ERROR=ignore and REJECT_LIMIT. That makes sense. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
pgsql-hackers by date: