Thread: Re: Add reject_limit option to file_fdw
On 2024/10/17 22:45, torikoshia wrote: > Hi, > > 4ac2a9bec introduced reject_limit option to the COPY command, and I was wondering if it might be beneficial to add thesame option to file_fdw. > > Although there may be fewer practical use cases compared to COPY, it could still be useful in situations where the filebeing read via file_fdw is subject to modifications and there is a need to tolerate a limited number of errors. Agreed. > > What do you think? > > I've attached a patch. Thanks for the patch! Could you add it to the next CommitFest? +ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1'); +SELECT * FROM agg_bad; + a | b +-----+-------- + 100 | 99.097 + 42 | 324.78 +(2 rows) Wouldn't it be better to include a test where a SELECT query fails, even with on_error set to "ignore," because the number of errors exceeds reject_limit? + if (cstate->opts.reject_limit > 0 && \ The trailing backslash isn't needed here. + <varlistentry> + <term><literal>reject_limit</literal></term> This entry should be placed right after the on_error option, following the same order as in the COPY command documentation. > Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the value for the foreign table's option must be single-quoted.I’m not entirely sure if this is the correct approach, but in order to accommodate this, the patch modifiesthe value of reject_limit option to accept not only numeric values but also strings. > I don't have a better approach for this, so I'm okay with your solution. Just one note: it would be helpful to explain and comment why defGetCopyRejectLimitOption() accepts and parses both int64 and string values. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2024/11/05 22:30, torikoshia wrote: >> Thanks for the patch! Could you add it to the next CommitFest? > > Added an entry for this patch: > https://commitfest.postgresql.org/50/5331/ Thanks! >> +ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1'); >> +SELECT * FROM agg_bad; >> + a | b >> +-----+-------- >> + 100 | 99.097 >> + 42 | 324.78 >> +(2 rows) >> >> Wouldn't it be better to include a test where a SELECT query fails, even with >> on_error set to "ignore," because the number of errors exceeds reject_limit? > > Agreed. As for the agg.bad2 file that your latest patch added: Instead of adding a new bad input file, what do you think about simply appending "1;@bbb@" to the existing agg.bad file and adjusting sql/file_fdw.sql as follows? This approach might keep things simpler. ------------------- -\set filename :abs_srcdir '/data/agg.bad2' -ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD reject_limit '1'); +ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); SELECT * FROM agg_bad; ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2'); ------------------- >> + <varlistentry> >> + <term><literal>reject_limit</literal></term> >> >> This entry should be placed right after the on_error option, >> following the same order as in the COPY command documentation. >> >> >>> Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the value for the foreign table's option must be single-quoted.I’m not entirely sure if this is the correct approach, but in order to accommodate this, the patch modifiesthe value of reject_limit option to accept not only numeric values but also strings. >>> >> >> I don't have a better approach for this, so I'm okay with your solution. >> Just one note: it would be helpful to explain and comment >> why defGetCopyRejectLimitOption() accepts and parses both int64 and >> string values. > > Added a comment for this. Thanks for adding the comment. It clearly states that REJECT_LIMIT can be a single-quoted string. However, it might also be helpful to mention that it can be provided as an int64 in the COPY command option. How about updating it like this? ------------------------------------ REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command option or as a single-quoted string for the foreign table option using file_fdw. Therefore this function needs to handle both formats. ------------------------------------ Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2024-11-08 01:44, Fujii Masao wrote: Thanks for your review! > On 2024/11/05 22:30, torikoshia wrote: >>> Thanks for the patch! Could you add it to the next CommitFest? >> >> Added an entry for this patch: >> https://commitfest.postgresql.org/50/5331/ > > Thanks! > > >>> +ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1'); >>> +SELECT * FROM agg_bad; >>> + a | b >>> +-----+-------- >>> + 100 | 99.097 >>> + 42 | 324.78 >>> +(2 rows) >>> >>> Wouldn't it be better to include a test where a SELECT query fails, >>> even with >>> on_error set to "ignore," because the number of errors exceeds >>> reject_limit? >> >> Agreed. > > As for the agg.bad2 file that your latest patch added: > > Instead of adding a new bad input file, what do you think about simply > appending > "1;@bbb@" to the existing agg.bad file and adjusting sql/file_fdw.sql > as follows? > This approach might keep things simpler. That seems better. Agreed. > ------------------- > -\set filename :abs_srcdir '/data/agg.bad2' > -ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD > reject_limit '1'); > +ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); > SELECT * FROM agg_bad; > ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2'); > ------------------- > >>> + <varlistentry> >>> + <term><literal>reject_limit</literal></term> >>> >>> This entry should be placed right after the on_error option, >>> following the same order as in the COPY command documentation. >>> >>> >>>> Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, >>>> the value for the foreign table's option must be single-quoted. I’m >>>> not entirely sure if this is the correct approach, but in order to >>>> accommodate this, the patch modifies the value of reject_limit >>>> option to accept not only numeric values but also strings. >>>> >>> >>> I don't have a better approach for this, so I'm okay with your >>> solution. >>> Just one note: it would be helpful to explain and comment >>> why defGetCopyRejectLimitOption() accepts and parses both int64 and >>> string values. >> >> Added a comment for this. > > Thanks for adding the comment. It clearly states that REJECT_LIMIT can > be > a single-quoted string. However, it might also be helpful to mention > that > it can be provided as an int64 in the COPY command option. How about > updating it like this? > > ------------------------------------ > REJECT_LIMIT can be specified in two ways: as an int64 for the COPY > command > option or as a single-quoted string for the foreign table option using > file_fdw. Therefore this function needs to handle both formats. > ------------------------------------ Thanks! it seems better. Attached v3 patch. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On 2024/11/11 21:45, torikoshia wrote: >> Thanks for adding the comment. It clearly states that REJECT_LIMIT can be >> a single-quoted string. However, it might also be helpful to mention that >> it can be provided as an int64 in the COPY command option. How about >> updating it like this? >> >> ------------------------------------ >> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command >> option or as a single-quoted string for the foreign table option using >> file_fdw. Therefore this function needs to handle both formats. >> ------------------------------------ > > Thanks! it seems better. > > > Attached v3 patch. Thanks for updating the patch! It looks like you forgot to attach it, though. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, 12 Nov 2024 at 06:17, torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2024-11-12 01:49, Fujii Masao wrote: > > On 2024/11/11 21:45, torikoshia wrote: > >>> Thanks for adding the comment. It clearly states that REJECT_LIMIT > >>> can be > >>> a single-quoted string. However, it might also be helpful to mention > >>> that > >>> it can be provided as an int64 in the COPY command option. How about > >>> updating it like this? > >>> > >>> ------------------------------------ > >>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY > >>> command > >>> option or as a single-quoted string for the foreign table option > >>> using > >>> file_fdw. Therefore this function needs to handle both formats. > >>> ------------------------------------ > >> > >> Thanks! it seems better. > >> > >> > >> Attached v3 patch. > > > > Thanks for updating the patch! It looks like you forgot to attach it, > > though. > > Oops, thanks for pointing it out. > Here it is. > > > -- > Regards, > > -- > Atsushi Torikoshi > Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K. Hi! A little question from me. This is your doc for reject_limit: + <varlistentry> + <term><literal>reject_limit</literal></term> + + <listitem> + <para> + Specifies the maximum number of errors tolerated while converting a column's + input value to its data type, the same as <command>COPY</command>'s + <literal>REJECT_LIMIT</literal> option. + </para> + </listitem> + </varlistentry> + This is how it looks on the current HEAD for copy. <varlistentry> <term><literal>REJECT_LIMIT</literal></term> <listitem> <para> Specifies the maximum number of errors tolerated while converting a column's input value to its data type, when <literal>ON_ERROR</literal> is set to <literal>ignore</literal>. If the input causes more errors than the specified value, the <command>COPY</command> command fails, even with <literal>ON_ERROR</literal> set to <literal>ignore</literal>. This clause must be used with <literal>ON_ERROR</literal>=<literal>ignore</literal> and <replaceable class="parameter">maxerror</replaceable> must be positive <type>bigint</type>. If not specified, <literal>ON_ERROR</literal>=<literal>ignore</literal> allows an unlimited number of errors, meaning <command>COPY</command> will skip all erroneous data. </para> </listitem> </varlistentry> There is a difference. Should we add REJECT_LIMIT vs ON_ERROR clarification for file_fdw too? or maybe we put a reference for COPY doc. -- Best regards, Kirill Reshke
On 2024-11-12 15:23, Kirill Reshke wrote: Thanks for your review! > On Tue, 12 Nov 2024 at 06:17, torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On 2024-11-12 01:49, Fujii Masao wrote: >> > On 2024/11/11 21:45, torikoshia wrote: >> >>> Thanks for adding the comment. It clearly states that REJECT_LIMIT >> >>> can be >> >>> a single-quoted string. However, it might also be helpful to mention >> >>> that >> >>> it can be provided as an int64 in the COPY command option. How about >> >>> updating it like this? >> >>> >> >>> ------------------------------------ >> >>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY >> >>> command >> >>> option or as a single-quoted string for the foreign table option >> >>> using >> >>> file_fdw. Therefore this function needs to handle both formats. >> >>> ------------------------------------ >> >> >> >> Thanks! it seems better. >> >> >> >> >> >> Attached v3 patch. >> > >> > Thanks for updating the patch! It looks like you forgot to attach it, >> > though. >> >> Oops, thanks for pointing it out. >> Here it is. >> >> >> -- >> Regards, >> >> -- >> Atsushi Torikoshi >> Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K. > > Hi! > > A little question from me. > > This is your doc for reject_limit: > > + <varlistentry> > + <term><literal>reject_limit</literal></term> > + > + <listitem> > + <para> > + Specifies the maximum number of errors tolerated while > converting a column's > + input value to its data type, the same as > <command>COPY</command>'s > + <literal>REJECT_LIMIT</literal> option. > + </para> > + </listitem> > + </varlistentry> > + > > This is how it looks on the current HEAD for copy. > > <varlistentry> > <term><literal>REJECT_LIMIT</literal></term> > <listitem> > <para> > Specifies the maximum number of errors tolerated while converting > a > column's input value to its data type, when > <literal>ON_ERROR</literal> is > set to <literal>ignore</literal>. > If the input causes more errors than the specified value, the > <command>COPY</command> > command fails, even with <literal>ON_ERROR</literal> set to > <literal>ignore</literal>. > This clause must be used with > <literal>ON_ERROR</literal>=<literal>ignore</literal> > and <replaceable class="parameter">maxerror</replaceable> must > be positive <type>bigint</type>. > If not specified, > <literal>ON_ERROR</literal>=<literal>ignore</literal> > allows an unlimited number of errors, meaning > <command>COPY</command> will > skip all erroneous data. > </para> > </listitem> > </varlistentry> > > There is a difference. Should we add REJECT_LIMIT vs ON_ERROR > clarification for file_fdw too? or maybe we put a reference for COPY > doc. As you may know, some options for file_fdw are the same as those for COPY. While the manual provides detailed descriptions of these options in the COPY section, the explanations in file_fdw are shorter and less detailed. I intended to follow this approach, but do you think it should be changed? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On 2024-11-21 00:43, Fujii Masao wrote: > On 2024/11/19 21:40, torikoshia wrote: >>> These messages may be unexpected for some users because the >>> documentation of >>> fild_fdw does not explicitly describe that file_fdw uses COPY >>> internally. >>> (I can find several wordings like "as COPY", though.) >>> However, since the current file_fdw already has such messages, I came >>> to think >>> making the messages on "reject_limit" consistent with COPY is >>> reasonable. >>> I mean, the previous version of the message seems fine. >> >> Agreed. Thanks for your investigation. > > I've pushed the 0001 patch. Thanks! Thanks a lot! >>> As another comment, do we need to add validator test for on_error and >>> reject_limit as similar to other options? >> >> That might be better. >> Added some queries which had wrong options to cause errors. >> >> I was unsure whether to add it to "validator test" section or >> "on_error, log_verbosity and reject_limit tests" section, but I chose >> "validator test" as I believe it makes things clearer. >> >> Added 0002 patch since some of the tests are not related to >> reject_limit but just on_error. > > How about adding validator tests for log_verbosity and reject_limit=0, > similar to the tests in the copy2.sql regression test? I've added those > two tests, and the latest version of the patch is attached. Thanks for the update! Agreed. I think the validator tests related to the on_error option you added are sufficient. As for the other file_fdw options, I'm a bit concerned whether it would be better to add validator tests for cases like when 'encoding' is 'unsupported', but that should be discussed in another thread. > Regards, -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.