Thread: Re: Add reject_limit option to file_fdw

Re: Add reject_limit option to file_fdw

From
Fujii Masao
Date:

On 2024/10/17 22:45, torikoshia wrote:
> Hi,
> 
> 4ac2a9bec introduced reject_limit option to the COPY command, and I was wondering if it might be beneficial to add
thesame option to file_fdw.
 
> 
> Although there may be fewer practical use cases compared to COPY, it could still be useful in situations where the
filebeing read via file_fdw is subject to modifications and there is a need to tolerate a limited number of errors.
 

Agreed.


> 
> What do you think?
> 
> I've attached a patch.

Thanks for the patch! Could you add it to the next CommitFest?


+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)

Wouldn't it be better to include a test where a SELECT query fails, even with
on_error set to "ignore," because the number of errors exceeds reject_limit?


+            if (cstate->opts.reject_limit > 0 && \

The trailing backslash isn't needed here.


+  <varlistentry>
+   <term><literal>reject_limit</literal></term>

This entry should be placed right after the on_error option,
following the same order as in the COPY command documentation.


> Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the value for the foreign table's option must be
single-quoted.I’m not entirely sure if this is the correct approach, but in order to accommodate this, the patch
modifiesthe value of reject_limit option to accept not only numeric values but also strings.
 
> 

I don't have a better approach for this, so I'm okay with your solution.
Just one note: it would be helpful to explain and comment
why defGetCopyRejectLimitOption() accepts and parses both int64 and
string values.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add reject_limit option to file_fdw

From
Fujii Masao
Date:

On 2024/11/05 22:30, torikoshia wrote:
>> Thanks for the patch! Could you add it to the next CommitFest?
> 
> Added an entry for this patch:
> https://commitfest.postgresql.org/50/5331/

Thanks!


>> +ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
>> +SELECT * FROM agg_bad;
>> +  a  |   b
>> +-----+--------
>> + 100 | 99.097
>> +  42 | 324.78
>> +(2 rows)
>>
>> Wouldn't it be better to include a test where a SELECT query fails, even with
>> on_error set to "ignore," because the number of errors exceeds reject_limit?
> 
> Agreed.

As for the agg.bad2 file that your latest patch added:

Instead of adding a new bad input file, what do you think about simply appending
"1;@bbb@" to the existing agg.bad file and adjusting sql/file_fdw.sql as follows?
This approach might keep things simpler.

-------------------
-\set filename :abs_srcdir '/data/agg.bad2'
-ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD reject_limit '1');
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1');
  SELECT * FROM agg_bad;
  ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
-------------------

>> +  <varlistentry>
>> +   <term><literal>reject_limit</literal></term>
>>
>> This entry should be placed right after the on_error option,
>> following the same order as in the COPY command documentation.
>>
>>
>>> Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the value for the foreign table's option must be
single-quoted.I’m not entirely sure if this is the correct approach, but in order to accommodate this, the patch
modifiesthe value of reject_limit option to accept not only numeric values but also strings.
 
>>>
>>
>> I don't have a better approach for this, so I'm okay with your solution.
>> Just one note: it would be helpful to explain and comment
>> why defGetCopyRejectLimitOption() accepts and parses both int64 and
>> string values.
> 
> Added a comment for this.

Thanks for adding the comment. It clearly states that REJECT_LIMIT can be
a single-quoted string. However, it might also be helpful to mention that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
option or as a single-quoted string for the foreign table option using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add reject_limit option to file_fdw

From
torikoshia
Date:
On 2024-11-08 01:44, Fujii Masao wrote:
Thanks for your review!

> On 2024/11/05 22:30, torikoshia wrote:
>>> Thanks for the patch! Could you add it to the next CommitFest?
>> 
>> Added an entry for this patch:
>> https://commitfest.postgresql.org/50/5331/
> 
> Thanks!
> 
> 
>>> +ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
>>> +SELECT * FROM agg_bad;
>>> +  a  |   b
>>> +-----+--------
>>> + 100 | 99.097
>>> +  42 | 324.78
>>> +(2 rows)
>>> 
>>> Wouldn't it be better to include a test where a SELECT query fails, 
>>> even with
>>> on_error set to "ignore," because the number of errors exceeds 
>>> reject_limit?
>> 
>> Agreed.
> 
> As for the agg.bad2 file that your latest patch added:
> 
> Instead of adding a new bad input file, what do you think about simply 
> appending
> "1;@bbb@" to the existing agg.bad file and adjusting sql/file_fdw.sql
> as follows?
> This approach might keep things simpler.

That seems better. Agreed.

> -------------------
> -\set filename :abs_srcdir '/data/agg.bad2'
> -ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD
> reject_limit '1');
> +ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1');
>  SELECT * FROM agg_bad;
>  ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
> -------------------
> 
>>> +  <varlistentry>
>>> +   <term><literal>reject_limit</literal></term>
>>> 
>>> This entry should be placed right after the on_error option,
>>> following the same order as in the COPY command documentation.
>>> 
>>> 
>>>> Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, 
>>>> the value for the foreign table's option must be single-quoted. I’m 
>>>> not entirely sure if this is the correct approach, but in order to 
>>>> accommodate this, the patch modifies the value of reject_limit 
>>>> option to accept not only numeric values but also strings.
>>>> 
>>> 
>>> I don't have a better approach for this, so I'm okay with your 
>>> solution.
>>> Just one note: it would be helpful to explain and comment
>>> why defGetCopyRejectLimitOption() accepts and parses both int64 and
>>> string values.
>> 
>> Added a comment for this.
> 
> Thanks for adding the comment. It clearly states that REJECT_LIMIT can 
> be
> a single-quoted string. However, it might also be helpful to mention 
> that
> it can be provided as an int64 in the COPY command option. How about
> updating it like this?
> 
> ------------------------------------
> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY 
> command
> option or as a single-quoted string for the foreign table option using
> file_fdw. Therefore this function needs to handle both formats.
> ------------------------------------

Thanks! it seems better.


Attached v3 patch.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.



Re: Add reject_limit option to file_fdw

From
Fujii Masao
Date:

On 2024/11/11 21:45, torikoshia wrote:
>> Thanks for adding the comment. It clearly states that REJECT_LIMIT can be
>> a single-quoted string. However, it might also be helpful to mention that
>> it can be provided as an int64 in the COPY command option. How about
>> updating it like this?
>>
>> ------------------------------------
>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
>> option or as a single-quoted string for the foreign table option using
>> file_fdw. Therefore this function needs to handle both formats.
>> ------------------------------------
> 
> Thanks! it seems better.
> 
> 
> Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it, though.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add reject_limit option to file_fdw

From
Kirill Reshke
Date:
On Tue, 12 Nov 2024 at 06:17, torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2024-11-12 01:49, Fujii Masao wrote:
> > On 2024/11/11 21:45, torikoshia wrote:
> >>> Thanks for adding the comment. It clearly states that REJECT_LIMIT
> >>> can be
> >>> a single-quoted string. However, it might also be helpful to mention
> >>> that
> >>> it can be provided as an int64 in the COPY command option. How about
> >>> updating it like this?
> >>>
> >>> ------------------------------------
> >>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
> >>> command
> >>> option or as a single-quoted string for the foreign table option
> >>> using
> >>> file_fdw. Therefore this function needs to handle both formats.
> >>> ------------------------------------
> >>
> >> Thanks! it seems better.
> >>
> >>
> >> Attached v3 patch.
> >
> > Thanks for updating the patch! It looks like you forgot to attach it,
> > though.
>
> Oops, thanks for pointing it out.
> Here it is.
>
>
> --
> Regards,
>
> --
> Atsushi Torikoshi
> Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Hi!

A little question from me.

This is your doc for reject_limit:

+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while
converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+

This is how it looks on the current HEAD for copy.

<varlistentry>
    <term><literal>REJECT_LIMIT</literal></term>
    <listitem>
     <para>
      Specifies the maximum number of errors tolerated while converting a
      column's input value to its data type, when <literal>ON_ERROR</literal> is
      set to <literal>ignore</literal>.
      If the input causes more errors than the specified value, the
<command>COPY</command>
      command fails, even with <literal>ON_ERROR</literal> set to
<literal>ignore</literal>.
      This clause must be used with
<literal>ON_ERROR</literal>=<literal>ignore</literal>
      and <replaceable class="parameter">maxerror</replaceable> must
be positive <type>bigint</type>.
      If not specified, <literal>ON_ERROR</literal>=<literal>ignore</literal>
      allows an unlimited number of errors, meaning <command>COPY</command> will
      skip all erroneous data.
     </para>
    </listitem>
   </varlistentry>

There is a difference. Should we add REJECT_LIMIT vs ON_ERROR
clarification for file_fdw too? or maybe we put a reference for COPY
doc.

-- 
Best regards,
Kirill Reshke



Re: Add reject_limit option to file_fdw

From
torikoshia
Date:
On 2024-11-12 15:23, Kirill Reshke wrote:

Thanks for your review!

> On Tue, 12 Nov 2024 at 06:17, torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2024-11-12 01:49, Fujii Masao wrote:
>> > On 2024/11/11 21:45, torikoshia wrote:
>> >>> Thanks for adding the comment. It clearly states that REJECT_LIMIT
>> >>> can be
>> >>> a single-quoted string. However, it might also be helpful to mention
>> >>> that
>> >>> it can be provided as an int64 in the COPY command option. How about
>> >>> updating it like this?
>> >>>
>> >>> ------------------------------------
>> >>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
>> >>> command
>> >>> option or as a single-quoted string for the foreign table option
>> >>> using
>> >>> file_fdw. Therefore this function needs to handle both formats.
>> >>> ------------------------------------
>> >>
>> >> Thanks! it seems better.
>> >>
>> >>
>> >> Attached v3 patch.
>> >
>> > Thanks for updating the patch! It looks like you forgot to attach it,
>> > though.
>> 
>> Oops, thanks for pointing it out.
>> Here it is.
>> 
>> 
>> --
>> Regards,
>> 
>> --
>> Atsushi Torikoshi
>> Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
> 
> Hi!
> 
> A little question from me.
> 
> This is your doc for reject_limit:
> 
> +  <varlistentry>
> +   <term><literal>reject_limit</literal></term>
> +
> +   <listitem>
> +    <para>
> +     Specifies the maximum number of errors tolerated while
> converting a column's
> +     input value to its data type, the same as 
> <command>COPY</command>'s
> +    <literal>REJECT_LIMIT</literal> option.
> +    </para>
> +   </listitem>
> +  </varlistentry>
> +
> 
> This is how it looks on the current HEAD for copy.
> 
> <varlistentry>
>     <term><literal>REJECT_LIMIT</literal></term>
>     <listitem>
>      <para>
>       Specifies the maximum number of errors tolerated while converting 
> a
>       column's input value to its data type, when 
> <literal>ON_ERROR</literal> is
>       set to <literal>ignore</literal>.
>       If the input causes more errors than the specified value, the
> <command>COPY</command>
>       command fails, even with <literal>ON_ERROR</literal> set to
> <literal>ignore</literal>.
>       This clause must be used with
> <literal>ON_ERROR</literal>=<literal>ignore</literal>
>       and <replaceable class="parameter">maxerror</replaceable> must
> be positive <type>bigint</type>.
>       If not specified, 
> <literal>ON_ERROR</literal>=<literal>ignore</literal>
>       allows an unlimited number of errors, meaning 
> <command>COPY</command> will
>       skip all erroneous data.
>      </para>
>     </listitem>
>    </varlistentry>
> 
> There is a difference. Should we add REJECT_LIMIT vs ON_ERROR
> clarification for file_fdw too? or maybe we put a reference for COPY
> doc.

As you may know, some options for file_fdw are the same as those for 
COPY. While the manual provides detailed descriptions of these options 
in the COPY section, the explanations in file_fdw are shorter and less 
detailed.

I intended to follow this approach, but do you think it should be 
changed?

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.



Re: Add reject_limit option to file_fdw

From
torikoshia
Date:
On 2024-11-21 00:43, Fujii Masao wrote:
> On 2024/11/19 21:40, torikoshia wrote:
>>> These messages may be unexpected for some users because the 
>>> documentation of
>>> fild_fdw does not explicitly describe that file_fdw uses COPY 
>>> internally.
>>> (I can find several wordings like "as COPY", though.)
>>> However, since the current file_fdw already has such messages, I came 
>>> to think
>>> making the messages on "reject_limit" consistent with COPY is 
>>> reasonable.
>>> I mean, the previous version of the message seems fine.
>> 
>> Agreed. Thanks for your investigation.
> 
> I've pushed the 0001 patch. Thanks!

Thanks a lot!

>>> As another comment, do we need to add validator test for on_error and
>>> reject_limit as similar to other options?
>> 
>> That might be better.
>> Added some queries which had wrong options to cause errors.
>> 
>> I was unsure whether to add it to "validator test" section or 
>> "on_error, log_verbosity and reject_limit tests" section, but I chose 
>> "validator test" as I believe it makes things clearer.
>> 
>> Added 0002 patch since some of the tests are not related to 
>> reject_limit but just on_error.
> 
> How about adding validator tests for log_verbosity and reject_limit=0,
> similar to the tests in the copy2.sql regression test? I've added those
> two tests, and the latest version of the patch is attached.

Thanks for the update! Agreed.
I think the validator tests related to the on_error option you added are 
sufficient.

As for the other file_fdw options, I'm a bit concerned whether it would 
be better to add validator tests for cases like when 'encoding' is 
'unsupported', but that should be discussed in another thread.

> Regards,

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.