On 2024-01-24 17:05, Masahiko Sawada wrote:
> On Tue, Jan 23, 2024 at 1:02 AM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
>>
>> On 2024-01-17 14:47, Masahiko Sawada wrote:
>> > On Wed, Jan 17, 2024 at 2:22 PM torikoshia <torikoshia@oss.nttdata.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to
>> >> skip malformed data, but there is no way to watch the number of
>> >> skipped
>> >> rows during COPY.
>> >>
>> >> Attached patch adds tuples_skipped to pg_stat_progress_copy, which
>> >> counts the number of skipped tuples because source data is malformed.
>> >> If SAVE_ERROR_TO is not specified, this column remains zero.
>> >>
>> >> The advantage would be that users can quickly notice and stop COPYing
>> >> when there is a larger amount of skipped data than expected, for
>> >> example.
>> >>
>> >> As described in commit log, it is expected to add more choices for
>> >> SAVE_ERROR_TO like 'log' and using such options may enable us to know
>> >> the number of skipped tuples during COPY, but exposed in
>> >> pg_stat_progress_copy would be easier to monitor.
>> >>
>> >>
>> >> What do you think?
>> >
>> > +1
>> >
>> > The patch is pretty simple. Here is a comment:
>> >
>> > + (if <literal>SAVE_ERROR_TO</literal> is specified, otherwise
>> > zero).
>> > + </para></entry>
>> > + </row>
>> >
>> > To be precise, this counter only advances when a value other than
>> > 'ERROR' is specified to SAVE_ERROR_TO option.
>>
>> Thanks for your comment and review!
>>
>> Updated the patch according to your comment and option name change by
>> b725b7eec.
>
> Thanks! The patch looks good to me. I'm going to push it tomorrow,
> barring any objections.
Thanks!
>>
>> BTW, based on this patch, I think we can add another option which
>> specifies the maximum tolerable number of malformed rows.
>> I remember this was discussed in [1], and feel it would be useful when
>> loading 'dirty' data but there is a limit to how dirty it can be.
>> Attached 0002 is WIP patch for this(I haven't added doc yet).
>
> Yeah, it could be a good option.
>
>> This may be better discussed in another thread, but any comments(e.g.
>> necessity of this option, option name) are welcome.
>
> I'd recommend forking a new thread for this option. As far as I
> remember, there also was an opinion that "reject limit" stuff is not
> very useful.
OK, I'll make another thread for this.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation