Re: Add tuples_skipped to pg_stat_progress_copy - Mailing list pgsql-hackers

From torikoshia
Subject Re: Add tuples_skipped to pg_stat_progress_copy
Msg-id 6f037b4201e4515c858f1a6eac18b2d2@oss.nttdata.com
In response to Re: Add tuples_skipped to pg_stat_progress_copy  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add tuples_skipped to pg_stat_progress_copy
List pgsql-hackers
On 2024-01-17 14:47, Masahiko Sawada wrote:
> On Wed, Jan 17, 2024 at 2:22 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> Hi,
>> 
>> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to
>> skip malformed data, but there is no way to watch the number of 
>> skipped
>> rows during COPY.
>> 
>> Attached patch adds tuples_skipped to pg_stat_progress_copy, which
>> counts the number of skipped tuples because source data is malformed.
>> If SAVE_ERROR_TO is not specified, this column remains zero.
>> 
>> The advantage would be that users can quickly notice and stop COPYing
>> when there is a larger amount of skipped data than expected, for
>> example.
>> 
>> As described in commit log, it is expected to add more choices for
>> SAVE_ERROR_TO like 'log' and using such options may enable us to know
>> the number of skipped tuples during COPY, but exposed in
>> pg_stat_progress_copy would be easier to monitor.
>> 
>> 
>> What do you think?
> 
> +1
> 
> The patch is pretty simple. Here is a comment:
> 
> +       (if <literal>SAVE_ERROR_TO</literal> is specified, otherwise 
> zero).
> +      </para></entry>
> +     </row>
> 
> To be precise, this counter only advances when a value other than
> 'ERROR' is specified to SAVE_ERROR_TO option.

Thanks for your comment and review!

I updated the patch according to your comment and the option name change
made by b725b7eec.


BTW, building on this patch, I think we can add another option that
specifies the maximum tolerable number of malformed rows.
I remember this was discussed in [1], and I feel it would be useful when
loading 'dirty' data where there is a limit to how dirty it may be.
The attached 0002 is a WIP patch for this (I haven't added docs yet).

This may be better discussed in another thread, but any comments (e.g.
on the necessity of this option or its name) are welcome.
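
To make the idea concrete, here is a minimal standalone C sketch of the
limit check. It is not code from the attached 0002 patch: CopySketch,
max_skipped, row_is_valid and copy_rows are hypothetical names chosen for
illustration, and the digit-only validity rule merely stands in for real
input parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the COPY state; not PostgreSQL's real struct. */
typedef struct CopySketch
{
    bool    ignore_errors;    /* true: skip malformed rows instead of failing */
    int64_t max_skipped;      /* hypothetical limit; negative means no limit */
    int64_t tuples_processed; /* rows loaded successfully */
    int64_t tuples_skipped;   /* rows skipped because they were malformed */
} CopySketch;

/* Toy rule (an assumption): a row is valid if it is a non-empty digit string. */
static bool
row_is_valid(const char *row)
{
    if (row == NULL || *row == '\0')
        return false;
    for (const char *p = row; *p != '\0'; p++)
    {
        if (*p < '0' || *p > '9')
            return false;
    }
    return true;
}

/*
 * Process rows in order. Returns true if every row was either loaded or
 * skipped within the limit, false if the load had to be aborted.
 */
static bool
copy_rows(CopySketch *st, const char *const *rows, size_t nrows)
{
    assert(st != NULL);

    for (size_t i = 0; i < nrows; i++)
    {
        if (row_is_valid(rows[i]))
        {
            st->tuples_processed++;
            continue;
        }

        /* Without error skipping, the first malformed row aborts the load. */
        if (!st->ignore_errors)
            return false;

        /* The skip counter only advances when malformed rows are skipped. */
        st->tuples_skipped++;

        /* Abort once more rows were skipped than the caller tolerates. */
        if (st->max_skipped >= 0 && st->tuples_skipped > st->max_skipped)
            return false;
    }
    return true;
}
```

With max_skipped set to 1, a load containing two malformed rows stops at
the second one, while a negative limit lets the load run to the end and
only reports the count. Whether the limit should be inclusive or
exclusive is one of the points I would like feedback on.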


[1] 
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
