Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From torikoshia
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id 6076bebc06624713efda7421f8ed1ad0@oss.nttdata.com
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (jian he <jian.universality@gmail.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada <sawada.mshk@gmail.com> 
wrote:

> Yeah, I'm still thinking it's better to implement this feature
> incrementally. Given we're closing to feature freeze, I think it's
> unlikely to get the whole feature into PG17 since there are still many
> design discussions we need in addition to what Torikoshi-san pointed
> out. The feature like "ignore errors" or "logging errors" would have
> higher possibilities. Even if we get only these parts of the whole
> "error table" feature into PG17, it will make it much easier to
implement "error tables" feature.

+1.
I'm also going to make patch for "logging errors", since this 
functionality is isolated from v7 patch.

> Seems promising. I'll look at the patch.
Thanks a lot!
Sorry to attach v2 if you already reviewed v1..

On 2024-01-11 12:13, jian he wrote:
> On Tue, Jan 9, 2024 at 10:36 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada 
>> <sawada.mshk@gmail.com>
>> wrote:
>> > If we want only such a feature we need to implement it together (the
>> > patch could be split, though). But if some parts of the feature are
>> > useful for users as well, I'd recommend implementing it incrementally.
>> > That way, the patches can get small and it would be easy for reviewers
>> > and committers to review/commit them.
>> 
>> Jian, how do you think this comment?
>> 
>> Looking back at the discussion so far, it seems that not everyone 
>> thinks
>> saving table information is the best idea[1] and some people think 
>> just
>> skipping error data is useful.[2]
>> 
>> Since there are issues to be considered from the design such as
>> physical/logical replication treatment, putting error information to
>> table is likely to take time for consensus building and development.
>> 
>> Wouldn't it be better to follow the following advice and develop the
>> functionality incrementally?
>> 
>> On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
>> <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> > So I'm thinking we may be able to implement this
>> > feature incrementally. The first step would be something like an
>> > option to ignore all errors or an option to specify the maximum number
>> > of errors to tolerate before raising an ERROR. The second step would
>> > be to support logging destinations such as server logs and tables.
>> 
>> 
>> Attached a patch for this "first step" with reference to v7 patch, 
>> which
>> logged errors and simpler than latest one.
>> - This patch adds new option SAVE_ERROR_TO, but currently only 
>> supports
>> 'none', which means just skips error data. It is expected to support
>> 'log' and 'table'.
>> - This patch Skips just soft errors and don't handle other errors such
>> as missing column data.
> 
> Hi.
> I made the following change based on your patch
> (v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch)
> 
> * when specified SAVE_ERROR_TO, move the initialization of
> ErrorSaveContext to the function BeginCopyFrom.
> I think that's the right place to initialize struct CopyFromState 
> field.
> * I think your patch when there are N rows have malformed data, then it
> will initialize N ErrorSaveContext.
> In the struct CopyFromStateData, I changed it to ErrorSaveContext 
> *escontext.
> So if an error occurred, you can just set the escontext accordingly.
> * doc: mention "If this option is omitted, <command>COPY</command>
> stops operation at the first error."
> * Since we only support 'none' for now, 'none' means we don't want
> ErrorSaveContext metadata,
>  so we should set cstate->escontext->details_wanted to false.
> 
>> BTW I have question and comment about v15 patch:
>> 
>> > +           {
>> > +               /*
>> > +               *
>> > +               * InputFunctionCall is more faster than
>> > InputFunctionCallSafe.
>> > +               *
>> > +               */
>> 
>> Have you measured this?
>> When I tested it in an older patch, there were no big difference[3].
> Thanks for pointing it out, I probably was over thinking.
> 
>>   > -   SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P 
>> SECURITY
>> SELECT
>>   > +   SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH 
>> SECOND_P
>> SECURITY SELECT
>> 
>> There was a comment that we shouldn't add new keyword for this[4].
>> 
> Thanks for pointing it out.

Thanks for reviewing!

Updated the patch merging your suggestions except below points:

> +   cstate->num_errors = 0;

Since cstate is already initialized in below lines, this may be 
redundant.

|     /* Allocate workspace and zero all fields */
|     cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));


>  +                   Assert(!cstate->escontext->details_wanted);

I'm not sure this is necessary, considering we're going to add other 
options like 'table' and 'log', which need details_wanted soon.


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: initdb --no-locale=C doesn't work as specified when the environment is not C
Next
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby