Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From torikoshia
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id 686e46d6718f9535bc39fac7cad7408b@oss.nttdata.com
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Damir Belyalov <dam.bel07@gmail.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
List pgsql-hackers
On 2022-08-25 01:54, Damir Belyalov wrote:
>>> +               /* Buffer was filled, commit subtransaction and
>> prepare to replay */
>>> +               ReleaseCurrentSubTransaction();
>> What is actually being committed by this
>> ReleaseCurrentSubTransaction()?
>> It seems to me that just safecstate->replay_buffer is fulfilled
>> before
>> this commit.
> 
> All tuples are collected in replay_buffer, which is created in
> ''oldcontext'' of CopyFrom() (not context of a subtransaction). That's
> why it didn't clean up when you used
> RollbackAndReleaseCurrentSubTransaction().
> Subtransactions are needed for better safety. There is no error when
> copying from a file to the replay_buffer, but an error may occur at
> the next stage - when adding a tuple to the table. Also there may be
> other errors that are not obvious at first glance.

Thanks for the explanation and updating patch.
I now understand that the data being COPYed are not the target of 
COMMIT.

In past discussions[1], it seems the data to be COPYed were also 
subject to COMMIT, but I understand this patch has adopted a different 
design.

350 +               /* Buffer was filled, commit subtransaction and prepare to replay */
351 +               ReleaseCurrentSubTransaction();
352 +               CurrentResourceOwner = safecstate->oldowner;
353 +
354 +               safecstate->replay_is_active = true;

BTW in the v4 patch, data are loaded into the buffer one by one, and 
when the buffer fills up, the data in the buffer are 'replayed' also one 
by one, right?
Wouldn't this have more overhead than a normal COPY?

As a test, I COPYed some slightly larger data with and without the 
ignore_errors option.
There might be other factors, but I did find a performance difference.

```
=# copy test from '/tmp/10000000.data' ;
COPY 10000000
Time: 6001.325 ms (00:06.001)

=# copy test from '/tmp/10000000.data' with (ignore_errors);
NOTICE:  FIND 0 ERRORS
COPY 10000000
Time: 7711.555 ms (00:07.712)
```

>> I feel it might be better to have it as a parameter rather than
>> fixed at
>> 1000.
> 
> Yes, I thought about it too. So I created another version with the GUC
> parameter - replay_buffer_size. Attached it. But I think users won't
> need to change replay_buffer_size.
> Also replay_buffer does the same thing as MultiInsert buffer does and
> MultiInsert buffer defined by const = 1000.

Yeah, when the data being COPYed are not the target of COMMIT, I also 
think users won't need to change it.
> 
>> NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
>> copyfrom.c.
>> Since safeNextCopyFrom() is analog of NextCopyFrom() as commented,
>> would
>> it be natural to put them in the same file?
> 
> Sure, corrected it.

Thanks.

> 
>>> 188 +               safecstate->errors++;
>>> 189 +               if (safecstate->errors <= 100)
>>> 190 +                   ereport(WARNING,
>>> 191 +                           (errcode(errdata->sqlerrcode),
>>> 192 +                           errmsg("%s", errdata->context)));
>> 
>> It would be better to state in the manual that errors exceeding 100
>> are
>> not displayed.
>> Or, it might be acceptable to output all errors that exceed 100.
> 
> It'll be too complicated to create a new parameter just for showing
> the given number of errors. I think 100 is an optimal size.

Yeah, I may have misled you, but I also don't think this needs a new 
parameter.
I just thought either of the following would be better:
- State in the documentation that warnings are not output for errors 
beyond the first 100.
- Output all the warnings.

>> It would be helpful to add comments about skip_row, etc.
> 
> Corrected it.

Thanks.

> 
>> WARNING:  FIND 0 ERRORS
>> When there were no errors, this WARNING seems not necessary.
> 
> Corrected it.

Thanks.


I applied the v4 patch, and when I canceled the COPY, there was a case 
where I found myself left in an aborted transaction.
Should this situation be prevented from occurring?

```
=# copy test from '/tmp/10000000.data' with (ignore_errors );

^CCancel request sent
ERROR:  canceling statement due to user request

=# truncate test;
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
```


[1] 
https://www.postgresql.org/message-id/1197677930.1536.18.camel%40dell.linuxdev.us.dell.com

--
Regards,

Atsushi Torikoshi
NTT DATA CORPORATION


