Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers
From:            torikoshia
Subject:         Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Msg-id:          686e46d6718f9535bc39fac7cad7408b@oss.nttdata.com
In response to:  Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) (Damir Belyalov <dam.bel07@gmail.com>)
Responses:       Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
List:            pgsql-hackers

On 2022-08-25 01:54, Damir Belyalov wrote:
>>> +        /* Buffer was filled, commit subtransaction and prepare to replay */
>>> +        ReleaseCurrentSubTransaction();
>>
>> What is actually being committed by this ReleaseCurrentSubTransaction()?
>> It seems to me that just safecstate->replay_buffer is fulfilled before
>> this commit.
>
> All tuples are collected in replay_buffer, which is created in
> 'oldcontext' of CopyFrom() (not context of a subtransaction). That's
> why it didn't clean up when you used
> RollbackAndReleaseCurrentSubTransaction().
> Subtransactions are needed for better safety. There is no error when
> copying from a file to the replay_buffer, but an error may occur at
> the next stage - when adding a tuple to the table. Also there may be
> other errors that are not obvious at first glance.

Thanks for the explanation and for updating the patch.

I now understand that the data being COPYed are not the target of
COMMIT. In past discussions[1] it seems the data to be COPYed were
also subject to COMMIT, but I understand this patch has adopted a
different design.

350 +        /* Buffer was filled, commit subtransaction and prepare to replay */
351 +        ReleaseCurrentSubTransaction();
352 +        CurrentResourceOwner = safecstate->oldowner;
353 +
354 +        safecstate->replay_is_active = true;
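For other readers of the thread, my understanding of the flow around
the hunk above is roughly the following simplified sketch. This is my
own illustration, not code from the patch; read_next_row() is a
hypothetical stand-in for the patch's safeNextCopyFrom().

```
/*
 * Illustrative sketch only (not the patch code): the usual internal
 * subtransaction pattern.  Reading a row runs under a subtransaction;
 * on success the subxact is committed with ReleaseCurrentSubTransaction(),
 * on error it is rolled back and the bad row is reported and skipped.
 */
static bool
copy_one_row_safely(void)
{
    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;
    bool        ok = true;

    BeginInternalSubTransaction(NULL);
    /* keep allocations (e.g. the replay buffer) out of the subxact */
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        read_next_row();        /* hypothetical; may ereport(ERROR) */

        /* success: commit the subtransaction, restore outer state */
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        ErrorData  *errdata;

        /* copy the error out of ErrorContext, then clear it */
        MemoryContextSwitchTo(oldcontext);
        errdata = CopyErrorData();
        FlushErrorState();

        /* failure: undo the subxact; the outer transaction stays alive */
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;

        ereport(WARNING,
                (errcode(errdata->sqlerrcode),
                 errmsg("%s", errdata->context)));
        FreeErrorData(errdata);
        ok = false;
    }
    PG_END_TRY();

    return ok;
}
```

That is, the subtransaction protects the step that can fail, while the
replay_buffer itself lives in the outer context and therefore survives
a rollback, which matches your explanation above.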
BTW, in the v4 patch, data are loaded into the buffer one by one, and
when the buffer fills up, the data in the buffer are 'replayed' also
one by one, right?
Wouldn't this have more overhead than a normal COPY?

As a test, I COPYed slightly larger data with and without the
ignore_errors option.
There might be other reasons, but I found a performance difference.

```
=# copy test from '/tmp/10000000.data' ;
COPY 10000000
Time: 6001.325 ms (00:06.001)

=# copy test from '/tmp/10000000.data' with (ignore_errors);
NOTICE:  FIND 0 ERRORS
COPY 10000000
Time: 7711.555 ms (00:07.712)
```

>> I feel it might be better to have it as a parameter rather than
>> fixed at 1000.
>
> Yes, I thought about it too. So I created another version with the GUC
> parameter - replay_buffer_size. Attached it. But I think users won't
> need to change replay_buffer_size.
> Also replay_buffer does the same thing as MultiInsert buffer does and
> MultiInsert buffer defined by const = 1000.

Yeah, since the data being COPYed are not the target of COMMIT, I also
think users won't need to change it.

>> NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
>> copyfrom.c.
>> Since safeNextCopyFrom() is an analog of NextCopyFrom() as commented,
>> would it be natural to put them in the same file?
>
> Sure, corrected it.

Thanks.

>>> 188 +            safecstate->errors++;
>>> 189 +            if (safecstate->errors <= 100)
>>> 190 +                ereport(WARNING,
>>> 191 +                        (errcode(errdata->sqlerrcode),
>>> 192 +                         errmsg("%s", errdata->context)));
>>
>> It would be better to state in the manual that errors exceeding 100
>> are not displayed.
>> Or, it might be acceptable to output all errors that exceed 100.
>
> It'll be too complicated to create a new parameter just for showing
> the given number of errors. I think 100 is an optimal size.

Yeah, I may have misled you, but I also don't think this needs a new
parameter. I just thought either of the following would be better:

- Put in the documentation that warnings are not output from the 101st
  error onward.
- Output all the warnings.

>> It would be helpful to add comments about skip_row, etc.
>
> Corrected it.

Thanks.

>> WARNING: FIND 0 ERRORS
>> When there were no errors, this WARNING seems not necessary.
>
> Corrected it.

Thanks.

I applied the v4 patch, and when I canceled the COPY, there was a case
where I found myself left in an aborted transaction.
Should this situation be prevented from occurring?

```
=# copy test from '/tmp/10000000.data' with (ignore_errors);
^CCancel request sent
ERROR:  canceling statement due to user request

=# truncate test;
ERROR:  current transaction is aborted, commands ignored until end of
transaction block
```
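I haven't dug into the cause, but if the PG_CATCH() path also traps
the cancel interrupt, one possible direction would be to re-throw
cancels instead of treating them as bad rows. The following is purely
an untested sketch on my part (not from the patch), reusing
oldcontext/oldowner from the sketch above:

```
/*
 * Untested idea, not from the patch: inside PG_CATCH(), propagate a
 * query cancel instead of swallowing it like a data error, so the
 * session is not left inside an aborted transaction.
 */
ErrorData  *errdata;

MemoryContextSwitchTo(oldcontext);
errdata = CopyErrorData();
FlushErrorState();

RollbackAndReleaseCurrentSubTransaction();
MemoryContextSwitchTo(oldcontext);
CurrentResourceOwner = oldowner;

if (errdata->sqlerrcode == ERRCODE_QUERY_CANCELED)
    ReThrowError(errdata);      /* re-raise the cancel */

/* otherwise: count and skip the bad row as the patch does now */
```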
[1] https://www.postgresql.org/message-id/1197677930.1536.18.camel%40dell.linuxdev.us.dell.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION