Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) |
Date | |
Msg-id | dc876e926f284ee60eef4ece755fa419@oss.nttdata.com Whole thread Raw |
In response to | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) (Damir Belyalov <dam.bel07@gmail.com>) |
Responses |
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
|
List | pgsql-hackers |
On 2022-08-15 22:23, Damir Belyalov wrote: >> I expected I could see 9999 rows after COPY, but just saw 999 rows. > Also I implemented your case and it worked correctly. Thanks for the new patch! Here are some comments on it. > + if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE) > + { > + valid_row = NextCopyFrom(cstate, econtext, values, > nulls); > + if (valid_row) > + { > + /* Fill replay_buffer in oldcontext*/ > + MemoryContextSwitchTo(safecstate->oldcontext); > + > safecstate->replay_buffer[safecstate->saved_tuples++] = > heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls); > > + /* Buffer was filled, commit subtransaction and prepare > to replay */ > + ReleaseCurrentSubTransaction(); What is actually being committed by this ReleaseCurrentSubTransaction()? It seems to me that just safecstate->replay_buffer is fulfilled before this commit. As a test, I rewrote this ReleaseCurrentSubTransaction() to RollbackAndReleaseCurrentSubTransaction() and COPYed over 1000 rows of data, but same data were loaded. > +#define REPLAY_BUFFER_SIZE 1000 I feel it might be better to have it as a parameter rather than fixed at 1000. > +/* > + * Analog of NextCopyFrom() but ignore rows with errors while copying. > + */ > +static bool > +safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum > *values, bool *nulls) NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in copyfrom.c. Since safeNextCopyFrom() is analog of NextCopyFrom() as commented, would it be natural to put them in the same file? > 188 + safecstate->errors++; > 189 + if (safecstate->errors <= 100) > 190 + ereport(WARNING, > 191 + (errcode(errdata->sqlerrcode), > 192 + errmsg("%s", errdata->context))); It would be better to state in the manual that errors exceeding 100 are not displayed. Or, it might be acceptable to output all errors that exceed 100. > +typedef struct SafeCopyFromState > +{ > +#define REPLAY_BUFFER_SIZE 1000 > + HeapTuple replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates > tuples for replaying it after an error */ > + int saved_tuples; /* # of tuples in > replay_buffer */ > + int replayed_tuples; /* # of tuples was > replayed from buffer */ > + int errors; /* total # of errors */ > + bool replay_is_active; > + bool begin_subtransaction; > + bool processed_remaining_tuples; /* for case of > replaying last tuples */ > + bool skip_row; It would be helpful to add comments about skip_row, etc. ``` $ git apply ../patch/0003-COPY_IGNORE_ERRORS.patch ../patch/0003-COPY_IGNORE_ERRORS.patch:86: indent with spaces. Datum *values, bool *nulls); warning: 1 line adds whitespace errors. ``` There was a warning when applying the patch. ``` =# copy test from '/tmp/10000.data' with (ignore_errors); WARNING: FIND 0 ERRORS COPY 1003 ``` When there were no errors, this WARNING seems not necessary. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
pgsql-hackers by date: