Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From torikoshia
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id dc876e926f284ee60eef4ece755fa419@oss.nttdata.com
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Damir Belyalov <dam.bel07@gmail.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
List pgsql-hackers
On 2022-08-15 22:23, Damir Belyalov wrote:
>> I expected I could see 9999 rows after COPY, but just saw 999 rows.
> Also I implemented your case and it worked correctly.

Thanks for the new patch!

Here are some comments on it.

> +           if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE)
> +           {
> +               valid_row = NextCopyFrom(cstate, econtext, values, 
> nulls);
> +               if (valid_row)
> +               {
> +                   /* Fill replay_buffer in oldcontext*/
> +                   MemoryContextSwitchTo(safecstate->oldcontext);
> +                   
> safecstate->replay_buffer[safecstate->saved_tuples++] = 
> heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);
> 
> +               /* Buffer was filled, commit subtransaction and prepare 
> to replay */
> +               ReleaseCurrentSubTransaction();

What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that just safecstate->replay_buffer is fulfilled before 
this commit.
As a test, I rewrote this ReleaseCurrentSubTransaction() to 
RollbackAndReleaseCurrentSubTransaction() and COPYed over 1000 rows of 
data, but same data were loaded.

> +#define            REPLAY_BUFFER_SIZE 1000

I feel it might be better to have it as a parameter rather than fixed at 
1000.

> +/*
> + * Analog of NextCopyFrom() but ignore rows with errors while copying.
> + */
> +static bool
> +safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum 
> *values, bool *nulls)

NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in 
copyfrom.c.
Since safeNextCopyFrom() is analog of NextCopyFrom() as commented, would 
it be natural to put them in the same file?

> 188 +               safecstate->errors++;
> 189 +               if (safecstate->errors <= 100)
> 190 +                   ereport(WARNING,
> 191 +                           (errcode(errdata->sqlerrcode),
> 192 +                           errmsg("%s", errdata->context)));

It would be better to state in the manual that errors exceeding 100 are 
not displayed.
Or, it might be acceptable to output all errors that exceed 100.

> +typedef struct SafeCopyFromState
> +{
> +#define            REPLAY_BUFFER_SIZE 1000
> +   HeapTuple       replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates 
> tuples for replaying it after an error */
> +   int             saved_tuples;               /* # of tuples in 
> replay_buffer */
> +   int             replayed_tuples;            /* # of tuples was 
> replayed from buffer */
> +   int             errors;                     /* total # of errors */
> +   bool            replay_is_active;
> +   bool            begin_subtransaction;
> +   bool            processed_remaining_tuples; /* for case of 
> replaying last tuples */
> +   bool            skip_row;

It would be helpful to add comments about skip_row, etc.

```
$ git apply ../patch/0003-COPY_IGNORE_ERRORS.patch

../patch/0003-COPY_IGNORE_ERRORS.patch:86: indent with spaces.
                             Datum *values, bool *nulls);
warning: 1 line adds whitespace errors.
```

There was a warning when applying the patch.


```
=# copy test from '/tmp/10000.data' with (ignore_errors);
WARNING:  FIND 0 ERRORS
COPY 1003
```

When there were no errors, this WARNING seems not necessary.


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: [PATCH] ALTER TABLE ... SET STORAGE default
Next
From: Aleksander Alekseev
Date:
Subject: [PATCH] Tab completion for SET COMPRESSION