Re: Conflict handling for COPY FROM - Mailing list pgsql-hackers

From Surafel Temesgen
Subject Re: Conflict handling for COPY FROM
Date
Msg-id CALAY4q_Yjeh6=TuroMsxTb6rm2iTxycCQmEqkD6aohX7y2_bTw@mail.gmail.com
Whole thread Raw
In response to Re: Conflict handling for COPY FROM  (Alexey Kondratov <a.kondratov@postgrespro.ru>)
Responses Re: Conflict handling for COPY FROM  (Alexey Kondratov <a.kondratov@postgrespro.ru>)
List pgsql-hackers


On Fri, Nov 15, 2019 at 6:24 PM Alexey Kondratov <a.kondratov@postgrespro.ru> wrote:
On 11.11.2019 16:00, Surafel Temesgen wrote:
>
>
>     Next, you use DestRemoteSimple for returning conflicting tuples back:
>
>     +        dest = CreateDestReceiver(DestRemoteSimple);
>     +        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
>
>     However, printsimple supports very limited subset of built-in
>     types, so
>
>     CREATE TABLE large_test (id integer primary key, num1 bigint, num2
>     double precision);
>     COPY large_test FROM '/path/to/copy-test.tsv';
>     COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;
>
>     fails with following error 'ERROR:  unsupported type OID: 701', which
>     seems to be very confusing from the end user perspective. I've
>     tried to
>     switch to DestRemote, but couldn't figure it out quickly.
>
>
> fixed

Thanks, now it works with my tests.

1) Maybe it is fine, but now I do not like this part:

+    portal = GetPortalByName("");
+    dest = CreateDestReceiver(DestRemote);
+    SetRemoteDestReceiverParams(dest, portal);
+    dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

Here you implicitly use the fact that portal with a blank name is always
created in exec_simple_query before we get to this point. Next, you
create new DestReceiver and set it to this portal, but it is also
already created and set in the exec_simple_query.

Would it be better if you just explicitly pass ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),


Good idea .Thank you
 

2) My second concern is that you use three internal flags to track
errors limit:

+    int            error_limit;    /* total number of error to ignore */
+    bool        ignore_error;    /* is ignore error specified? */
+    bool        ignore_all_error;    /* is error_limit -1 (ignore all
error)
+                                     * specified? */

Though it seems that we can just leave error_limit as a user-defined
constant and track errors with something like errors_count. In that case
you do not need auxiliary ignore_all_error flag. But probably it is a
matter of personal choice.

 
using bool flags will save as from using integer type as a boolean and hold the fact
error limit was specified even if it became zero and it seems to me it is straightforward
to treat ignore_all_error separately.
Attache is the patch that use already created DestReceiver

regards
Surafel  
Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: cost based vacuum (parallel)
Next
From: Jinbao Chen
Date:
Subject: Planner chose a much slower plan in hashjoin, using a large table asthe inner table.