Re: Conflict handling for COPY FROM - Mailing list pgsql-hackers

From Anthony Nowocien
Subject Re: Conflict handling for COPY FROM
Date
Msg-id CAH5RRoNctxFUupW8EpxCu+_nz6vda1xFZqLv-f-8MK+YQmkZ9Q@mail.gmail.com
Whole thread Raw
In response to Re: Conflict handling for COPY FROM  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers

Hi,


sorry for answering a bit later than I hoped. Here is my review so far:

 

Contents

======

 

This patch starts to address in my opinion one of COPY's shortcoming, which is error handling.  PK and exclusion errors are taken care of, but not (yet?) other types of errors.

Documentation is updated, "\h copy" also and some regression tests are added.

 

Initial Run

=======

 

Patch applies (i've tested v6) cleanly.

make: OK

make install: OK

make check: OK

make installcheck: OK

 

Performance

========


I've tested the patch on a 1.1G file with 10 000 000 lines. Each test was done 15 times on a small local VM. Table is without constraints.

head: 38,93s

head + patch: 38,76s


Another test was one a 0.1GB file with 1 000 000 lines. Each test done 10 times on a small local VM and the table has a pk.                  

COPY                                                                4,550s

COPY CONFLICT                                               4,595s

COPY CONFLICT with only one pk error         10,529s

COPY CONFLICT pk error every 100 lines      10,859s

COPY CONFLICT pk error every 1000 lines    10,879s


I did not test exclusions so far.

 

Thoughts

======

 

I find the feature useful in itself. One big question for me is can it be improved later on to handle other types of errors (like check constraints for example) ? A "-1" for the error limit would be very useful in my opinion.

I am also afraid that the name "error_limit" might mislead users into thinking that all error types are handled. But I do not have a better suggestion without making this clause much longer...


I've had a short look at the code, but this will need review by someone else.

Anyway, thanks a lot for taking the time to work on it.

 

Anthony


On Sun, Jul 14, 2019 at 3:48 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
> Here are the patch that contain all the comment given except adding a way to specify
> to ignoring all error because specifying a highest number can do the work and may be
> try to store such badly structure data is a bad idea

Hi Surafel,

FYI GCC warns:

copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
        (void) dest->receiveSlot(myslot, dest);
        ^
copy.c:2702:16: note: ‘dest’ was declared here
  DestReceiver *dest;
                ^

--
Thomas Munro
https://enterprisedb.com


pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Fix typos and inconsistencies for HEAD (take 6)
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench - implement strict TPC-B benchmark