Re: Conflict handling for COPY FROM - Mailing list pgsql-hackers

From Alexey Kondratov
Subject Re: Conflict handling for COPY FROM
Date
Msg-id f55afd63-8d64-e8df-0d54-13c116b8d23e@postgrespro.ru
Whole thread Raw
In response to Re: Conflict handling for COPY FROM  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Conflict handling for COPY FROM  (Surafel Temesgen <surafel3000@gmail.com>)
Re: Conflict handling for COPY FROM  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
On 28.06.2019 16:12, Alvaro Herrera wrote:
>> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:
>>> Or even just return it as a row. CopyBoth is relatively widely supported
>>> these days.
>> i think generating warning about it also sufficiently meet its propose of
>> notifying user about skipped record with existing logging facility
>> and we use it for similar propose in other place too. The different
>> i see is the number of warning that can be generated
> Warnings seem useless for this purpose.  I'm with Andres: returning rows
> would make this a fine feature.  If the user wants the rows in a table
> as Andrew suggests, she can use wrap the whole thing in an insert.

I agree with previous commentators that returning rows will make this 
feature more versatile. Though, having a possibility to simply skip 
conflicting/malformed rows is worth of doing from my perspective. 
However, pushing every single skipped row to the client as a separated 
WARNING will be too much for a bulk import. So maybe just overall stats 
about skipped rows number will be enough?

Also, I would prefer having an option to ignore all errors, e.g. with 
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate 
a number of future errors if you are playing with some badly structured 
data, while always setting it to 100500k looks ugly.

Anyway, below are some issues with existing code after a brief review of 
the patch:

1) Calculation of processed rows isn't correct (I've checked). You do it 
in two places, and

-            processed++;
+            if (!cstate->error_limit)
+                processed++;

is never incremented if ERROR_LIMIT is specified and no errors 
occurred/no constraints exist, so the result will always be 0. However, 
if primary column with constraints exists, then processed is calculated 
correctly, since another code path is used:

+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was before 
patch) for simplicity and in order to avoid such problems.


2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT 
is specified and was exceeded, which doesn't seem to be correct, does it?

-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 && 
cstate->error_limit == 0)
                              recheckIndexes = ExecInsertIndexTuples(myslot,


3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data 
for column \"%s\" ",

+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
  CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
  CONTEXT:  COPY x, line 1: "2001    231    \N    \N"


Otherwise, the patch applies/compiles cleanly and regression tests are 
passed.


Regards

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Next
From: David Fetter
Date:
Subject: Re: New EXPLAIN option: ALL