Re: Copy Bulk Ignore Duplicated - Mailing list pgsql-general

From Leandro Guimarães
Subject Re: Copy Bulk Ignore Duplicated
Date
Msg-id CAJV35FMvQQ65Pdb11WoU5wPSgGG+=+8_os-CyO6HwPS1Pe5h5A@mail.gmail.com
Whole thread Raw
In response to Re: Copy Bulk Ignore Duplicated  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi Adrian,
  I'll take a look about pg_bulkload, but I populate the database via a Java application with JDBC.

  I'll try the query you kindly sent to me!

Thanks!
Leandro Guimarães



On Fri, Jun 14, 2019 at 6:59 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 6/14/19 2:04 PM, Leandro Guimarães wrote:
> Hi,
>     I have a scenario with a large table and I'm trying to insert it via
> a COPY command with a csv file.
>
>     Everything works, but sometimes my source .csv file has duplicated
> data in the previously fulfilled table. If I add a check constraint and
> try to run the COPY command I have an error that stops the whole insertion.
>
>    I've tried to put the data in a tmp table and fill the main using
> distinct this way (the fields and names are just examples):
>
> INSERT INTO final_table values (name, document)
>     SELECT DISTINCT name, document
>     FROM tmp_TABLE t1
>     WHERE NOT EXISTS (
>     SELECT 1 FROM final_table t2
>     WHERE (t2.name <http://t2.name>, t2.document)
>     IS NOT DISTINCT FROM (t1.name <http://t1.name>, t1.document))
>
> The problem is that my final_table is a large (and partitioned) table
> and this query is taking a long time to execute.
>
> Someone have any idea (really guys anything would be great) how to solve
> this situation? I need to ignore duplicates instead to have some error.

pg_bulkload?:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html
"PARSE_ERRORS = n
     The number of ingored tuples that throw errors during parsing,
encoding checks, encoding conversion, FILTER function, CHECK constraint
checks, NOT NULL checks, or data type conversion. Invalid input tuples
are not loaded and recorded in the PARSE BADFILE. The default is 0. If
there are equal or more parse errors than the value, already loaded data
is committed and the remaining tuples are not loaded. 0 means to allow
no errors, and -1 and INFINITE mean to ignore all errors. "

>
> I'm using*PostgreSQL 9.4* so I can't use "ON CONFLICT" and upgrade is
> not an option.
>
> Thanks and Kind Regards!
>
>
> Leandro Guimarães
>


--
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-general by date:

Previous
From: Leandro Guimarães
Date:
Subject: Re: Copy Bulk Ignore Duplicated
Next
From: Jean Louis
Date:
Subject: how to concat/concat_ws all fields without braces