Re: Best strategy for bulk inserts where some violate unique constraint? - Mailing list psycopg

From Daniele Varrazzo
Subject Re: Best strategy for bulk inserts where some violate unique constraint?
Date
Msg-id CA+mi_8bqAV_zK3kZr=yYpCwjkgJ8ie5ABXZtbwQv07G7L02paw@mail.gmail.com
Whole thread Raw
In response to Best strategy for bulk inserts where some violate unique constraint?  (Denis Papathanasiou <denis.papathanasiou@gmail.com>)
Responses Re: Best strategy for bulk inserts where some violate unique constraint?
Re: Best strategy for bulk inserts where some violate unique constraint?
List psycopg
On Tue, Nov 5, 2013 at 8:14 PM, Denis Papathanasiou
<denis.papathanasiou@gmail.com> wrote:
> I'm using Pyscopg2 to insert multiple rows at once, using a dictionary as
> shown at the bottom of this page:
> https://wiki.postgresql.org/wiki/Psycopg2_Tutorial
>
> I notice that even if just one entry violates a table unique constraint (an
> IntegrityError in psycopg2 parlance), the entire contents of the dictionary
> do not get inserted into the table.
>
> Is there any way to tell postgres: it's ok to reject individual entries that
> may violate table unique constraints, but at the same time allow those which
> do not?
>
> It's too expensive for me to do the inserts one at a time where I could just
> catch the psycopg2.IntegrityError.

As the last example do you mean the executemany() example? That is not
going to be much faster than repeated execute().

The easiest thing you can do is to switch to autocommit=True and do
repeated execute with insert. If one fail you can just ignore the
IntegrityError and go on.

About as easy you can prepare a statement and execute it repeatedly
using PREPARE/EXECUTE: see
<http://www.postgresql.org/docs/9.2/static/sql-prepare.html> There is
no builtin support for that in psycopg but you can just execute()
these statements. It may save you something. You can also take a look
at this example of a PREPAREing cursor:
<https://gist.github.com/dvarrazzo/3797445>.

However, the fastest way to insert data into Postgres is COPY, see
<http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
have to present your data as a file. I can't remember what happens
when a record fails the integrity test: I think the other would still
be inserted but you will have to check. A much more robust strategy is
to create a temporary table with the right schema but without
constraints, load the data there using COPY and then move the data to
the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
... and specify a condition to avoid the records that would fail the
constraint.

I would go for COPY, it's by far faster than execute[many], even
including prepare.

-- Daniele


psycopg by date:

Previous
From: Denis Papathanasiou
Date:
Subject: Best strategy for bulk inserts where some violate unique constraint?
Next
From: Adrian Klaver
Date:
Subject: Re: Best strategy for bulk inserts where some violate unique constraint?