Re: Best strategy for bulk inserts where some violate unique constraint? - Mailing list psycopg

From Denis Papathanasiou
Subject Re: Best strategy for bulk inserts where some violate unique constraint?
Date
Msg-id CAEpnaGwf4DazeGy_tGTPZg7w9wixJvufDM0HJNkBLZ-zB2PHmw@mail.gmail.com
In response to Re: Best strategy for bulk inserts where some violate unique constraint?  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses Re: Best strategy for bulk inserts where some violate unique constraint?
List psycopg
On Tue, Nov 5, 2013 at 6:31 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> As the last example, do you mean the executemany() example?

Yes, I was referring to this code example:

> If you had the following:
> namedict = ({"first_name":"Joshua", "last_name":"Drake"},
>             {"first_name":"Steven", "last_name":"Foo"},
>             {"first_name":"David", "last_name":"Bar"})
> You could easily insert all three rows within the sequence by using:
> cur = conn.cursor()
> cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
> The cur.executemany statement will automatically iterate through the sequence and execute the INSERT query for each dictionary.

What I wound up doing in my own code is this:

    # insert_dict is a sequence of {column: value} dicts
    try:
        cur.executemany("INSERT INTO ...", insert_dict)
    except psycopg2.IntegrityError:
        conn.commit()

I was hoping that by catching the unique constraint error and then committing, I would at least preserve the inserts from the dicts that do not violate the table constraints, but that is not how Postgres works: the first error aborts the whole transaction, so nothing gets preserved.
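
A toy demonstration of what I mean (assuming a table bar with a unique constraint on id; the names are made up):

    cur.execute("INSERT INTO bar (id) VALUES (1)")
    try:
        cur.execute("INSERT INTO bar (id) VALUES (1)")  # raises IntegrityError
    except psycopg2.IntegrityError:
        pass
    # the transaction is now aborted; COMMIT behaves like ROLLBACK,
    # so even the first, valid insert is discarded
    conn.commit()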
 
> That is not going to be much faster than repeated execute().

Really?
 
> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute() with INSERT. If one fails you can just ignore the
> IntegrityError and go on.

Ok, I'll try that, thanks.  
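
Something like this, I take it (a minimal sketch; the connection string, table, and columns are placeholders):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    conn.autocommit = True  # every execute() commits on its own
    cur = conn.cursor()

    for row in insert_dict:  # the same sequence of dicts as before
        try:
            cur.execute(
                "INSERT INTO bar (first_name, last_name) "
                "VALUES (%(first_name)s, %(last_name)s)", row)
        except psycopg2.IntegrityError:
            pass  # duplicate key: skip this row, keep the rest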

> About as easily, you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html>. There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.

Thanks for the example.
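
If I follow, the plain execute() version would look roughly like this (a sketch against the same placeholder table; the statement name is made up):

    # prepare once per session...
    cur.execute(
        "PREPARE bar_insert (text, text) AS "
        "INSERT INTO bar (first_name, last_name) VALUES ($1, $2)")

    # ...then execute it repeatedly, still under autocommit
    for row in insert_dict:
        try:
            cur.execute("EXECUTE bar_insert (%s, %s)",
                        (row["first_name"], row["last_name"]))
        except psycopg2.IntegrityError:
            pass  # skip duplicates, keep the rest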
 
> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the others would still
> be inserted but you will have to check. A much more robust strategy is
> to create a temporary table with the right schema but without
> constraints, load the data there using COPY and then move the data to
> the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
> ... and specify a condition to avoid the records that would fail the
> constraint.
>
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.
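
For my own notes, here is how I read that staging-table recipe (a sketch; the table, columns, and sample data are all made up):

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # stage the rows in a temp table with the same columns but no constraints
    cur.execute("CREATE TEMP TABLE bar_staging (LIKE bar INCLUDING DEFAULTS)")

    # COPY wants a file-like object: tab-separated columns, one row per line
    buf = io.StringIO("Joshua\tDrake\nSteven\tFoo\n")
    cur.copy_from(buf, "bar_staging", columns=("first_name", "last_name"))

    # move over only the rows that won't violate the unique constraint
    cur.execute("""
        INSERT INTO bar (first_name, last_name)
        SELECT DISTINCT first_name, last_name FROM bar_staging s
        WHERE NOT EXISTS (
            SELECT 1 FROM bar b
            WHERE b.first_name = s.first_name
              AND b.last_name = s.last_name)
        """)
    conn.commit()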

I saw Adrian's reply about how this still won't do what I need in terms of ignoring the constraint violators and keeping the valid entries; will switching autocommit to True first have the desired effect? 
