How should I deal with disconnects during insert? - Mailing list pgsql-general

From Sergey Samokhin
Subject How should I deal with disconnects during insert?
Date
Msg-id e42595410905201341k4ab2a4d2nd19d445775b75dc4@mail.gmail.com
Whole thread Raw
Responses Re: How should I deal with disconnects during insert?  (David Wilson <david.t.wilson@gmail.com>)
Re: How should I deal with disconnects during insert?  (Sam Mason <sam@samason.me.uk>)
Re: How should I deal with disconnects during insert?  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-general
Hello!

Problem I'm talking about in this letter is related to how we deal
with extremely big amounts of data to be inserted into DB. Recently I
switched to using buffers.

The best approach to make a buffer I know (and the only I can use with
my driver) is to create a long query (possibly made of some
statements) like:

"INSERT INTO test VALUES (1, 2), (3, 4) ... (N, M); INSERT INTO test
VALUES (7, 8), (9, 10) ... (N2, M2);"

...and then pass it to query() function your driver has (no matter
what driver you use, it should be quite fast).

But what will happen if the connection to PostgreSQL is lost during
the execution?

There are two different things which may happen with such a buffer:

1) I've found that once PostgreSQL has started writing, connection can
safely be lost without affecting the data passed. In this case all the
data will be inserted, because PostgreSQL has the whole query to
process.

2) Data can't be inserted in the case the connection is lost before
PostgreSQL has started such a writing (i.e. during transmission of the
query).

Unfortunatelly, my driver can't say me at which stage the connection
has been lost, so I don't know should I insert it again or not after
reconnect.

Way #1. One of the ways to solve this problem is to guarantee that
query I execute fails completely when connection is lost no matter at
which stage. In this case I can safely insert a buffer after reconnect
without any checks. Is there a way to guarantee that?

Way #2. Another way I see is to have additional table containing
values that show which buffers were inserted successfully. For
example, the last statement in my buffer could be:

INSERT INTO insert_history VALUES (<BUFFER_ID>, <SUCCESSFULLY_INSERTED_FLAG>)

After reconnect, I can check if buffer I'm dealing with has been
inserted or not.

Which way do you prefer and why?

Is there a way to do #1 (it's a bit cleaner and simpler to implement)?

Thanks.

--
Sergey Samokhin

pgsql-general by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: HOT question - insert/delete
Next
From: David Wilson
Date:
Subject: Re: How should I deal with disconnects during insert?