Thread: Atomicity of INSERT INTO ... SELECT ... WHERE NOT EXISTS ...

Atomicity of INSERT INTO ... SELECT ... WHERE NOT EXISTS ...

From
Jim Garrison
Date:
Given (pseudocode)

    CREATE TABLE kvstore (
        k varchar primary key,
        v varchar);

    CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar) returns boolean as $$
    BEGIN
      INSERT INTO kvstore (k, v)
      SELECT :k, :v
      WHERE NOT EXISTS (select 1 from kvstore where k = :k);
      RETURN FOUND;
    END;
    $$ LANGUAGE plpgsql;

I have a few questions:

1) Does INSERT statement set FOUND based on whether or not the row was inserted?
2) If this is invoked without a transaction in progress, is there any guarantee of atomicity between checking the
EXISTS and attempting to insert the row?  If this is being executed in two (or more) sessions, can the SELECT succeed
but then have the INSERT fail with a duplicate-key exception?
3) Will the behavior be different if the invoking processes have a transaction in progress?



Re: Atomicity of INSERT INTO ... SELECT ... WHERE NOT EXISTS ...

From
David G Johnston
Date:
Jim Garrison wrote
> Given (pseudocode)
>
>     CREATE TABLE kvstore (
>         k varchar primary key,
>         v varchar);
>
>     CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar)
> returns boolean as $$
>     BEGIN
>       INSERT INTO kvstore (k, v)
>       SELECT :k, :v
>       WHERE NOT EXISTS (select 1 from kvstore where k = :k);
>       RETURN FOUND;
>     END;
>     $$ LANGUAGE plpgsql;
>
> I have a few questions:
>
> 1) Does INSERT statement set FOUND based on whether or not the row was
> inserted?
> 2) If this is invoked without a transaction in progress, is there any
> guarantee of atomicity between checking the EXISTS and attempting to
> insert the row?  If this is being executed in two (or more) sessions, can
> the SELECT succeed but then have the INSERT fail with a duplicate-key
> exception?
> 3) Will the behavior be different if the invoking processes have a
> transaction in progress?

1) The top-level query controls FOUND, so yes.
2) Being invoked "without a transaction in progress" is impossible:
functions always execute in a transaction.  In fact, everything executes
in a transaction; the only choice is whether you want to auto-commit.
3) See #2.
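
For checking answer 1 by hand, here is a minimal runnable sketch of the
quoted function. The parameter names p_k and p_v are assumptions
introduced here so they do not clash with the column names k and v; the
original pseudocode uses :k and :v placeholders instead.

```sql
CREATE TABLE kvstore (
    k varchar PRIMARY KEY,
    v varchar
);

CREATE OR REPLACE FUNCTION store_key_value_pair(p_k varchar, p_v varchar)
RETURNS boolean AS $$
BEGIN
    INSERT INTO kvstore (k, v)
    SELECT p_k, p_v
    WHERE NOT EXISTS (SELECT 1 FROM kvstore WHERE k = p_k);
    -- FOUND is true if the INSERT affected at least one row.
    RETURN FOUND;
END;
$$ LANGUAGE plpgsql;

-- Single session, starting from an empty table:
SELECT store_key_value_pair('a', '1');  -- true: row inserted
SELECT store_key_value_pair('a', '2');  -- false: key already present
```

This only shows the single-session behaviour; it says nothing about what
happens when two sessions call the function at the same time.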


I'm still not super fluent in concurrency reasoning, but here goes:

Since we are dealing with MVCC here, and the default READ COMMITTED
isolation level, the data that each session/statement sees is stable
and does not include any uncommitted INSERTs performed concurrently by
the other.  Thus if two sessions simultaneously try to insert the same k,
the one that inserts second will wait for the first and, once the first
commits, fail with a unique violation.

David J.






Re: Atomicity of INSERT INTO ... SELECT ... WHERE NOT EXISTS ...

From
Jerry Sievers
Date:
Jim Garrison <jim.garrison@nwea.org> writes:

> Given (pseudocode)
>
>     CREATE TABLE kvstore (
>         k varchar primary key,
>         v varchar);
>
>     CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar) returns boolean as $$
>     BEGIN
>       INSERT INTO kvstore (k, v)
>       SELECT :k, :v
>       WHERE NOT EXISTS (select 1 from kvstore where k = :k);
>       RETURN FOUND;
>     END;
>     $$ LANGUAGE plpgsql;
>
> I have a few questions:
>
> 1) Does INSERT statement set FOUND based on whether or not the row was inserted?

Yes, unless triggers/rules are involved.

> 2) If this is invoked without a transaction in progress, is there any
> guarantee of atomicity between checking the EXISTS and attempting to
> insert the row?  If this is being executed in two (or more) sessions,
> can the SELECT succeed but then have the INSERT fail with a
> duplicate-key exception?


You will either be at risk of a race condition or, more likely, your
insert will attempt the duplicate insert and block, waiting for another
session that has already inserted the same row to commit or abort.  If
the other session does commit, you will get a duplicate-key exception.

Your code is wise to avoid a duplicate insert when the row already
exists and is visible, but you will still need to trap the duplicate-key
exception, which will happen eventually if other sessions are attempting
the same insert.
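
A minimal sketch of trapping that exception inside the function,
assuming the kvstore table from the question. The parameter names p_k
and p_v are hypothetical, chosen to avoid clashing with the column
names. Treating the duplicate as "not inserted" and returning false is
one possible policy, not the only one.

```sql
CREATE OR REPLACE FUNCTION store_key_value_pair(p_k varchar, p_v varchar)
RETURNS boolean AS $$
BEGIN
    INSERT INTO kvstore (k, v)
    SELECT p_k, p_v
    WHERE NOT EXISTS (SELECT 1 FROM kvstore WHERE k = p_k);
    RETURN FOUND;
EXCEPTION
    WHEN unique_violation THEN
        -- Another session committed the same key after our NOT EXISTS
        -- check; report "not inserted" instead of raising an error.
        RETURN false;
END;
$$ LANGUAGE plpgsql;
```

Note that a PL/pgSQL block with an EXCEPTION clause is more expensive to
enter and exit than one without, so this costs something on every call.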

> 3) Will the behavior be different if the invoking processes have a
> transaction in progress?

Outside an explicit transaction block, every statement you run is a
transaction of its own.  With longer-running, complex transactions
involved, you are far more at risk of testing negative for an existing
row, proceeding to try the insert, and then hanging because the same
insert is already pending in another session.

Suppose...

session A
begin;
insert into your table key=1
... do more work here...

meanwhile...
session B
test for row and I don't see it
try insert and hang here till commit/abort of session A

session A
commit

session B
Doh!!  dupe key error
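
The same interleaving written out as SQL, as a sketch to run by hand in
two psql sessions. It assumes the kvstore table from the question and
the default READ COMMITTED isolation level; the key and values are made
up for illustration.

```sql
-- Session A
BEGIN;
INSERT INTO kvstore (k, v) VALUES ('1', 'from A');
-- ... more work here, transaction still open ...

-- Session B (meanwhile)
INSERT INTO kvstore (k, v)
SELECT '1', 'from B'
WHERE NOT EXISTS (SELECT 1 FROM kvstore WHERE k = '1');
-- NOT EXISTS is true because A's row is not yet visible, so the INSERT
-- proceeds and then blocks on the primary key index until A finishes.

-- Session A
COMMIT;

-- Session B now fails with a unique violation (SQLSTATE 23505).
-- Had session A rolled back instead, B's insert would have succeeded.
```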

HTH


--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: Atomicity of INSERT INTO ... SELECT ... WHERE NOT EXISTS ...

From
Craig Ringer
Date:
On 08/28/2014 06:22 AM, Jim Garrison wrote:
> Given (pseudocode)
>
>     CREATE TABLE kvstore (
>         k varchar primary key,
>         v varchar);
>
>     CREATE OR REPLACE FUNCTION store_key_value_pair(k varchar, v varchar) returns boolean as $$
>     BEGIN
>       INSERT INTO kvstore (k, v)
>       SELECT :k, :v
>       WHERE NOT EXISTS (select 1 from kvstore where k = :k);
>       RETURN FOUND;
>     END;
>     $$ LANGUAGE plpgsql;
>
> I have a few questions:
>
> 1) Does INSERT statement set FOUND based on whether or not the row was inserted?

> 2) If this is invoked without a transaction in progress, is there any guarantee of atomicity between checking the
> EXISTS and attempting to insert the row?  If this is being executed in two (or more) sessions, can the SELECT succeed
> but then have the INSERT fail with a duplicate-key exception?

Yes, this code can still fail with a unique violation: the SELECT can
run in both transactions before either INSERT does, and then both
attempt the insert.

> 3) Will the behavior be different if the invoking processes have a transaction in progress?

No, because all functions run in transactions. There is no such thing as
"not in a transaction" in PostgreSQL (except for a few special system
management commands).

If it's in a SERIALIZABLE transaction instead of the default READ
COMMITTED then it might fail with a serialization failure instead of a
unique violation, but it'll still fail.

Please read the detailed guidance on this problem that already exists:

http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/

http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql
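
PostgreSQL 9.5 and later, released after this thread, also provide
INSERT ... ON CONFLICT, which resolves the conflict inside the statement
itself. A minimal sketch, assuming such a version and the kvstore table
from the question; the parameter names p_k and p_v are hypothetical.

```sql
CREATE OR REPLACE FUNCTION store_key_value_pair(p_k varchar, p_v varchar)
RETURNS boolean AS $$
BEGIN
    INSERT INTO kvstore (k, v)
    VALUES (p_k, p_v)
    ON CONFLICT (k) DO NOTHING;
    -- FOUND is false when a row with this key already existed,
    -- including one committed by a concurrent session.
    RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
```

On older versions this syntax is not available, and the retry or
exception-trapping approaches described in the links above still apply.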



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services