Thread: atomic multi-threaded upsert

atomic multi-threaded upsert

From

"Mikhail V. Puzanov"

Date:

24 November 2010, 16:16:50

Hi, All

I'm trying to make a kind of upsert for the following table:

CREATE TABLE sequences  (   section VARCHAR( 50 ) NOT NULL,   name VARCHAR( 50 ) NOT NULL,   counter BIGINT NOT NULL
);

CREATE UNIQUE INDEX IDX_SEQUENCES_SN ON sequences(section, name);


Next, I execute the following two queries for that table:
  -- increment and get the counter if exists UPDATE sequences SET counter = counter + 1 WHERE section = 'testSection'
ANDname = 'testKey' RETURNING counter;
 
  -- insert new counter if not exists INSERT INTO sequences ( section, name, counter ) SELECT 'testSection', 'testKey',
0WHERE NOT EXISTS (     SELECT * FROM sequences     WHERE section = 'testSection' AND name = 'testKey' );
 


It works OK when executed in single thread.

When such pairs of queries are executed in parallel (3-10 threads,
1 JDBC connection for each thread, with autoCommit set),
occasionally I get the following error:

ERROR: duplicate key value violates unique constraint "idx_sequences_sn"

1) That looks weird actually, should the INSERT ... NOT EXISTS (...) be
executed   in atomic fashion? Also, AFAIK, UNIQUE INDEX locks rows before
insert (or not?).   Can it be related to  JDBC?   Or it's the result of MVCC conflict resolution?

2) Are there some patterns for such kind of task?


Thanks.

Re: atomic multi-threaded upsert

From

Tom Lane

Date:

24 November 2010, 17:21:19

"Mikhail V. Puzanov" <misha.puzanov@gmail.com> writes:
>    -- increment and get the counter if exists
>   UPDATE sequences SET counter = counter + 1
>   WHERE section = 'testSection' AND name = 'testKey'
>   RETURNING counter;

Seems that what you have here is a bad manual implementation of a
sequence.  Why don't you use a real sequence object and nextval()?
        regards, tom lane

Re: atomic multi-threaded upsert

From

Date:

25 November 2010, 02:32:34

On 2010-11-24, "Mikhail V. Puzanov" <misha.puzanov@gmail.com> wrote:
> Next, I execute the following two queries for that table:
>
>    -- increment and get the counter if exists
>   UPDATE sequences SET counter = counter + 1
>   WHERE section = 'testSection' AND name = 'testKey'
>   RETURNING counter;
>
>    -- insert new counter if not exists
>   INSERT INTO sequences ( section, name, counter )
>   SELECT 'testSection', 'testKey', 0
>   WHERE NOT EXISTS (
>       SELECT * FROM sequences
>       WHERE section = 'testSection' AND name = 'testKey'
>   );

> ERROR: duplicate key value violates unique constraint "idx_sequences_sn"
>
> 1) That looks weird actually, should the INSERT ... NOT EXISTS (...) be
> executed
>     in atomic fashion? Also, AFAIK, UNIQUE INDEX locks rows before
> insert (or not?).
>     Can it be related to  JDBC?
>     Or it's the result of MVCC conflict resolution?

Perhaps -

Thread A UPDATEs, affecting no row.
Thread B UPDATEs, affecting no row.
Thread A INSERTs one row.  Autocommit is on, so it commits the INSERT.
Thread B attempts INSERT and fails on the duplicate.

-- 
KM

Re: atomic multi-threaded upsert

From

MIkhail Puzanov

Date:

25 November 2010, 02:51:47

2010/11/25 Tom Lane <tgl@sss.pgh.pa.us>

"Mikhail V. Puzanov" <misha.puzanov@gmail.com> writes:
> -- increment and get the counter if exists
> UPDATE sequences SET counter = counter + 1
> WHERE section = 'testSection' AND name = 'testKey'
> RETURNING counter;

Seems that what you have here is a bad manual implementation of a
sequence. Why don't you use a real sequence object and nextval()?

Well, if I use the real sequence, I'll need unpredictable number of them
and will have to store mapping (section, name)->sequenceName, and
also will have to create those sequences dynamically the similar way.

Re: atomic multi-threaded upsert

From

MIkhail Puzanov

Date:

25 November 2010, 03:24:31

2010/11/25 KM <km@xacrasis.netx>

On 2010-11-24, "Mikhail V. Puzanov" <misha.puzanov@gmail.com> wrote:
> Next, I execute the following two queries for that table:
>
> -- increment and get the counter if exists
> UPDATE sequences SET counter = counter + 1
> WHERE section = 'testSection' AND name = 'testKey'
> RETURNING counter;
>
> -- insert new counter if not exists
> INSERT INTO sequences ( section, name, counter )
> SELECT 'testSection', 'testKey', 0
> WHERE NOT EXISTS (
> SELECT * FROM sequences
> WHERE section = 'testSection' AND name = 'testKey'
> );

> ERROR: duplicate key value violates unique constraint "idx_sequences_sn"
>
> 1) That looks weird actually, should the INSERT ... NOT EXISTS (...) be
> executed
> in atomic fashion? Also, AFAIK, UNIQUE INDEX locks rows before
> insert (or not?).
> Can it be related to JDBC?
> Or it's the result of MVCC conflict resolution?

Perhaps -

Thread A UPDATEs, affecting no row.
Thread B UPDATEs, affecting no row.
Thread A INSERTs one row. Autocommit is on, so it commits the INSERT.
Thread B attempts INSERT and fails on the duplicate.

Yeah, but my expectation was that only one INSERT occurs due
to WHERE NOT EXISTS clause.

Seems, the task generally needs using pessimistic locks.

Re: atomic multi-threaded upsert

From

"Mikhail V. Puzanov"

Date:

27 November 2010, 11:08:20

On 25.11.2010 12:51, MIkhail Puzanov wrote:

2010/11/25 Tom Lane <tgl@sss.pgh.pa.us>
"Mikhail V. Puzanov" <misha.puzanov@gmail.com> writes:
> -- increment and get the counter if exists
> UPDATE sequences SET counter = counter + 1
> WHERE section = 'testSection' AND name = 'testKey'
> RETURNING counter;

Seems that what you have here is a bad manual implementation of a
sequence. Why don't you use a real sequence object and nextval()?

Well, if I use the real sequence, I'll need unpredictable number of them
and will have to store mapping (section, name)->sequenceName, and
also will have to create those sequences dynamically the similar way.

Finally the problem solved with application level locks (that works for only
one application server, but suits our needs for now).

This also works http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but we need some more or less cross-DBMS approach.