Thread: functions, transactions, key violations

functions, transactions, key violations

From
Michael Glaesemann
Date:
We have a function which has been throwing a key violation. We're
tracking the purchase history for a limited number of unique items,
and flagging the most recent purchase with purchase_status = 1.

Stripped down, here's the schema and the function:

CREATE TABLE purchases (
   purchase_id SERIAL PRIMARY KEY,
   item_id INT NOT NULL,
   purchased_by int NOT NULL,
   purchase_price INT NOT NULL,
   purchased_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT
CURRENT_TIMESTAMP,
   purchase_status INT NOT NULL DEFAULT 1
);

CREATE UNIQUE INDEX
purchases_purchase_id_where_purchase_status_eq_1_key on purchases
(purchase_id) WHERE purchase_status = 1;

So, when a purchase is made for a given item, we also need to update
the purchase_status of the previous purchase.

CREATE OR REPLACE FUNCTION
purchase(IN in_item_id integer,
         IN in_purchased_by bigint,
         IN in_purchase_price integer)
  RETURNS VOID AS
$BODY$
BEGIN
  -- some selects
  UPDATE purchases
    SET purchase_status = 0
    WHERE item_id = in_item_id
          AND purchase_status = 1;
  INSERT INTO purchases (item_id, purchased_by, purchase_price)
     VALUES (in_item_id, in_purchased_by, in_purchase_price);
  -- some more manipulation
END
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

We're getting errors that the INSERT INTO purchases is within the
purchase function is violating the
purchases_purchase_id_where_purchase_status_eq_1_key constraint. How
can this be, if the function, called as a single statement (i.e., not
within an explicit transaction) is its own transaction?

The Postgres documentation shows another example, which leads me to
believe I'm missing something. If a function does occur within a
transaction, I don't understand why the exception block is necessary
in Example 38-1. Exceptions with UPDATE/INSERT [1]. Given the table
from the example:

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

If I issue the following transaction,

BEGIN;
UPDATE db SET b = data WHERE a = key;
INSERT INTO db(a,b) VALUES (key, data);
COMMIT;

and the UPDATE didn't affect any rows, I'd expect the transaction to
be successful.

What am I missing?

Thanks for your help.

Michael Glaesemann
grzm seespotcode net


[1](http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
)

Re: functions, transactions, key violations

From
Tom Lane
Date:
Michael Glaesemann <grzm@seespotcode.net> writes:
> We have a function which has been throwing a key violation. We're
> tracking the purchase history for a limited number of unique items,
> and flagging the most recent purchase with purchase_status = 1.

Um, the function is ensuring that there is no conflicting row with
the same item_id, but the unique constraint is on purchase_id.
Plus, since purchase_id is declared PRIMARY KEY, the additional
unique constraint is surely a no-op.

I think you've anonymized the example into nonsense :-(.

            regards, tom lane

Re: functions, transactions, key violations

From
Michael Glaesemann
Date:
On Jun 4, 2008, at 5:39 PM, Tom Lane wrote:

> I think you've anonymized the example into nonsense :-(.

Now that I've provided yet another example of the perils of not
providing the exact code, I've tried to patch it (below, for those of
you willing to give me a second chance).

However, my point regarding the example in the docs still holds. Why
is the exception block necessary? Doesn't wrapping the statements in a
function ensure the unique_violation couldn't occur?
Again, this is from
<http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
 >

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT)
RETURNS VOID AS $$
BEGIN
   LOOP -- first try to update the key
     UPDATE db SET b = data WHERE a = key;
     IF found THEN
        RETURN;
     END IF;
   -- not there, so try to insert the key
   -- if someone else inserts the same key concurrently,
   -- we could get a unique-key failure
     BEGIN
       INSERT INTO db(a,b) VALUES (key, data);
       RETURN;
     EXCEPTION WHEN unique_violation THEN
       -- do nothing, and loop to try the UPDATE again
     END;
   END LOOP;
END
$$ LANGUAGE plpgsql;
Michael Glaesemann
grzm seespotcode net


CREATE TABLE purchases (
  purchase_id SERIAL PRIMARY KEY,
  item_id INT NOT NULL,
  purchased_by int NOT NULL,
  purchase_price INT NOT NULL,
  purchased_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT
CURRENT_TIMESTAMP,
  purchase_status INT NOT NULL DEFAULT 1
);

CREATE UNIQUE INDEX
purchases_purchase_id_where_purchase_status_eq_1_key on purchases
(item_id) WHERE purchase_status = 1;

CREATE OR REPLACE FUNCTION
purchase(IN in_item_id integer,
          IN in_purchased_by bigint,
          IN in_purchase_price integer)
RETURNS VOID AS
$BODY$
BEGIN
-- some selects
UPDATE purchases
   SET purchase_status = 0
   WHERE item_id = in_item_id
         AND purchase_status = 1;
INSERT INTO purchases (item_id, purchased_by, purchase_price)
    VALUES (in_item_id, in_purchased_by, in_purchase_price);
-- some more manipulation
END
$BODY$
LANGUAGE 'plpgsql' VOLATILE;


Re: functions, transactions, key violations

From
Jeff Davis
Date:
> CREATE OR REPLACE FUNCTION
> purchase(IN in_item_id integer,
>          IN in_purchased_by bigint,
>          IN in_purchase_price integer)
>   RETURNS VOID AS
> $BODY$
> BEGIN
>   -- some selects
>   UPDATE purchases
>     SET purchase_status = 0
>     WHERE item_id = in_item_id
>           AND purchase_status = 1;
>   INSERT INTO purchases (item_id, purchased_by, purchase_price)
>      VALUES (in_item_id, in_purchased_by, in_purchase_price);
>   -- some more manipulation
> END
> $BODY$
>   LANGUAGE 'plpgsql' VOLATILE;

It looks like some concurrent statement could come between the UPDATE
and the INSERT that can still cause a unique constraint violation.

> The Postgres documentation shows another example, which leads me to
> believe I'm missing something. If a function does occur within a
> transaction, I don't understand why the exception block is necessary
> in Example 38-1. Exceptions with UPDATE/INSERT [1]. Given the table
> from the example:
>

[snip]

> and the UPDATE didn't affect any rows, I'd expect the transaction to
> be successful.
>

The comment from the example explains it:

-- not there, so try to insert the key
-- if someone else inserts the same key concurrently,
-- we could get a unique-key failure

The idea of the example is to try to update, and if it doesn't affect
any rows, then insert. Some other concurrent transaction could still
insert something after the UPDATE but before the INSERT, so the unique
constraint violation can still occur.

Regards,
    Jeff Davis


Re: functions, transactions, key violations

From
Michael Glaesemann
Date:
On Jun 4, 2008, at 6:00 PM, Jeff Davis wrote:

>> Some other concurrent transaction could still
> insert something after the UPDATE but before the INSERT, so the unique
> constraint violation can still occur.

Yes, I saw the comment. I'm guessing I'm missing something wrt
transaction isolation level or locking. Would I need to use
SERIALIZABLE or some kind of locking? Is the function in the example
any different than the following explicit transaction (with the UPDATE
not affecting any rows)?

BEGIN;
UPDATE db SET b = data WHERE a = key;
INSERT INTO db(a,b) VALUES (key, data);
COMMIT;

I'm obviously on the cusp of learning something new, or understanding
concurrency more deeply, so I'm looking forward to your responses.

Cheers,

Michael Glaesemann
grzm seespotcode net




Re: functions, transactions, key violations

From
Tom Lane
Date:
Michael Glaesemann <grzm@seespotcode.net> writes:
> However, my point regarding the example in the docs still holds. Why
> is the exception block necessary? Doesn't wrapping the statements in a
> function ensure the unique_violation couldn't occur?

Well, the point of that example is to deal correctly with the case where
two sessions try to insert the same key concurrently.  The second to
reach the index insertion point will see the conflicting index entry
already there, and it will block until it sees the other transaction
commit or abort, and if commit it will then throw a unique exception.

Are you certain that concurrent executions of purchase() for the same
item_id can never happen in your system?  Because that certainly seems
the most likely explanation from here.

            regards, tom lane

Re: functions, transactions, key violations

From
Jeff Davis
Date:
On Wed, 2008-06-04 at 18:21 -0400, Michael Glaesemann wrote:
> Yes, I saw the comment. I'm guessing I'm missing something wrt
> transaction isolation level or locking. Would I need to use
> SERIALIZABLE or some kind of locking? Is the function in the example
> any different than the following explicit transaction (with the UPDATE
> not affecting any rows)?

I think the best way to explain this is with a timeline of two
concurrent sessions, s1 and s2.

s1: BEGIN;
s2: BEGIN;
s1: UPDATE db SET b = data WHERE a = key; -- matches no rows
s2: UPDATE db SET b = data WHERE a = key; -- matches no rows
s1: INSERT INTO db(a,b) VALUES (key, data); -- inserts with a = key
s1: COMMIT;
s2: INSERT INTO db(a,b) VALUES (key, data); -- unique violation!

Notice that neither of the updates block, because neither match any
rows, so there is no conflict.

The exception handling in the loop in the example then retries s2
entirely, which then (correctly) updates the tuple rather than
inserting. There's some degenerate case, I suppose, when sessions are
perfectly synchronized with DELETEs such that it causes an infinite
loop, but that's a pretty unrealistic scenario.

SERIALIZABLE transactions don't really affect this, because the updates
still don't match any rows. Serializable transactions really only affect
the snapshot that you see and whether an UPDATE/DELETE causes a
serialization error (which can only happen if they match some rows).

The thing about a relation constraint (like UNIQUE) is that two
completely separate tuples can conflict with each other. That requires a
relation-level synchronization mechanism, because it can't assure that
the constraint is satisfied by examining tuples (or any proper subsets
of the relation) independently.

The general way to implement a relation constraint is by using LOCK
TABLE to prevent other concurrent sessions from interfering (as you
suggest above). This obviously has very bad performance, which is why
UNIQUE indexes provide another synchronization mechanism at the sub-
transaction level.

Regards,
    Jeff Davis



Re: functions, transactions, key violations

From
Ioannis Tambouras
Date:
 Apart from concurrency issues, it is possible that you
have sequence generation problems. Depending on how
you inserted the original rows into the 'purchases' table, it is possible
that the nextval number has not kept-up and is lagging behind.

 You need to ensure that 'purchases_purchase_id_seq' is pointing
to the correct next value! That is, if it is current nextval
is number 100, but you already have 110 rows on the table (without gaps),
it is no wonder you will receive primary key violations for the
next 10 inserts but will work fine afterwards. ( Assuming, of course,
you are the only one receiving values from the sequence.)


 Thanks
 Ioannis Tambouras