Thread: Am I locking more than I need to?
Right now performance isn't a problem, but this question has me curious. Let's say I have a shopping cart system where there is a "products" table that contains all possible products, and a "cart_items" table that stores how many of each product are in each cart. The obvious design (or the first thing that came to my mind) would look something like this:

create table products (
    id serial primary key,
    ...
);

create table cart_items (
    id serial primary key,
    cart_id int references ...,
    prod_id int references products(id),
    quantity int
);

The problem is, when you add the first item of a given product to "cart_items" you have to do an INSERT with a quantity of 1, but after that you need to do UPDATEs. That would seem to create a potential race condition, so in order for that to work it would seem you would need to take an ACCESS EXCLUSIVE lock on the table to make sure no other process was reading the table at the same time.

Assuming my logic above is correct, there are two other ways I thought to do it, but both seem considerably more redundant:

(1) I could just get rid of the "quantity" attribute and insert a record for each product added, then define a view that aggregates the rows with the same prod_id and cart_id using count().

(2) Every time I add a product I could add a record with a quantity of 0 for each cart in existence, and every time I add a cart I could add a record with a quantity of 0 for each product.

Is there some better solution that I'm missing? It seems like a simple problem, but right now I'm doing the full table lock to be on the safe side. Maybe there's some solution involving check constraints?

Regards,
	Jeff Davis
The world rejoiced as jdavis-pgsql@empires.org (Jeff Davis) wrote:
> The problem is, when you add the first item to "cart_items" you have to
> do an INSERT with a quantity of 1, but after that you need to do
> UPDATEs. That would seem to create a potential race condition, so in
> order for that to work it would seem you would need to do an ACCESS
> EXCLUSIVE lock on the table to make sure no other process was reading
> the table at the same time.

Various sorts of race conditions are possible in multi-user, multi-tasking systems; what _actual_ problem are you expecting to have here?

What I would expect is that putting a unique index onto cart_items based on (cart_id, prod_id) would prevent the confusing situation of having multiple rows for a single product in a single cart. I imagine that is the best thing to try to prevent, and it is readily done without any "locks" by adding a UNIQUE constraint.

But perhaps I am imagining a different error condition. Can you describe the nature of the error condition that you are thinking about? That may help indicate what foreign key checks and/or uniqueness constraints might be worth adding.
-- 
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/internet.html
This login session: only $23.95!
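In schema terms (using the table names from Jeff's original post; the "carts" table name is an assumption, since only its foreign key was elided), the suggestion amounts to something like:

```sql
-- Sketch: cart_items with a uniqueness guarantee per (cart, product).
-- The UNIQUE constraint turns the duplicate-insert race into a
-- detectable "duplicate key" error instead of silent duplicate rows.
CREATE TABLE cart_items (
    id       serial PRIMARY KEY,
    cart_id  int REFERENCES carts(id),      -- "carts" is an assumed name
    prod_id  int REFERENCES products(id),
    quantity int NOT NULL CHECK (quantity > 0),
    UNIQUE (cart_id, prod_id)
);
```

The CHECK clause is an extra suggestion in the spirit of Jeff's closing question; the core of the advice is the UNIQUE (cart_id, prod_id) line.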
On Thursday May 20 2004 8:19, Jeff Davis wrote:
> create table products (
>     id serial primary key,
>     ...
> );
>
> create table cart_items (
>     id serial primary key,
>     cart_id int references ...,
>     prod_id int references products(id),
>     quantity int
> );
>
> The problem is, when you add the first item to "cart_items" you have to
> do an INSERT with a quantity of 1, but after that you need to do
> UPDATEs. That would seem to create a potential race condition, so in
> order for that to work it would seem you would need to do an ACCESS
> EXCLUSIVE lock on the table to make sure no other process was reading
> the table at the same time.

I'm not sure what potential race condition you see, since you haven't said much about how your transactions fit in here. But I would suggest you go with your first design and not worry about any explicit locking unless/until it clearly becomes a problem. I've built numerous things similar to this, and in my experience, PostgreSQL is very good about managing the locking in an intelligent manner if your transactions are reasonably grouped.

HTH.
> I'm not sure what potential race condition you see since you haven't said
> much about how your transactions fit in here. But I would suggest you go
> with your first design and don't worry about any explicit locking
> unless/until it clearly becomes a problem. I've built numerous things
> similar to this, and in my experience, PostgreSQL is very good about
> managing the locking in an intelligent manner if your transactions are
> reasonably grouped.
>
> HTH.

client1=> BEGIN;
-- test to see if there's already a record there. If so, UPDATE;
-- if not, INSERT
client1=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
-- no record, so INSERT
client1=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);

client2=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
-- still no record, since client1 didn't commit yet

client1=> COMMIT;

-- now client2 needs to insert
client2=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);
client2=> COMMIT;

-- Oops, now there are two records in there.

That's the condition I was worried about.

Thanks,
	Jeff Davis
> Various sorts of race conditions are possible in multi-user
> multi-tasking systems; what _actual_ problem are you expecting to have
> here?

I posted the condition as a reply to Ed L., and copied it to the bottom of this message.

> What I would expect is that putting a unique index onto cart_items
> based on (cart_id, prod_id) would prevent getting the confusing
> situation of having multiple quantities of a single product in a
> single cart.

It looks like you knew what I was referring to anyway, and the UNIQUE constraint looks like another good solution. It would make the second transaction unable to commit, allowing the application to detect the error and send an UPDATE instead.

One thing, though: it would seem that this has to be in the application code, since if I make a user-defined function I couldn't have a transaction inside it (at least until the 2PC patch makes it into a release). So, in a user-defined function I couldn't detect the error, because it would abort the outer transaction, right? So, it seems a little back-and-forth with the application would be required if using a unique constraint. It certainly seems like a performance win for concurrent access, though (not that performance is currently a problem for me).

Thanks,
	Jeff Davis

client1=> BEGIN;
-- test to see if there's already a record there. If so, UPDATE;
-- if not, INSERT
client1=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
-- no record, so INSERT
client1=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);

client2=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
-- still no record, since client1 didn't commit yet

client1=> COMMIT;

-- now client2 needs to insert
client2=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);
client2=> COMMIT;

-- Oops, now there are two records in there.
Clinging to sanity, jdavis-pgsql@empires.org (Jeff Davis) mumbled into her beard:
>> Various sorts of race conditions are possible in multi-user
>> multi-tasking systems; what _actual_ problem are you expecting to have
>> here?
>
> I posted the condition as a reply to Ed L., and copied it to the bottom
> of this message.

I saw that, yes.

>> What I would expect is that putting a unique index onto cart_items
>> based on (cart_id, prod_id) would prevent getting the confusing
>> situation of having multiple quantities of a single product in a
>> single cart.
>
> It looks like you knew what I was referring to anyway, and the
> UNIQUE constraint looks like another good solution. It would make
> the second transaction unable to commit, allowing the application to
> detect the error and send an update.

Right.

> One thing though, it would seem that it would have to be in the
> application code, since if I make a user-defined function I couldn't
> have a transaction inside it (at least until the 2PC patch makes it
> into a release). So, in a user-defined function I couldn't detect
> the error, because it would abort the outer transaction, right?

That seems to be the right understanding. The exception handling does need to be in the application. And the right response, for a web app, may be to simply stop at that point, pull the "cart" contents as they are now, and then report back to the user:

 - Problem: Attempt to simultaneously request multiple quantities of
   Product Foo (Could someone be messing with your cart???)

 - Here's what's in your cart right now...

> So, it seems a little back-and-forth with the application would be
> required if using a unique constraint. It certainly seems like a
> performance win for concurrent access though (not that performance
> is currently a problem for me).

Well, I'm not sure what the likely alternatives are without, let's say, creating a lockable table for each 'cart'. And that would seem likely to have pretty heavy effects on the application, too.

Whether you "lock" or "detect errors" seems like "six of one, half a dozen of the other" to me, and the latter is likely to be WAY more efficient :-).
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://www.ntlug.org/~cbbrowne/sap.html
"You can only examine 10 levels of pushdown, because that's all the
fingers you have to stick in the listing."
-- Anonymous programmer - "TOPS-10 Crash Analysis Guide"
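Put as a transcript in the style of the rest of the thread, the "detect errors" flow looks roughly like this (a sketch only; how the unique-violation error surfaces depends on the client library):

```sql
BEGIN;
INSERT INTO cart_items (cart_id, prod_id, quantity) VALUES (X, Y, 1);
-- If the INSERT raises a unique-violation error, the row already
-- exists; roll back and retry the operation as an UPDATE instead:
--   ROLLBACK;
--   BEGIN;
--   UPDATE cart_items SET quantity = quantity + 1
--    WHERE cart_id = X AND prod_id = Y;
COMMIT;
```

Without nested transactions (not available in the PostgreSQL releases discussed here), the rollback-and-retry has to be driven from the application, which is exactly the back-and-forth discussed above.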
On Friday May 21 2004 12:50, Jeff Davis wrote:
> client1=> BEGIN;
> -- test to see if there's already a record there. If so, UPDATE;
> -- if not, INSERT
> client1=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
> -- no record, so INSERT
> client1=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);
> client2=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
> -- still no record, since client1 didn't commit yet
> client1=> COMMIT;
> -- now client2 needs to insert
> client2=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);
> client2=> COMMIT;
> -- Oops, now there are two records in there.
>
> That's the condition I was worried about.

Ah, I see. I second Christopher Browne's comments on the unique index (I assumed you were doing that) and the ease of checking errors in the app. If you don't have transactions spanning multiple pageviews, and you don't have multiple people modifying the same shopping cart at the same time, this would seem to be a non-issue. But I guess you could try to explicitly lock the table. I've never done it that way, instead preferring, like C.B., to enforce integrity at the schema level with the unique index and having the app handle return values, errors, etc. (In DBI, you need to set a flag to have it allow you to handle the error vs. aborting - RaiseError, maybe?) Maybe it's wise to systematically handle all DB errors, but I suspect you'll never see this one occur.
pgsql@bluepolka.net ("Ed L.") writes:
> On Friday May 21 2004 12:50, Jeff Davis wrote:
>> client1=> BEGIN;
>> -- test to see if there's already a record there. If so, UPDATE
>> -- if not, INSERT
>> client1=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
>> -- no record, so INSERT
>> client1=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);
>> client2=> SELECT * from cart_items where cart_id=X AND prod_id=Y;
>> -- still no record, since client1 didn't commit yet
>> client1=> COMMIT;
>> -- now client2 needs to insert
>> client2=> INSERT into cart_items(cart_id,prod_id,quantity) VALUES(X,Y,1);
>> client2=> COMMIT;
>> -- Oops, now there are two records in there.
>>
>> That's the condition I was worried about.
>
> Ah, I see. I second Christopher Browne's comments on the unique
> index (I assumed you were doing that) and the ease of checking
> errors in the app. If you don't have transactions spanning multiple
> pageviews and you don't have multiple people modifying the same
> shopping cart at the same time, it would seem this is a non-issue.
> But I guess you could try to explicitly lock the table. I've never
> done it that way, instead preferring like C.B. to enforce integrity
> at the schema level with the unique index and having the app handle
> return values, errors, etc. (In DBI, you need to set a flag to have
> it allow you to handle the error vs. aborting. RaiseError, maybe?).
> Maybe its wise to systematically handle all DB errors, but I suspect
> you'll never see this one occur.

I think it's just wishful thinking to hope that there's anything that is _fundamentally_ a lot better than having the UNIQUE index and recovering from the "not unique" errors that may arise.

- If you lock the table, then that adds some _new_ error conditions that can occur, namely that an update may get blocked because the table is locked. Coping with that requires some application code.
- The other approach one might _imagine_ would be useful would be to run queries that are always checking to see if the inserts look like they'll be unique. Unfortunately, that CAN'T work, because multiple connections involve multiple transaction contexts.

I can think of three other approaches:

1. You create a temp table for each cart, and somehow tie the cart to a single persistent connection. It is _impossible_ for another connection to interfere, because other connections can't even see the cart.

   If you can associate a process with each cart, and can accept the overhead of having a DB connection for each cart that is in progress, this ought to be pretty slick. Cart tables pass in and out of existence, cleaning themselves up as needed. Quite cool. But you can't ever use connection pooling, which may be unacceptable...

2. You don't insert directly into the cart/product table; you insert into a "product request" table that is a queue of requests. There's a big literature on this; look up "Message Queueing," and perhaps look at IBM's product MQSeries. (Microsoft made a clone called "MSMQ.")

   A single serial process periodically goes through that queue and cleanly moves the data into the cart/product table. That means there's some asynchronicity; data may stay in the queue for a while, which may be a problem. Furthermore, there is an efficiency loss because every insert has to be done twice: once into the queue, and then once into the "real" table.

3. Look up the notion of "optimistic locking." This is pretty big these days in Java and Smalltalk applications; I won't bother explaining it Yet Again. If your application is getting hammered because big, long, expensive transactions doing lots of updates are failing at COMMIT time due to uniqueness constraints, OL can cut the cost.

All these approaches have a big impact on application design. And I don't see them being _fundamentally_ better than just using the UNIQUE index.
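A minimal sketch of the queue idea in approach 2 (table and column names invented for illustration) might be:

```sql
-- Hypothetical request queue; clients only ever INSERT here, so there
-- is no uniqueness requirement and hence no race among clients.
CREATE TABLE cart_item_requests (
    id        serial PRIMARY KEY,
    cart_id   int NOT NULL,
    prod_id   int NOT NULL,
    quantity  int NOT NULL DEFAULT 1,
    queued_at timestamp NOT NULL DEFAULT now()
);
-- A single worker process periodically folds the queue into
-- cart_items in one serialized pass, then deletes the rows it
-- processed. Since only the worker ever writes cart_items, the
-- INSERT-vs-UPDATE decision there is race-free.
```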
-- let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];; http://cbbrowne.com/info/advocacy.html "Ahhh. A man with a sharp wit. Someone ought to take it away from him before he cuts himself." -- Peter da Silva
> That seems to be the right understanding. The exception handling does
> need to be in the application. And the right response may be, for a
> web app, to, at that point, simply stop, pull the "cart" contents as
> they are now, and then report back to the user:
>
> - Problem: Attempt to simultaneously request multiple quantities of
>   Product Foo (Could someone be messing with your cart???)
>
> - Here's what's in your cart right now...

Interesting. I suppose in my application it probably is a good idea to give an error, seeing as one physical person can't do anything quickly enough to violate the UNIQUE constraint.

What if postgres were to have nested transactions (I misstated above as 2PC for some reason)? Would it be desirable to do the checking on the server side in a function (attempt to insert, and if we get a unique constraint violation, update) rather than in the application?

> Well, I'm not sure what the likely alternatives are, without, let's
> say, creating a lockable table for each 'cart.' And that would seem
> likely to have pretty heavy effects on the application, too.
>
> Whether you "lock" or "detect errors" seems like a "six of one, half a
> dozen of the other" to me, and the latter is likely to be WAY more
> efficient :-).

One thing that I didn't think of before is this: if I have a table of all the carts, then could I do a "SELECT ... WHERE cart_id=X FOR UPDATE" right before I did the test to see whether I should insert or update? That would basically be a row lock on just the cart I'm modifying, preventing other concurrent accesses (assuming that they are also trying to "SELECT ... FOR UPDATE") from locking the same cart, right? But it would allow other carts to be modified without waiting. Is this a viable solution?

Thanks,
	Jeff
Don't you still have the possibility for a race condition? Scenario:

SELECT ... WHERE cart_id=X FOR UPDATE

IF (NOT FOUND) THEN
BEGIN
    -- Here is where nothing is locked.
    -- No way to guarantee no one else will create a record before we do.
    INSERT ...
END;
END IF;

Once the clients commit - assuming UNIQUE is enforced - one of the INSERT transactions will fail. Again, that has to be handled client-side.

Carl <|};-)>
On Fri, 2004-05-21 at 14:33, Carl E. McMillin wrote:
> Scenario:
>
> SELECT ... WHERE cart_id=X FOR UPDATE
>
> IF (NOT FOUND) THEN
> BEGIN
>     --Here is where nothing is locked.
>     --No way to guarantee no one else will create a record before we do.
>     INSERT ...
> END;
> END IF;

Instead, I was thinking more like:

BEGIN
    SELECT ... WHERE cart_id=X FOR UPDATE
    IF (NOT FOUND) THEN
        --Here is where nothing is locked.
        --No way to guarantee no one else will create a record before we do.
        INSERT ...
    ELSE
        UPDATE ...
    END IF;
END;

Won't that "SELECT ... FOR UPDATE" block out a concurrent access to the same cart until the first one finishes? Of course, this assumes all concurrent accesses also try to "SELECT ... FOR UPDATE" before inserting.

Thanks,
	Jeff Davis
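Spelled out against a parent "carts" table (assumed here; Jeff only says "a table of all the carts"), the idea would look like this sketch:

```sql
BEGIN;
-- The row lock on the cart serializes all writers for this one cart,
-- provided every writer follows the same protocol.
SELECT 1 FROM carts WHERE id = X FOR UPDATE;
SELECT quantity FROM cart_items WHERE cart_id = X AND prod_id = Y;
-- if no row was found:
--   INSERT INTO cart_items (cart_id, prod_id, quantity) VALUES (X, Y, 1);
-- otherwise:
--   UPDATE cart_items SET quantity = quantity + 1
--    WHERE cart_id = X AND prod_id = Y;
COMMIT;
```

Because the FOR UPDATE lock on the cart row is taken before the existence test, the select-then-insert gap is closed for that cart, while transactions on other carts proceed without waiting.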
Hi,

> Instead, I was thinking more like:
>
> ...
> ELSE
>     UPDATE ...

Sorry, I neglected that; here's the nut of the gnarly part. Now you have to factor in how MVCC behaves.

My understanding is that, under READ COMMITTED, each statement sees a snapshot of the rows committed before that statement began (seeing only rows committed before the whole transaction began is REPEATABLE READ/SERIALIZABLE behavior). I'm presuming here that SERIALIZABLE is way too heavy-handed for your application.

So it's possible that you can have one or more transactions - clients trying to add a cart, select a cart, whatever - that don't see any changes in any other transaction until the COMMIT of an INSERTed cart. Then the backends have to resolve WHO actually gets to INSERT the UNIQUE'ly qualified cart: only one should win, and the others should throw "uniqueness violation" exceptions. Since postgres doesn't do nested transactions, the client has to roll back and submit the query again; the new transaction should see the newly committed record, and on you chug.

For support of this line of thinking, view the conversation where Tom Lane described the overall problem much more illustratively than I can:

http://archives.postgresql.org/pgsql-general/2004-04/msg01153.php

Carl <|};-)>
On Friday 21 May 2004 06:24 pm, Jeff Davis wrote:
> On Fri, 2004-05-21 at 14:33, Carl E. McMillin wrote:
> > Scenario:
> >
> > SELECT ... WHERE cart_id=X FOR UPDATE
> >
> > IF (NOT FOUND) THEN
> > BEGIN
> >     --Here is where nothing is locked.
> >     --No way to guarantee no one else will create a record before we do.
> >     INSERT ...
> > END;
> > END IF;
>
> Instead, I was thinking more like:
>
> BEGIN
>     SELECT ... WHERE cart_id=X FOR UPDATE
>     IF (NOT FOUND) THEN
>         --Here is where nothing is locked.
>         --No way to guarantee no one else will create a record before we do.
>         INSERT ...
>     ELSE
>         UPDATE ...
>     END IF;
> END;

This is basically what I am doing. See below for the PL/pgSQL for a trigger-based implementation. It effectively SERIALIZEs the one table in question; any other table performs at normal speed.

Hope it helps!

-miker

(see below)

-----------------------------------------
--
-- Merge-on-INSERT functionality for Postgres 7.3+
--
-- miker@purplefrog.com / 5-1-04
--
-- CAVEAT EMPTOR: Uses table locks to avoid concurrency issues,
-- so it WILL slow down heavily loaded tables.
-- This effectively puts the table into
-- TRANSACTION ISOLATION LEVEL SERIALIZABLE mode.
--
CREATE OR REPLACE FUNCTION add_merge_on_insert (
	TEXT,	-- table name
	TEXT,	-- key column
	TEXT[]	-- column list to update on deduplication
) RETURNS TEXT
  RETURNS NULL ON NULL INPUT
  SECURITY INVOKER
  LANGUAGE 'plpgsql'
  AS '
DECLARE
	tablename	ALIAS FOR $1;
	keycol		ALIAS FOR $2;
	updatecols	ALIAS FOR $3;
	trig		TEXT;
	arraydims	TEXT;
BEGIN
	trig := \'
		CREATE FUNCTION "\' || tablename || \'_merge_on_insert_proc" ()
		RETURNS TRIGGER AS \'\'
		DECLARE
			orig \' || quote_ident(tablename) || \'%ROWTYPE;
		BEGIN
			-- SHARE ROW EXCLUSIVE conflicts with itself, so two concurrent
			-- inserters cannot both pass the SELECT below. (ROW EXCLUSIVE,
			-- as originally posted, does not conflict with itself and would
			-- not close the race.)
			LOCK TABLE \' || quote_ident(tablename) || \' IN SHARE ROW EXCLUSIVE MODE;
			SELECT INTO orig * FROM \' || quote_ident(tablename) || \'
				WHERE \' || quote_ident(keycol) || \' = NEW.\' || quote_ident(keycol) || \';
			IF NOT FOUND THEN
				RETURN NEW;
			END IF;
			UPDATE \' || quote_ident(tablename) || \' SET \';

	arraydims := array_dims(updatecols);

	FOR i IN 1 .. (substring(arraydims from (position(\':\' in arraydims) + 1)
			for (position(\']\' in arraydims) - (position(\':\' in arraydims) + 1))))::INT
	LOOP
		trig := trig || quote_ident(updatecols[i]) || \' = COALESCE( NEW.\' ||
			quote_ident(updatecols[i]) || \', orig.\' || quote_ident(updatecols[i]) || \'), \';
	END LOOP;

	-- trim the trailing ", "
	trig := substring(trig from 0 for (character_length(trig) - 1));

	trig := trig || \'
			WHERE \' || quote_ident(keycol) || \' = NEW.\' || quote_ident(keycol) || \';
			RETURN NULL;
		END;
	\'\' LANGUAGE \'\'plpgsql\'\';
	\';

	EXECUTE trig;

	EXECUTE \'
		CREATE TRIGGER "\' || tablename || \'_merge_on_insert_trig"
		BEFORE INSERT ON \' || quote_ident(tablename) || \'
		FOR EACH ROW EXECUTE PROCEDURE "\' || tablename || \'_merge_on_insert_proc" ();
	\';

	RETURN \'FUNCTION \' || tablename || \'_merge_on_insert_proc (); TRIGGER \' ||
		tablename || \'_merge_on_insert_trig;\';
END;
';

CREATE OR REPLACE FUNCTION remove_merge_on_insert (
	TEXT	-- table name
) RETURNS TEXT
  RETURNS NULL ON NULL INPUT
  SECURITY INVOKER
  LANGUAGE 'plpgsql'
  AS '
BEGIN
	EXECUTE \'DROP FUNCTION "\' || $1 || \'_merge_on_insert_proc" () CASCADE;\';
	RETURN \'FUNCTION \' || $1 || \'_merge_on_insert_proc (); TRIGGER \' || $1 ||
		\'_merge_on_insert_trig;\';
END;
';
At 07:19 PM 5/20/2004 -0700, Jeff Davis wrote:
>Assuming my logic above is correct, there are two other ways I thought
>to do it, but both seem considerably more redundant:
>
>(1) I could just get rid of the "quantity" attribute and just insert a
>record for each product, then do a view that aggregates the products of
>the same prod_id and cart_id with count().
>
>(2) Every time I add a product I could add a record with a quantity of 0
>for each cart in existance, and every time I add a cart I could add a
>record with a quantity of 0 for each product.
>
>Is there some better solution that I'm missing? It seems like a simple
>problem, but right now I'm doing the full table lock to be on the safe
>side. Maybe there's some solution involving check constraints?

Full table lock works, but blocks normal selects. If you can manage to use uniqueness enforcement then that works too (but you'll have to deal with the errors). Alternatively, you can use a table lock mode that doesn't lock plain selects but does lock SELECT FOR UPDATE and similar stuff (you may still wish to have uniqueness enforcement just in case). E.g.:

pseudosub putrow (tablename, whereclause, namevaluepairs)
	LOCK TABLE tablename IN SHARE ROW EXCLUSIVE MODE
	select ... from tablename where whereclause for update
	if found
		update tablename ...
	else
		insert into tablename ...
	endif

I'm not aware of a standard SQL command to do this, which seems like a common enough requirement. And the bright sparks made the syntax for updates different from inserts. Oh well, maybe it's just me.

Link.
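Concretely, for the cart_items table from the top of the thread, the pseudosub above translates to something like this sketch (SHARE ROW EXCLUSIVE conflicts with itself, so two writers serialize, while plain SELECTs are not blocked):

```sql
BEGIN;
LOCK TABLE cart_items IN SHARE ROW EXCLUSIVE MODE;
SELECT quantity FROM cart_items
 WHERE cart_id = X AND prod_id = Y FOR UPDATE;
-- if a row was found:
--   UPDATE cart_items SET quantity = quantity + 1
--    WHERE cart_id = X AND prod_id = Y;
-- if not:
--   INSERT INTO cart_items (cart_id, prod_id, quantity) VALUES (X, Y, 1);
COMMIT;
```

The cost relative to the per-cart row lock in Jeff's SELECT FOR UPDATE idea is that this serializes writers across _all_ carts, not just the one being modified.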