Thread: How can I see if my code is "concurrency safe"?
Hi there! Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse whenit comes to using that knowledge in real-life. Let me give you an example. I have this table create table userpositions ( userID int, positionID int, unique (userID,positionID)); For a given userID there can be many positionIDs. There are then two operations performed on this table, the first is "select positionID from userpositions where userID=..."to get all positions for a user, and the second is to replace all positions for the user with new positions. Forthis I was thinking about running it in a transaction begin; delete from userpositions where userID=...; insert into userpositions (userID,positionID) values ....; commit; But will this work? I don't want select to return empty results, I don't want two inserts running causing a unique violation. Experimenting with it tells me yes, it will work, but how should I reason to "convinse" my self that it will work? Quoting the docs: "The partial transaction isolation provided by Read Committed mode is adequate for many applications,and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complexqueries and updates might require a more rigorously consistent view of the database than Read Committed mode provides." How do I know I'm not creating one of those complex queries? Or to put it differntly, how do you identify a complex querywith potential issues? Any special techniques to analyze? /J
On Apr 25, 2012, at 5:17 PM, Janne H wrote: > Hi there! > > Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse whenit comes to using that knowledge in real-life. > > Let me give you an example. > I have this table > > create table userpositions ( userID int, positionID int, unique (userID,positionID)); > > For a given userID there can be many positionIDs. > > There are then two operations performed on this table, the first is "select positionID from userpositions where userID=..."to get all positions for a user, and the second is to replace all positions for the user with new positions. Forthis I was thinking about running it in a transaction > > begin; > delete from userpositions where userID=...; > insert into userpositions (userID,positionID) values ....; > commit; > > But will this work? I don't want select to return empty results, I don't want two inserts running causing a unique violation. > Experimenting with it tells me yes, it will work, but how should I reason to "convinse" my self that it will work? > > Quoting the docs: "The partial transaction isolation provided by Read Committed mode is adequate for many applications,and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complexqueries and updates might require a more rigorously consistent view of the database than Read Committed mode provides." > > > How do I know I'm not creating one of those complex queries? Or to put it differntly, how do you identify a complex querywith potential issues? Any special techniques to analyze? Think about it this way: once one session starts a transaction, any modifications it makes are invisible to other sessionsuntil you commit your transaction, at which point they all become visible atomically. (Unless those other sessionshave explicitly said, "yes, I want to play with fire" and changed their isolation mode to something other than ReadCommitted.) So given what you've said, it seems like you should be set, so long as you don't have two different sessions try to insertthe same {userID,positionID} tuple. If that happens, then the second one that tries to commit will fail. You can avoidthat by using a sequence for positionID, which will result in gaps and non-sequential IDs, but also no uniqueness failures.
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Ben Chobot > Sent: Wednesday, April 25, 2012 7:29 PM > To: Janne H > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] How can I see if my code is "concurrency safe"? > > On Apr 25, 2012, at 5:17 PM, Janne H wrote: > > > Hi there! > > > > Today I realised that my knowledge concerning how postgres handles > concurrency is not very good, and its even worse when it comes to using that > knowledge in real-life. > > > > Let me give you an example. > > I have this table > > > > create table userpositions ( userID int, positionID int, unique > > (userID,positionID)); > > > > For a given userID there can be many positionIDs. > > > > There are then two operations performed on this table, the first is > > "select positionID from userpositions where userID=..." to get all > > positions for a user, and the second is to replace all positions for > > the user with new positions. For this I was thinking about running it > > in a transaction > > > > begin; > > delete from userpositions where userID=...; > > insert into userpositions (userID,positionID) values ....; commit; > > > > But will this work? I don't want select to return empty results, I don't want > two inserts running causing a unique violation. > > Experimenting with it tells me yes, it will work, but how should I reason to > "convinse" my self that it will work? > > > > Quoting the docs: "The partial transaction isolation provided by Read > Committed mode is adequate for many applications, and this mode is fast > and simple to use; however, it is not sufficient for all cases. Applications that > do complex queries and updates might require a more rigorously consistent > view of the database than Read Committed mode provides." > > > > > > How do I know I'm not creating one of those complex queries? Or to put it > differntly, how do you identify a complex query with potential issues? Any > special techniques to analyze? > > Think about it this way: once one session starts a transaction, any > modifications it makes are invisible to other sessions until you commit your > transaction, at which point they all become visible atomically. (Unless those > other sessions have explicitly said, "yes, I want to play with fire" and changed > their isolation mode to something other than Read Committed.) > > So given what you've said, it seems like you should be set, so long as you > don't have two different sessions try to insert the same {userID,positionID} > tuple. If that happens, then the second one that tries to commit will fail. You > can avoid that by using a sequence for positionID, which will result in gaps > and non-sequential IDs, but also no uniqueness failures. > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general ===================================================== Please keep in mind that simply adding in a "SERIAL" (sequence) value to avoid duplicate keys is dangerous as well. While it avoids a physical duplicate you are now able to insert a logical duplicate into the system. Specific to your code could you just perform the following? UPDATE userpositions SET userid = new_userid WHERE userid = old_userid; In response to Ben: it appears that positionID is a FK in this situation which means whether the corresponding PK is serial or not is irrelevant. To Janne: If this indeed is a foreign key situation (multi-multi reference table) you should define both columns using "FOREIGN KEY ... REFERENCES ..." Since you are inserting records for a different userID than the one you are deleting you currently have no checking in place to ensure that one of your INSERT values is not already on the table. You also have not defined the relationship between position and user (i.e., can more than one person hold a given position). Assuming each position can only have one person active at a time you should define the positionID field as UNIQUE by itself as well. Then, if you ensure you only add records that correspond to the deleted records shown you can ensure that no duplicates will exist since someone trying to add a duplicate position before you commit will have to wait on the DELETE to unlock the position and by the time that happens you will already have insert a new record for that position. RISKY: Your risk, as shown, is between obtaining your "values" and running the transaction delete/insert your "values" become invalid. Say after you select but before you delete/insert someone adds a new position to the user. Your delete will remove the newly added position but your subsequent insert will not have the new data and thus that position will end up un-filled after your routine. Same thing applies if they delete a position entry - you will end up filling it again. It is also possible that the position could be double-filled if the userID is changed during your window. Since you have not provided ALL of your code (especially the SELECT/VALUES portion) I would assume that you are obtaining the data nievely and so you do have a concurrency issue to address. Dave
On Wed, Apr 25, 2012 at 4:17 PM, Janne H <jannehson51@yahoo.com> wrote: > Hi there! > > Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse whenit comes to using that knowledge in real-life. I think what everyone here is trying to say is that while PostgreSQL is very good at handling concurrency itself, you may want to be careful about making sure your relational constraints are correct. This means proper use of multi-column keys (either as primary keys, unique constraints, or foreign keys). If you do a good job here, PostgreSQL will handle your concurrency needs for you. On an internal level, look at how MVCC works. It isn't really a question of overall concurrency so much as it is a matter of concurrency performance, since locking is no longer as necessary. Also check out the serializable transaction level, though the need for this sort of concurrency protection is rare. Best Wishes, Chris Travers
On Wed, 25 Apr 2012 16:17:53 -0700 (PDT) Janne H <jannehson51@yahoo.com> wrote: > Hi there! > > Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse whenit comes to using that knowledge in real-life. > > Let me give you an example. > I have this table > > create table userpositions ( userID int, positionID int, unique (userID,positionID)); > > For a given userID there can be many positionIDs. > > There are then two operations performed on this table, the first is "select positionID from userpositions where userID=..."to get all positions for a user, and the second is to replace all positions for the user with new positions. Forthis I was thinking about running it in a transaction > > begin; > delete from userpositions where userID=...; > insert into userpositions (userID,positionID) values ....; > commit; > > But will this work? I don't want select to return empty results, I don't want two inserts running causing a unique violation. > Experimenting with it tells me yes, it will work, but how should I reason to "convinse" my self that it will work? > > Quoting the docs: "The partial transaction isolation provided by Read Committed mode is adequate for many applications,and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complexqueries and updates might require a more rigorously consistent view of the database than Read Committed mode provides." > > > How do I know I'm not creating one of those complex queries? Or to put it differntly, how do you identify a complex querywith potential issues? Any special techniques to analyze? Just try it. Open two psql sessions and manually enter the commands one at a time, alternating between windows. Try some scenarios that you're concerned about and see what happens. Experience will teach you a lot very quickly. I can say, from experience, that what you have above is not sufficient to protect you from empty result sets and duplicate keys in rare, but annoyingly persistent cases. Especially if your front-end is a web-app, you may be shocked to find out how often users will double-click instead of single click, thus resulting in the exact same queries running close enough together to conflict with each other. Of course, this can be fixed in many ways, but understanding why it happens is the first step to choosing the best approach for your application. -- Bill Moran <wmoran@potentialtech.com>