Thread: Hibernate / other ORM and pg_advisory_lock
Hi all

I'm about to settle on a database implementation for the Java part of the development project I'm working on, and I'm wondering if anybody here has any experience using PostgreSQL's advisory locking features with ORM database layers like Hibernate, TopLink, OpenJPA, Cayenne, iBatis, etc. I'm hoping to avoid relying on locking via long-running transactions that stay open across user interaction (ugh), and pg_advisory_lock etc. seem well suited to the task.

At this point it's looking like I'll be using Hibernate (via Hibernate EntityManager / JPA), so I'm especially interested in comments related to it. In particular, in a J2SE desktop environment, has anybody here made use (successful or otherwise) of Pg's advisory locking features? They work fine in my test cases with Hibernate EntityManager and with OpenJPA (not with TopLink Essentials, due to connection management issues), but working in a few reasonably simple tests is very different from working reliably in production.

I'm particularly worried about ensuring that connections aren't cached, pooled, or terminated and re-established behind my back - say, after an error that results in a transaction rolling back. A connection drop/re-establish would release any advisory locks being held, and if that's a risk then data must be selected FOR UPDATE and compared to the old copies before making any changes. Which also involves persuading the ORM layer to do that, and do it the right way.

[Java newbie warning] I'm also curious about strategies people have used to ensure reliable advisory lock release when the persistence layer finishes with the object. I'd prefer to avoid the need to explicitly release the locks, instead having them released when the persistence layer "forgets" about the object. In C++ (without a GC, at least) I might just use a dtor, but with the unpredictability of destruction timing in a GC'd environment that's not so attractive in Java.

Right now I'm inclined to define a finish() method that causes all subsequent calls to accessors to throw, and call it from finalize() in case some code path fails to call it manually when done with the object. Totally stupid, or workable sounding? [/Java newbie warning]

So ... any thoughts/experiences using Hibernate or other ORM tools with Pg's advisory locking?

--
Craig Ringer
Hi Craig,

I have not used pg_advisory_lock, so I can't comment on its use with the various tools.

With JPA you can add a version column to your table, either an integer or a timestamp datatype - the preference being an integer field, as a timestamp column may not be granular enough to avoid data integrity issues - and flag that column as a version field with the @Version annotation in your corresponding entity, which tells JPA to use the field for efficient optimistic locking.

I'm happy to stand corrected, but my understanding of JPA is that by default it uses optimistic locking anyway. Without a @Version field, though, data integrity is not guaranteed, as it uses a last-commit-wins strategy. If a @Version annotation exists, JPA compares the version values, and if the value on the tuple in the database no longer matches the one in the object you hold in memory, JPA will throw an OptimisticLockException. Without the @Version annotation, you run the risk of blowing away someone else's entries when you do your update with older data.

If you are happy with just a dirty read and do not need an optimistic lock, you can use the @OptimisticLock(excluded=true) annotation, which is particularly useful on relationships that you will not be updating at the same time as the current entity, such as reference values.

To set pessimistic locking, you need to go to the EntityManager and place the lock there, with the different JPA vendors all having different ways of doing so, which doesn't make pessimistic locking in JPA very portable. A pessimistic locking strategy such as pg_advisory_lock also ties you to the database vendor - again not a very portable approach, which may still be okay for you. Additionally, any pessimistic locking will limit the scalability of your solution. It may be the best solution for your particular domain model, but in most cases pessimistic locking is generally deemed an evil.
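For what it's worth, the optimistic check that a @Version column drives boils down to a compare-and-bump on the version value. Here is a minimal sketch of that logic in plain Java; the VersionedStore class is purely illustrative (it is not a JPA API), with an in-memory map standing in for the table:

```java
// Simulates the optimistic-lock check JPA performs for a @Version field:
// an UPDATE only succeeds if the caller's version still matches the stored
// version; on success the version is incremented, otherwise the caller gets
// the equivalent of an OptimisticLockException.
import java.util.HashMap;
import java.util.Map;

public class VersionedStore {
    // row id -> { value, version }
    private final Map<Long, long[]> rows = new HashMap<>();

    public void insert(long id, long value) {
        rows.put(id, new long[] { value, 1 });   // initial version 1
    }

    public long[] read(long id) {
        long[] r = rows.get(id);
        return new long[] { r[0], r[1] };        // detached copy, like a JPA entity
    }

    /** Returns true if the update won; false models an OptimisticLockException. */
    public boolean update(long id, long newValue, long expectedVersion) {
        long[] r = rows.get(id);
        if (r[1] != expectedVersion) {
            return false;                        // someone else committed first
        }
        r[0] = newValue;
        r[1] = expectedVersion + 1;              // JPA bumps the @Version column
        return true;
    }
}
```

Two clients reading the same row and both trying to write demonstrates last-writer-loses: the first update wins and bumps the version, the second sees a stale version and is rejected.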
I would be very hesitant to go down the path of using pg_advisory_lock, as from what I understand the locks are tied to the session lifetime, not to a transaction. So the lock will persist even after a transaction is committed or rolled back, which means the onus is on you to call pg_advisory_unlock successfully to remove the lock at the appropriate time, and you need to manage the uniqueness of the lock handle across multiple clients. I can see some really useful things to do with these functions, but I wouldn't use them for plain CRUD work - that is what the ORM layer is there for.

I'm not quite sure I follow what your concern is with the transaction. If I understand correctly, you are concerned that after a transaction rolls back, a subsequent update may not apply correctly because it is updating with a stale version that was cached? If you use the @Version annotation, that resolves the issue while still providing you with full ACID behaviour, though in a highly concurrent environment you may need to handle a number of OptimisticLockExceptions in the client - which is usually better than handling longer transaction times per client. Using either optimistic or pessimistic locking will not stop you from getting race conditions and deadlocks if you have not designed the application well. But then again, even the best developers run into them from time to time.

Additionally, with an ORM such as JPA, the entities you use to populate your client UI are created outside of a transaction - or at least created by default with a read lock, which is dropped once the entities are created. It is only when you need to add a new record, or update an existing one, that you enlist a transaction for that particular event. So transactions in an optimistic locking model should have very short life spans. If you have a long-running transaction using pessimistic locking while waiting for user interaction, then you probably need to reconsider your design.
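On managing the uniqueness of the lock handle: pg_advisory_lock accepts either a single bigint or a pair of ints, so one common convention - an application-level scheme, not anything Pg mandates, and the "class id per table" idea here is purely an assumption for illustration - is to pack a per-table class id and a row id into the single 64-bit key. A sketch:

```java
// Packs an application-assigned per-table "class id" and a row id into the
// single bigint key form accepted by pg_advisory_lock(bigint). This mirrors
// the two-argument pg_advisory_lock(int, int) overload, which partitions the
// key space the same way.
public class AdvisoryLockKey {
    public static long key(int classId, int rowId) {
        return ((long) classId << 32) | (rowId & 0xFFFFFFFFL);
    }

    public static int classId(long key) {
        return (int) (key >>> 32);   // high 32 bits
    }

    public static int rowId(long key) {
        return (int) key;            // low 32 bits
    }
}
```

The round-trip property is what matters: two different (table, row) pairs can never collide on the same key, so lock handles stay unique across clients as long as everyone uses the same class-id assignments.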
With an ORM such as JPA, the EM should clean up any locks on completion of the transaction, though you should ideally declare the completion of the transaction explicitly with either a commit or a rollback. The typical pattern is to wrap the DB call in a try/catch block: commit at the end of the try, rollback in the catch, and close your JPA/JDBC objects in the finally block - themselves wrapped in a try block, depending on how defensive your coding style is. However, some web frameworks such as Seam take care of the whole transaction cycle by default without any need to call transaction demarcation points, and you need to override that behaviour if you want more atomic control.

In Java, unlike a C++ destructor, the finalize() method is non-deterministic: it only gets run when the GC cleans up the object, which may be never. It is not a good idea to depend on it to clean up your resources. Additionally, the finalize() method is only ever called once on an object. So if the object is GC'd, the finalize method is called, and in the finalize method it resurrects itself - perhaps by passing a reference to itself to another object - then the finalize method will not be called again for that particular object when it next comes time to dispose of it. Generally you avoid the finalize method, as there are not many cases where you would depend on it, other than as a last-ditch, best-effort attempt to clean up any resources you were not otherwise able to. It is another coding style that is generally deemed an evil.

Hopefully I have answered your question, and if not, at least pointed you in the right direction.

Cheers,
Andy

Craig Ringer wrote:
> [...]
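The commit-in-try / rollback-in-catch / close-in-finally pattern described above can be sketched as follows. FakeTx is a stand-in for javax.persistence.EntityTransaction (begin/commit/rollback) plus EntityManager.close(), purely so the example runs without a database; with real JPA the calls would be em.getTransaction().begin()/commit()/rollback() and em.close():

```java
// A runnable sketch of the transaction demarcation pattern: commit at the
// end of the try, rollback in the catch, release resources in the finally.
public class TxPattern {
    public static class FakeTx {
        public boolean committed, rolledBack, closed;
        public void begin() {}
        public void commit() { committed = true; }
        public void rollback() { rolledBack = true; }
        public void close() { closed = true; }
    }

    /** Runs work inside a transaction; returns true if it committed. */
    public static boolean inTransaction(FakeTx tx, Runnable work) {
        tx.begin();
        try {
            work.run();
            tx.commit();          // commit at the end of the try block
            return true;
        } catch (RuntimeException e) {
            tx.rollback();        // rollback in the catch block
            return false;
        } finally {
            tx.close();           // always release resources in finally
        }
    }
}
```

Whatever happens inside the work, the finally clause guarantees the resources are released, which is exactly the property you want when a dangling connection would silently hold (or drop) advisory locks.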
Thanks for your interesting and clear response. My comments are interleaved below.

Andrew wrote:
> With JPA you can add a version column to your table, either an integer
> or timestamp datatype, with the preference being an integer field as
> the timestamp column may not be granular enough to avoid data
> integrity issues, and flag that column as a version field with the
> @Version annotation in your corresponding entity, which will then tell
> JPA to use this field for efficient optimistic locking.

At least in a desktop app it's sometimes desirable to inform a user that (for example) somebody else is currently modifying a customer record, so they can't edit it right now, and then possibly offer to wait. In this environment that's significantly preferable, in user interface terms, to telling the user "Whoops, can't save your changes, somebody else beat you to it", reloading the data in the UI, and getting them to make their changes again. That's not something I can do with optimistic locking, and doing it with JPA pessimistic locking means a transaction must be held open during user think-time. Or lunch-time, for that matter, though I'd time out any exclusive access after a while. Such long-running transactions are not an attractive option.

The locking to protect against conflicting concurrent updates in a transaction context is required even if advisory locking is used; the advisory locking is just useful to minimise the risk of conflicts in cases where the conflict can't be resolved without bugging the user. The app should still work and shouldn't mangle data even if advisory locking attempts were replaced with a call that always succeeded immediately, though it'd have to force users to retry/re-enter some things.

> I'm happy to stand corrected, but my understanding of JPA is that by
> default it uses optimistic locking anyway. But without a @Version
> field, data integrity is not guaranteed as it uses the last commit
> wins strategy [..]
> To set pessimistic locking, you need to go to the
> Entity Manager and place the lock there, with the different JPA
> vendors all having different ways of doing so, which doesn't make
> using pessimistic locking in JPA very portable.

Yep, that was my understanding. I was really hoping there'd also be the option to use traditional SELECT ... FOR UPDATE locking, because I have existing apps using the database that handle trickier updates that way, but it doesn't look like that's covered by the JPA spec. The spec defines a lock(...) method on the EntityManager:

http://java.sun.com/javaee/5/docs/api/javax/persistence/EntityManager.html#lock(java.lang.Object,%20javax.persistence.LockModeType)

and defines the LockModeType enumeration, though it assumes that locking will be done using a version column rather than using SELECT ... FOR UPDATE.

Other apps use this database, and it's not at all attractive to be forced to use the (to my mind clunky, though efficient under high loads) JPA version column approach for update collision detection. All the other DB-using code would also have to be altered to increment the version column, and all of it already uses normal database concurrency control, such as safe updates (UPDATE blah SET thing = thing + 1) or SELECT ... FOR UPDATE locking, to make sure everything goes smoothly. I might be able to do something hairy like a trigger that increments the version column on UPDATE only if it's not explicitly updated by the app, but I'd prefer to get the Java app to fit in with the usual locking and concurrency control scheme. This is the sort of thing I was worried about with the use of ORM tools.

> Using a pessimistic locking strategy such as the use of
> pg_advisory_lock ties you to the database vendor, again not a very
> portable approach, which may still be okay for you.

It is OK for the workload of this app, but certainly an issue for some apps.
This app is already highly tied to Pg, and to a fair extent is just a user interface for the brains of the system that live in Pg.

> Additionally, any pessimistic locking will limit the scalability of
> your solution.

Again, no big deal in the cases it's intended for. Areas where I'd be using advisory locking are generally low-concurrency, and the advisory locking is intended to improve the "user experience" rather than act as a data protection strategy. The app will still use appropriate locking during the transaction used to actually commit the changes; the advisory lock is to ensure there's no chance of another user trying to then modify the same data and having the change rejected.

> It may be the best solution for your particular domain model, but in
> most cases, pessimistic locking is generally deemed an evil. I would
> be very hesitant in going down the path of using pg_advisory_locks as
> from what I understand, they are tied to the session life, not to a
> transaction. So the lock will persist even after a transaction is
> committed or rolled back. Which means that the onus is on you to call
> pg_advisory_unlock successfully to remove the lock at the appropriate
> time and you need to manage the uniqueness of the lock handle across
> multiple clients.

Yep, those are exactly the characteristics that make advisory locking attractive. A lock can be taken to inform other apps that "the user intends to modify this <thing>", where <thing> might be a particular record, a collection of records with a certain common characteristic, etc. Other apps can test the lock and use that information for things like warning users "this <thing> is being modified by someone else", and can also, where useful, wait on the lock. No long-running transactions holding locks need to be kept open. The locks are also neatly cleaned up on disconnect, so there are fewer issues with dangling locks if an app crashes, loses its network connection, etc.
To my mind advisory locking is entirely separate to, and does not replace, proper locking or versioning during transactions; the goal is to minimise conflicts where those conflicts would result in user annoyance or confusion. The alternative is to use an appserver for inter-client messaging to essentially the same effect, or to use a database table for non-waitable advisory locking (and lots of MVCC bloat).

> I'm not quite sure I follow what your concern is with the
> transaction. If I understand correctly you are concerned that with a
> transaction rolling back and then a subsequent update not updating
> correctly as it is updating with a stale version that was cached? If
> you use the @Version annotation, this will resolve that issue, while
> still providing you with full ACID behaviour, but in a highly
> concurrent environment, you may need to handle a number of
> OptimisticLockExceptions in the client, which is usually better than
> handling longer transaction times per client. Using either Optimistic
> or Pessimistic locking though will not stop you from getting race
> conditions and deadlocks if you have not designed the application
> well. But then again, even the best developers run into them from time
> to time. Additionally, with an ORM such as JPA, the entities you use
> to populate your client UI are created outside of a transaction - at
> least created by default with a read lock which is dropped once the
> entities are created. It is only when you need to add a new record,
> or update an existing one, that you enlist a transaction for that
> particular event. So transactions in an Optimistic locking model
> should have very short life spans. If you have a long running
> transaction using pessimistic locking while waiting for user
> interaction, then you probably need to reconsider your design.

The above matches my understanding of the situation.
I also agree with regard to avoiding long-running transactions, and I'm seeking to make sure that I can safely use Pg's advisory locking specifically to avoid having to either use a long-running transaction or tell a user something like "Your update could not be completed because somebody else has changed this record. The copy of the record on screen has been updated. Please check it, and if necessary re-apply any changes" because of a conflicting update detected by @Version-style optimistic locking.

> In Java, unlike a C++ destructor, the finalize() method is
> non-deterministic, in that it only gets run when the GC cleans up the
> object, which may be never. It is not a good idea to depend on it to
> clean up your resources. Additionally, the finalize method is only
> ever called once on an object, so if the object is GC'd and the
> finalize method is called and in the finalize method it resurrects
> itself, perhaps by passing a reference of itself to another object,
> the finalize method will not be called again for that particular
> object when it comes time again to dispose of it. Generally, you
> avoid the finalize method, as there are not many examples that you
> would depend on it for, other than for a last ditch best effort
> attempt to clean up any resources you may not have otherwise been able
> to. It is another coding style that is generally deemed an evil.

That much I figured out. [Insert pathetic wailing about the lack of RAII and lexically scoped object lifetimes here.] It pretty much confirms what I was thinking: use an explicit close/cleanup method that releases any external resources (like Pg advisory locks). Then, in finalize(), check to make sure the object has been closed, and yell loudly in the log if not before doing the cleanup, so that any missed closes can be found and fixed.

--
Craig Ringer
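The "explicit close, finalize() only as a leak detector" idea might look like the sketch below. LockHolder is a hypothetical wrapper, the advisory-unlock call is indicated only by a comment since it needs a live connection, and the static leak counter stands in for log output:

```java
// Explicit close() does the real cleanup; finalize() never relies on GC
// timing for correctness, it only complains (and does last-ditch cleanup)
// if some code path forgot to call close().
public class LockHolder implements AutoCloseable {
    private boolean closed = false;
    public static int leaksDetected = 0;   // stands in for a log warning

    public boolean isClosed() { return closed; }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            // here: SELECT pg_advisory_unlock(key) on the owning connection
        }
    }

    @Override
    protected void finalize() {
        if (!closed) {
            leaksDetected++;   // yell loudly in the log: a close was missed
            close();           // best-effort only; GC may never run this
        }
    }
}
```

With Java 7+ the try-with-resources statement makes the explicit close automatic along every exit path, which removes most of the need for the finalize() safety net in the first place.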
I must admit that I wasn't even aware of the JPA lock, and it does appear to be very restricted. Shows how much attention I've paid to the API. When I have had to use locking through Hibernate, I have usually bypassed Hibernate and passed the isolation level lock hint directly to the DB within the query, as that gives me finer control; but I have so far usually had the "luxury" of working on applications that are, and forever will be, tied to a particular DB - probably even fossilised to a particular version. But thanks for the link.

You have to consider, when using an ORM, why you would choose to do so - which is mainly to have it do much of the boilerplate coding for you. I don't buy into the argument that it gives you DB portability; so far, when using an ORM I have found myself more tightly bound to a DB dialect than if I'd gone through JDBC directly. Using an ORM comes at a cost, and I'm not just talking about performance overheads: it attempts to map object trees to relational data constructs, and there is a pronounced impedance mismatch between the two. If you start out with a fresh DB and can design it with an ORM client in mind - single-field surrogate keys on all tables, version fields on tables likely to have multiple users, only data types that are supported by the ORM and client language, and you don't mind feeling compromised when doing the DB design - then it fits beautifully.

However, and this is where I begin to speak some heresy... If you access the DB through SPs or functions, don't even bother considering an ORM. Treat the DB as a service API. If you have an existing DB that is not conducive to modification, that contains mostly natural keys, many of which are composite, then you can still use an ORM, but you will be wrestling with it all the way. If the DB tables have datatypes that are not standard in JDBC or the ORM, then you can make it work, but again you are potentially faced with additional effort above and beyond.
Mileage will vary from project to project, but ORMs are no panacea; they simply serve a purpose for some situations, and for others you need to weigh up whether the cost is worth the returns. So, back to your problem and off my soap box: perhaps an ORM is not necessarily the best fit for your solution?

Regardless of what locking mechanism you choose, optimistic locking or pseudo-locking, at some point one user will be faced with stale data and you still have to reconcile that data for the user. Any attempt to check whether the data has been changed by someone else before a commit, or to check whether another user also has the same set of data with the potential to modify it, without a transactional lock, would require some form of sentinel pattern and polling of that sentinel - whether that sentinel is the pg_advisory_lock, a flag in a dedicated table, or whatever else you choose. And you are still faced with how to handle data that has changed for the current user. Is there really a business case for setting up polling of your DB by all of your DB clients, just for some slight usability gains? Rhetorical question. For me to entertain such an approach, I would personally want a pretty strong case for it.

Considering your enthusiasm for RAII, I bet you will be pleased with the planned release of C++0x and its inclusion of smart pointers. Completely off topic, and at the risk of being chastised in this forum, you may be interested in this interview with Bjarne Stroustrup:

http://www.ddj.com/cpp/207000124;jsessionid=4D4OJVJ3EPMOCQSNDLPSKH0CJUNN2JVN?_requestid=437406

Sadly, according to Stroustrup, C++0x will most likely be C++10. Not that I have used a ~ for many years.

Cheers,
Andy
OK, I think I now see how to get the ORM layer's version-based optimistic locking to play well with existing apps using the DB, and how to effectively use Pg's advisory locking. Here's what I'm thinking, in case it's ever of use to anybody:

- Add version columns for ORM tools to use, setting defaults for apps unaware of the versioning so version columns get sensible initial values.

- Add a trigger that, on updates to tables with version columns, does:

    IF new.version = old.version AND new <> old THEN
        new.version = new.version + 1;
    END IF;

  ... to support other apps that don't know about the versioning and rely on transaction-level locking. The new <> old check is there in case an ORM tool assumes that if it UPDATEs a row without incrementing the version or changing anything (say, to exploit the side effect that the UPDATE obtains a lock on the row), the row version won't change. Without it, an ORM tool might try to update the row later and think it's conflicting with another update, even though the "conflicting" update was its own.

- Where the UI can be significantly improved by doing so, use advisory locking to supplement (not replace) the existing handling of conflicting updates, minimising the need for users to re-enter changes. Assume advisory locking is a best-effort service that may fail (due to my bad coding resulting in a lock not being taken, another app not knowing about the lock, the ORM layer switching connections behind my back, etc.) and always be prepared to handle update conflicts.

Does that sound sane to folks here? In particular, does the behaviour of the version column trigger sound sensible?

More comments interleaved below.

Andrew wrote:
> Using an ORM comes at a cost, and
> I'm not just talking about performance overheads, as it attempts to map
> object trees to relational data constructs and there is a pronounced
> impedance mismatch between the two.
> If you start out with a fresh DB
> and can design it with an ORM client in mind, single field surrogate
> keys on all tables, version fields on tables likely to have multiple
> users, choose data types that are only supported by the ORM and client
> language, don't mind feeling compromised when doing the DB design, then
> it fits beautifully.

I'm lucky enough that the DB design is already a reasonable fit for ORM use. The only real sticking point is the need for version columns, because ORM systems want to do cross-transaction locking/versioning. I'd probably land up with them eventually anyway.

Everything has a surrogate key, because almost all natural keys have issues in this DB. Everything's highly normalised. Most types are mapped to native Java types automatically by the PostgreSQL JDBC driver, and the few that aren't are just domains that're easily handled by the ORM once it's told what the base type is.

> However, and this is where I begin to speak some
> heresy... If you access the DB through SP's or functions, don't even
> bother considering an ORM. Treat the DB as a service API.

Sounds sensible to me. The ORM layers seem to have a limited facility for SPs that's good enough for in-DB queries that return normaltable%rowtype, but beyond that it doesn't look like much fun. I'd want to use something like Spring-JDBC instead.

> So back to your problem and off my soap box, perhaps an ORM is not
> necessarily the best fit for your solution?

For most of the job an ORM is actually a very good fit, at least conceptually. The amount of repetitive code required will be significantly reduced by the ability to map a DB record to a Java object, then bind members/properties of the object to GUI elements in a form. After all, significant parts of this app aren't much more sophisticated than what you can do with MS Access (if you don't mind the licensing fees, weird quirks, and total lack of portability).
The tricky bits will probably bypass the ORM and use direct SQL, but that'll touch largely separate parts of the DB, so there won't be too many ORM cache issues.

> Regardless of what locking mechanism you choose, optimistic locking or
> pseudo-locking, at some point one user will be faced with stale data and
> you still have to reconcile that data for the user. Any attempt to
> check if the data is changed by someone else before any commit or to
> check if another user also has the same set of data with the potential
> to modify without a transactional lock would require some form of
> sentinel pattern and polling of that sentinel, whether that sentinel is
> the pg_advisory_lock, a flag in a dedicated table, or whatever else you
> choose. And you are still faced with how to handle data that has changed
> for the current user. Is there really a business case to set up a
> polling of your DB by all of your DB clients, just for some slight
> usability gains?

First: thankfully, polling is not required. One of the nice things about Pg's advisory locks is the ability for the client to block on a lock. If the app wants to wait on an advisory lock, all it has to do is attempt to acquire it; the thread will block until a result is returned (i.e. the lock is obtained). If the user gets bored of waiting, the UI thread can just signal the blocked DB thread to abort the query and get on with something else.

Of course, even with advisory locking it's always *possible* that someone else might sneak in an update. They might not check the lock. It might be another app that doesn't know about the advisory locks. Etc. So it's always necessary to use optimistic or pessimistic transaction-level locking/versioning as well, to ensure there are no update collisions, and it's necessary to be able to handle failed updates.
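The block-and-cancel shape can be simulated without a database using java.util.concurrent primitives: lockInterruptibly() parks the worker thread the way executeQuery("SELECT pg_advisory_lock(...)") would, and interrupt() plays the role that a Statement.cancel() call from the UI thread would play (whether the driver supports cancellation is a separate question). A runnable sketch, with CancellableWait as a hypothetical helper:

```java
import java.util.concurrent.locks.ReentrantLock;

// Simulates "block on the advisory lock in a worker thread, let the user
// give up": the worker parks waiting for the lock; if it hasn't acquired
// the lock within patienceMillis, the waiting is cancelled.
public class CancellableWait {
    /** Returns true only if the lock was acquired within the patience window. */
    public static boolean tryAcquire(ReentrantLock lock, long patienceMillis) {
        final boolean[] got = { false };
        Thread worker = new Thread(() -> {
            try {
                lock.lockInterruptibly();   // parks like pg_advisory_lock()
                got[0] = true;
                lock.unlock();
            } catch (InterruptedException e) {
                // wait was cancelled; nothing acquired (cf. Statement.cancel())
            }
        });
        worker.start();
        try {
            worker.join(patienceMillis);    // the "user got bored" timeout
            if (worker.isAlive()) {
                worker.interrupt();         // cancel the blocked wait
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return got[0];
    }
}
```

The point of the sketch is the control flow, not the lock type: the UI thread never polls, it just bounds how long the worker is allowed to stay parked.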
However, I DO consider it *well* worth the small cost to improve the user "experience" by minimising the risk of failed updates in areas where they would directly affect the user.

> Considering your enthusiasm of RAII, I bet you will be pleased with the
> planned release of C++0x with the inclusion of smart pointers.

For a moment I thought you were going to say "In the next version of Java, there will be ..."

*sob*

I'm already using standard C++ shared_ptr through std::tr1. It's trivial to support std::tr1::shared_ptr for modern compilers and fall back to boost::shared_ptr for older ones. Often, though, I prefer to just follow strict ownership rules, use std::auto_ptr, etc.

--
Craig Ringer
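As a footnote to the trigger idea at the top of this mail, the version-bump rule can be restated as a runnable check (plain Java standing in for the PL/pgSQL, just to make the cases explicit; the method and class names are illustrative only):

```java
// Restates the trigger rule: bump the version only when the application
// left it unchanged AND the row actually changed. Version-unaware apps then
// still get correct optimistic-lock behaviour, while an ORM's own no-op
// UPDATEs (taken just for the row lock) don't look like foreign conflicts.
public class VersionTriggerRule {
    /** Returns the version the trigger would store for this UPDATE. */
    public static long newVersion(long oldVersion, long proposedVersion,
                                  boolean rowChanged) {
        if (proposedVersion == oldVersion && rowChanged) {
            return oldVersion + 1;    // app didn't bump it; do it for them
        }
        return proposedVersion;       // ORM bumped it itself, or no-op UPDATE
    }
}
```

The three cases of interest: a version-unaware app making a real change gets a bump, an ORM that already incremented the column is left alone, and a no-op UPDATE keeps its version so the ORM doesn't later mistake its own lock-taking UPDATE for a conflict.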
> -First: thankfully polling is not required. One of the nice things about pg's advisory locks is > the ability for the client to block on a lock. If the app wants to wait on an advisory lock all it > has to do is attempt to acquire the lock; the thread will block until a result is returned (ie the > lock is obtained). > If the user gets bored of waiting the UI thread can just signal the blocked DB thread to abort > the query and get on with something else. Is this mechanism supported by the PostgreSQL JDBC driver? Thanks Peter On Sun, Apr 6, 2008 at 10:10 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > OK, I think I now see how to get the ORM layer's version-based optimistic > locking to play well with existing apps using the DB, and how to effectively > use Pg's advisory locking. Here's what I'm thinking in case it's ever of use > to anybody: > > - Add version columns for ORM tools to use, setting defaults for apps > unaware of the versioning so version columns get sensible initial values. > > - Add a trigger that, on updates to tables with version columns, does: > > IF new.version = old.version AND new <> old THEN > new.version = new.version + 1; > END IF; > > ... to support other apps that don't know about the versioning and rely on > transaction-level locking. The new <> old check is added in case an ORM tool > assumes that if it UPDATEs a row without incrementing the version or > changing anything (say, to use the side effect that the UPDATE obtains a > lock on the row) then the row version won't change. Without that an ORM tool > might try to update the row later and think it's conflicting with another > update, even though the conflicting update was its own. > > - Where the UI can be significantly improved by doing so use advisory > locking to suppliment (not replace) the existing handling of conflicting > updates, minimising the need for users to re-enter changes. 
Assume advisory > locking is a best effort service that may fail (either due to my bad coding > resulting in a lock not being taken, another app not knowing about the lock, > the ORM layer switching connections behind my back, etc) and always be > prepared to handle update conflicts. > > Does that sound sane to folks here? In particular, does the behaviour of > the version column trigger sound sensible? > > More comments interleaved below. > > > Andrew wrote: > > > > Using an ORM comes at a cost, and I'm not just talking about performance > overheads, as it attempts to map object trees to relational data constructs > and there is a pronounced impedance mismatch between the two. If you start > out with a fresh DB and can design it with an ORM client in mind, single > field surrogate keys on all tables, version fields on tables likely to have > multiple users, choose data types that are only supported by the ORM and > client language, don't mind feeling compromised when doing the DB design, > then it fits beautifully. > > > > I'm lucky enough that the DB design is already a reasonable fit for ORM > use. The only real sticking point is the need for version columns because > ORM systems want to do cross-transaction locking/versioning . I'd probably > land up with them eventually anyway. > > Everything has a surrogate key, because almost all natural keys have issues > in this DB. Everything's highly normalised. Most types are mapped to native > Java types automatically by the PostgreSQL JDBC, and the few that aren't are > just domains that're easily handled by the ORM once it's told what the base > type is. > > > > > However, and this is where I begin to speak some heresy... If you access > the DB through SP's or functions, don't even bother considering an ORM. > Treat the DB as a service API. > > > > Sounds sensible to me. 
> The ORM layers seem to have a limited facility for SPs that's good enough for in-DB queries that return normaltable%rowtype, but beyond that it doesn't look like much fun. I'd want to use something like Spring-JDBC instead.
>
>> So back to your problem and off my soap box, perhaps an ORM is not necessarily the best fit for your solution?
>
> For most of the job an ORM is actually a very good fit, at least conceptually. The amount of repetitive code required will be significantly reduced by the ability to map a DB record to a Java object, then bind members/properties of the object to GUI elements in a form. After all, significant parts of this app aren't much more sophisticated than what you can do with MS Access (if you don't mind the licensing fees, weird quirks, and total lack of portability).
>
> The tricky bits will probably bypass the ORM and do direct SQL, but that'll be touching largely separate parts of the DB so there won't be too many ORM cache issues.
>
>> Regardless of what locking mechanism you choose, optimistic locking or pseudo-locking, at some point one user will be faced with stale data and you still have to reconcile that data for the user. Any attempt to check if the data is changed by someone else before any commit, or to check if another user also has the same set of data with the potential to modify without a transactional lock, would require some form of sentinel pattern and polling of that sentinel, whether that sentinel is the pg_advisory_lock, a flag in a dedicated table, or whatever else you choose. And you are still faced with how to handle data that has changed for the current user. Is there really a business case to set up a polling of your DB by all of your DB clients, just for some slight usability gains?
>
> First: thankfully polling is not required. One of the nice things about pg's advisory locks is the ability for the client to block on a lock.
> If the app wants to wait on an advisory lock all it has to do is attempt to acquire the lock; the thread will block until a result is returned (ie the lock is obtained).
>
> If the user gets bored of waiting the UI thread can just signal the blocked DB thread to abort the query and get on with something else.
>
> Of course, even with advisory locking it's always *possible* that someone else might sneak in an update. They might not check the lock. It might be another app that doesn't know about the advisory locks. Etc. So it's always necessary to use optimistic or pessimistic transaction-level locking/versioning as well, to ensure there are no update collisions, and it's necessary to be able to handle failed updates.
>
> However, I DO consider it *well* worth the small cost to improve the user "experience" by minimising the risk of failed updates in areas where it must directly affect the user.
>
>> Considering your enthusiasm for RAII, I bet you will be pleased with the planned release of C++0x with the inclusion of smart pointers.
>
> For a moment I thought you were going to say "In the next version of Java, there will be ..."
>
> *sob*
>
> I'm already using standard C++ shared_ptr through std::tr1. It's trivial to support std::tr1::shared_ptr for modern compilers and fall back to boost::shared_ptr for older compilers. However I often prefer to just follow strict ownership rules, use std::auto_ptr, etc.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-jdbc
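The blocking acquisition Craig describes, and a non-blocking variant suited to the "tell the user someone else is editing" case, can be sketched with PostgreSQL's standard advisory lock functions (the lock key 12345 here is an arbitrary example):

```
-- Blocks until the lock is free; cancel the statement to give up waiting.
SELECT pg_advisory_lock(12345);

-- Non-blocking variant: returns true if acquired, false if another
-- session already holds the lock.
SELECT pg_try_advisory_lock(12345);

-- Release when done (session-level advisory locks are also released
-- automatically when the connection closes).
SELECT pg_advisory_unlock(12345);
```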
See responses below:

> - Add a trigger that, on updates to tables with version columns, does:
>
>     IF new.version = old.version AND new <> old THEN
>         new.version = new.version + 1;
>     END IF;
>
> ... to support other apps that don't know about the versioning and rely on transaction-level locking. The new <> old check is added in case an ORM tool assumes that if it UPDATEs a row without incrementing the version or changing anything (say, to use the side effect that the UPDATE obtains a lock on the row) then the row version won't change. Without that an ORM tool might try to update the row later and think it's conflicting with another update, even though the conflicting update was its own.

I've seen a number of blogs suggesting similar trigger-based approaches, so it will possibly work; I've just never gone down that path. You may want to also check Hibernate's documentation, which discourages changes to the version number outside of Hibernate:

http://www.hibernate.org/hib_docs/annotations/reference/en/html_single/#entity-mapping-entity-version

"The application must not alter the version number set up by Hibernate in any way. To artificially increase the version number, check in Hibernate Entity Manager's reference documentation LockMode.WRITE"

http://www.hibernate.org/hib_docs/entitymanager/reference/en/html_single/#d0e1162

"LockMode.WRITE prevents dirty-reads and non repeatable read on a given entity and force an increase of the version number if any."

> First: thankfully polling is not required. One of the nice things about pg's advisory locks is the ability for the client to block on a lock. If the app wants to wait on an advisory lock all it has to do is attempt to acquire the lock; the thread will block until a result is returned (ie the lock is obtained).
>
> If the user gets bored of waiting the UI thread can just signal the blocked DB thread to abort the query and get on with something else.
> Of course, even with advisory locking it's always *possible* that someone else might sneak in an update. They might not check the lock. It might be another app that doesn't know about the advisory locks. Etc. So it's always necessary to use optimistic or pessimistic transaction-level locking/versioning as well, to ensure there are no update collisions, and it's necessary to be able to handle failed updates.

So back to pessimistic locking, which I believe we both had agreed was not a desirable behaviour, particularly in a user interaction model? I apologise, but I feel that I am not following your logic, so you may have a valid approach, but I'm missing something and so to me it does not sound appropriate. I guess if you have multiple clients in multiple technologies directly accessing the DB, then your situation is immediately a lot more complex, and pessimistic locking may be your only solution to ensure ACID transactions. The ideal situation would be if the DB was only accessed through a single shared access layer by all clients using a multi-tiered decoupled design, though I acknowledge that is not always possible, particularly with many thick client architectures that are tightly coupled two-tiered designs at best.

I can understand how you would release the advisory lock by running another SQL statement in another thread to do so, but I do not know if that is of much use to you. As far as I know you can't tell a running JDBC query to abort or cancel a running query. Such calls, either at the JDBC level or at the JPA level, are blocking calls for the client, and about the only influence you have on them is by setting the query timeout before executing (which has its own set of issues in the postgresql driver: http://www.nabble.com/Implementing-setQueryTimeout()-td15537669.html). Of course the UI client can always manage calls to its access layer via another thread and kill that thread if the end user becomes impatient and does not want to wait.
But from my understanding, that query with the back end database process will still be running and the DB connection will still be in use, including all of the associated locks and DB threads. So a subsequent attempt on a long running query will also block. However, if the queries are all short running but are part of a long running transaction, then you can roll back the transaction at any point, but any other calls in a separate transaction dependent on those locks held by the running transaction will result in the aforementioned blocked call.

You have to remember that at the ORM level (which is just a wrapper to the JDBC), or at the JDBC level, you do not have fine-grained control of what is happening in the back end, and in my mind you should be looking at what JDBC provides, not what the underlying DB may be able to provide, and in using pg advisory locks you are mixing the two. And obviously I'm missing the main point of your argument, as I still do not know how using pg advisory locks mixed with transaction locks will improve your end users' experience.

Also I think I'm getting a little confused, as previously you had stated:

> At least in a desktop app it's sometimes desirable to inform a user that (for example) somebody else is currently modifying a customer record, so they can't edit it right now, then possibly offer to wait. In this environment that's significantly preferable in user interface terms to telling the user "Whoops, can't save your changes, somebody else beat you to it", reloading the data in the UI and getting them to make their changes again.

and then in the last mail the reason was:

> However, I DO consider it *well* worth the small cost to improve the user "experience" by minimising the risk of failed updates in areas where it must directly affect the user.
For me, your initial rationale can only be addressed with some form of polling on some lock or pseudo-lock, whether automated or manual, particularly if you give the user the option to wait. The second rationale is simply around ACID transactions, which a well designed architecture and optimistic locking with versioning can provide, or worst case a pessimistic locking strategy at the cost of scalability, but either way without a need for the pg advisory locks.

I'm sure you have valid scenarios in mind that justify your desired approach, otherwise you would not feel so strongly, but I am having some trouble working out such a scenario. So I think that may be my failing, and it is probably not important anyway from the perspective of responding to specific technical aspects of your question.

Cheers,
Andy
Peter Kovacs wrote:
>> First: thankfully polling is not required. One of the nice things about pg's advisory locks is the ability for the client to block on a lock. If the app wants to wait on an advisory lock all it has to do is attempt to acquire the lock; the thread will block until a result is returned (ie the lock is obtained).
>>
>> If the user gets bored of waiting the UI thread can just signal the blocked DB thread to abort the query and get on with something else.
>
> Is this mechanism supported by the PostgreSQL JDBC driver?

Good question. The answer is yes, as shown by the attached test class, but it's definitely a good idea to test rather than assume these things.

The test should be run while some other job, like a psql interactive session, holds pg_advisory_lock(1). You'd need to update the connection string to use it. Cancelling the query works fine, at least on 8.3 with the JDBC4 8.3 driver. The JDBC driver handles the cancel() call by establishing a new connection and issuing a cancel query request with it.

Of course, I can't guarantee that the JDBC driver isn't polling internally, but it's not hitting the database to do it so for my use I don't much care. According to Wireshark there's no traffic on port 5432 during the 10 second wait interval. I'd be surprised if it wasn't actually blocked on the network read, though I haven't gone digging to verify this.
--
Craig Ringer

import java.sql.SQLException;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcCancelTest {

    Connection c;
    CallableStatement st;

    static final String PG_ERR_QUERY_CANCELLED = "57014";

    class Waiter extends Thread {
        @Override
        public void run() {
            try {
                st = c.prepareCall("SELECT pg_advisory_lock(1);");
                System.out.println("[Waiter] Sending query...");
                st.execute();
                System.out.println("[Waiter] Query finished OK");
            } catch (SQLException ex) {
                if (ex.getSQLState().equals(PG_ERR_QUERY_CANCELLED)) {
                    System.out.println("[Waiter] Query cancelled successfully");
                } else {
                    System.out.println("[Waiter] DB error string: " + ex.getMessage()
                            + " (SQLState: " + ex.getSQLState() + ")");
                }
            }
        }
    }

    class Killer extends Thread {
        @Override
        public void run() {
            try {
                System.out.println("[Killer] Cancelling query...");
                st.cancel();
                System.out.println("[Killer] Query cancel sent");
            } catch (SQLException ex) {
                System.out.println("[Killer] Unexpected exception: " + ex);
            }
        }
    }

    void runtest() throws InterruptedException {
        try {
            c = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/DBNAME?user=USERNAME&password=PASSWORD");
            Waiter w = new Waiter();
            Killer k = new Killer();
            w.start();
            Thread.sleep(10000);
            k.start();
        } catch (SQLException ex) {
            System.out.println(ex);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException, InterruptedException {
        Class.forName("org.postgresql.Driver");
        JdbcCancelTest t = new JdbcCancelTest();
        t.runtest();
    }
}
I was overlooking the cancellation of the query on a separate thread with the JDBC cancel() method, though that method always seems to have issues associated with it across various drivers. But still, for what you are talking about, I think a long running transaction is still going to be your biggest problem rather than a long running query.

> I can understand how you would release the advisory lock by running another SQL statement in another thread to do so, but I do not know if that is of much use to you. As far as I know you can't tell a running JDBC query to abort or cancel a running query. Such calls, either at the JDBC level or at the JPA level, are blocking calls for the client and about the only influence you have on them is by setting the query timeout before executing (which has its own set of issues in the postgresql driver: http://www.nabble.com/Implementing-setQueryTimeout()-td15537669.html). Of course the UI client can always manage calls to its access layer via another thread and kill that thread if the end user becomes impatient and does not want to wait. But from my understanding, that query with the back end database process will still be running and the DB connection will still be in use, including all of the associated locks and DB threads. So a subsequent attempt on a long running query will also block. However, if the queries are all short running but are part of a long running transaction, then you can roll back the transaction at any point, but any other calls in a separate transaction dependent on those locks held by the running transaction will result in the aforementioned blocked call.
> You have to remember that at the ORM level (which is just a wrapper to the JDBC), or at the JDBC level, you do not have fine-grained control of what is happening in the back end, and in my mind you should be looking at what JDBC provides, not what the underlying DB may be able to provide, and in using pg advisory locks you are mixing the two.

Cheers,
Andy
I'm sorry if I've been communicating all this poorly. Even if I've been muddling you and convincing you that I'm quite crazy, it's been a helpful conversation for me ;-) . I hope my explanations below make a bit more sense than the prior ones.

Andrew wrote:
>> Of course, even with advisory locking it's always *possible* that someone else might sneak in an update. They might not check the lock. It might be another app that doesn't know about the advisory locks. Etc. So it's always necessary to use optimistic or pessimistic transaction-level locking/versioning as well, to ensure there are no update collisions, and it's necessary to be able to handle failed updates.
>
> So back to pessimistic locking, which I believe we both had agreed was not a desirable behaviour, particularly in a user interaction model?

The reference to pessimistic locking was a bit of a red herring. What I meant was simply "If advisory locking is being used, it is still necessary to rely on normal locking methods to ensure that conflicting updates are detected or prevented".

In other words, advisory locking in this situation is used purely to help the user out, giving them advance warning about potential concurrent updates and letting them avoid wasting time and effort. It does not replace normal methods for ensuring data consistency and integrity; it interacts with optimistic locking solely in that it's intended to reduce or eliminate user-visible conflicts. Advisory locking can be used to detect and avoid situations that would otherwise result in an optimistic locking conflict requiring user action like data re-entry. It can't (or IMO shouldn't) replace optimistic locking.

An advisory lock allows information about object "busy-ness" to be communicated among clients outside a transactional scope. This means that conflicts can be detected and prevented before they happen, instead of waiting until all the work is done and catching an optimistic locking conflict.
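As a rough in-memory analogy (plain Java, no database involved; the names here are made up for illustration), the optimistic-locking version check that advisory locks supplement boils down to a compare-and-increment on the version column:

```java
// In-memory sketch of version-column optimistic locking. In the real system
// the check happens atomically in the database (e.g. in the UPDATE's WHERE
// clause, or via the ORM's version handling); this only illustrates the rule.
class VersionCheck {

    static final class Row {
        int oplock;       // version column, as in the trigger discussion
        String notes;
        Row(int oplock, String notes) { this.oplock = oplock; this.notes = notes; }
    }

    // Apply an update only if the row still has the version we read earlier.
    // Returns false on conflict: someone else updated the row since our read,
    // and the caller must reload and possibly ask the user to redo their work.
    static boolean tryUpdate(Row row, int versionWeRead, String newNotes) {
        if (row.oplock != versionWeRead) {
            return false;
        }
        row.notes = newNotes;
        row.oplock = row.oplock + 1;   // what the ORM or the trigger does
        return true;
    }

    public static void main(String[] args) {
        Row shared = new Row(0, "initial");
        System.out.println(tryUpdate(shared, 0, "first edit"));  // applied
        System.out.println(tryUpdate(shared, 0, "stale edit"));  // conflict detected
    }
}
```

Advisory locking, in this picture, is about warning the second editor before they start typing, rather than letting the update fail after all the work is done.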
Often we do not care about a conflict, as the operation can be re-done behind the user's back without them knowing or caring. Incrementing some value, for example. In other cases, there might be time-consuming or difficult-to-repeat changes being made that the program cannot simply retry without asking the user how to handle it. Changing a notes field on a customer record, for example. In those cases, it's much nicer to catch the conflict BEFORE the user does all the work. That's where advisory locking is useful. (Yes, I know a notes/history field can just be broken down into multiple entries in a separate table. It's an *example*, because the main areas of concern in the real system would require way too much background.)

In a system designed for advanced users I'd quite possibly just let both users proceed and tell the user who lost the race to redo their changes and try again. I might even provide a "merge/diff" style summary of the conflicts between their changes and the winning change. This is, however, not a system for advanced users, and for them it's a *lot* nicer to be told somebody else is editing something complicated and be able to wait.

I wouldn't expose a normal user working with a content management system to a merge conflict, source revision control system style. I don't know what your users are like, but mine have a hard time saving email attachments, let alone interpreting change conflict summaries. I'd prefer to tell them "This item is being edited by another user. You can try again later, or wait for them to finish." The situations I'm facing in this application are similar to that problem, with similar UI implications.

Additionally, in this situation almost all updates to the areas where advisory locking will be useful will be done through the tool I'm now working on, which will honour the advisory locks.
There will rarely, if ever, be updates to this data from DB clients that don't know/care about the advisory lock, so users will rarely if ever get an error due to a conflicting update if I use advisory locking to help avoid it happening.

The advisory locking can even be done from a separate JDBC connection, avoiding any interaction at all with the ORM tools. It's essentially the same as using a middle layer solely for inter-client messaging, but without the need for the middle layer. That's one less thing to break, to my mind.

In most areas advisory locking is neither necessary nor useful. It's only in some particular situations, similar to the examples I've given, that it's likely to be of use to improve the way multi-user interaction works. In all other areas using optimistic locking alone should be fine.

As for mixed access by the existing apps and the new tools, I'm now pretty sure the trigger-based versioning approach will be fine. I'm going to do a fair bit of testing of course, but it all looks quite reasonable. It'll just ensure that existing apps that aren't aware of the row versioning still increment the version, and thus play well with the ORM tools. The trigger will never change the effect of a statement run by the ORM tools (because they always increment the version themselves), and is invisible to them. I'll drop a short note here when that testing is done, as I imagine it might be of interest to others.

> The ideal situation would be if the DB was only accessed through a single shared access layer by all clients using a multi-tiered decoupled design, though I acknowledge that is not always possible, particularly with many thick client architectures that are tightly coupled two-tiered designs at best.

I see a lot of this, and I'm still not sure I understand the thinking behind it. The DB is a shared access system. It's designed to handle concurrency, and does a very good job of it.
Adding another layer to hide the DB seems to be adding unnecessary complexity, unless it can provide something extra like preventing or warning about concurrent changes to objects outside the scope of DB transactions. However, that's handled just as well by row versioning without adding another layer to the system. Unless the additional layer implements the "business logic", using the DB as a dumb storage engine, I just can't see much of a rationale for its existence.

The proper division of work between app and DB is of course a long-standing philosophical argument that won't be resolved in a hurry. Ditto the utility/necessity of "n-tier" application design. All I can say is that in this specific case the DB already handles the business logic, so the midlayer would be superfluous. Furthermore, the additional layer will either be tied to a particular technology (eg Java RPC) or rely on clumsy schemes like XML-RPC for communication. I can see the point for really big and complex systems that mix many different data stores. For smaller stuff where the single DB is already the center of the system and doing a good job there, I don't see the point. It strikes me as being "Enterprise-y" for the sake of it.

> I can understand how you would release the advisory lock by running another SQL statement in another thread to do so, but I do not know if that is of much use to you.

I assume you meant cancelling an attempted acquisition of the advisory lock, rather than releasing the lock? Releasing it so it can be acquired by another connection is just another ordinary SELECT pg_advisory_unlock(blah). Or closing the JDBC connection, of course.

If there's an advisory lock acquisition attempt blocked, and the user wants to give up, I can just cancel the statement trying to take out the advisory lock using the JDBC Statement.cancel() method - see the attached code in my other reply.
That's something that's easily triggered by user interaction in the UI thread, and involves very little fuss. It might not work well with some other JDBC drivers - I've seen reports of cancel() not doing anything with MySQL's JDBC driver, for example. For this app that's a non-issue, though; it's tied to Pg anyway.

> But from my understanding, that query with the back end database process will still be running and the DB connection will still be in use, including all of the associated locks and DB threads. So a subsequent attempt on a long running query will also block.

Not if you use the JDBC to cancel the query. There's no guarantee that a big, complex query will be terminated immediately by the backend, but for something simple like an attempt to acquire an advisory lock, cancelling is immediate and quite clean, at least according to my testing.

> You have to remember that at the ORM level (which is just a wrapper to the JDBC), or at the JDBC level, you do not have fine-grained control of what is happening in the back end, and in my mind you should be looking at what JDBC provides, not what the underlying DB may be able to provide, and in using pg advisory locks, you are mixing the two.

It's true that using advisory locking would involve mixing two locking models. That could get very ugly if they had to interact in any sort of complex way - say, if there were stored procedures or triggers that used both advisory locking and the ordinary transaction-level locking done by Pg. To me, that says "avoid mixing advisory locking operations in with other operations that might take normal transaction-level locks" rather than "don't use advisory locking".

In this case the advisory locking can even be done from a separate JDBC connection, avoiding any interaction at all with the ORM tools. Locks can be taken out, tested, and released only before any work is done or after all work is committed.
It's essentially the same as using a middle layer solely for inter-client messaging, but without the need for the middle layer.

--
Craig Ringer
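The separate-connection pattern Craig describes might look like the following sketch on the dedicated lock connection (the key 12345 is an arbitrary example; a real application would presumably derive it from something like the table and row id):

```
-- On the dedicated advisory-lock connection, before opening the edit form:
SELECT pg_try_advisory_lock(12345);
-- false => tell the user the record is being edited; optionally offer to
-- wait by issuing a blocking pg_advisory_lock(12345) that can be cancelled.

-- ... the user edits; the ORM's own connection does the transactional
-- work and commits, protected as usual by the version column ...

-- After the commit (or if the user abandons the edit):
SELECT pg_advisory_unlock(12345);
```

Keeping the lock on its own connection avoids any interaction with the ORM's connection handling, at the cost of one extra connection per client.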
> I'll drop a short note here when that testing is done, as I imagine it
> might be of interest to others.

I've been unable to uncover any issues with concurrent updates being made by both Hibernate and plain ol' psql to the following schema:

CREATE OR REPLACE FUNCTION zz_field_version() RETURNS trigger AS $$
BEGIN
    IF tg_op = 'UPDATE'
       AND new.oplock = old.oplock
       AND ROW(new.*) IS DISTINCT FROM ROW(old.*) THEN
        -- Row is being updated by an application that does not know
        -- about row versioning. It's changed data in the row, but
        -- hasn't incremented the version. We'll do that for it.
        new.oplock := new.oplock + 1;
    END IF;
    RETURN new;
END;
$$ LANGUAGE 'plpgsql';

COMMENT ON FUNCTION zz_field_version() IS
'Increments the record version if a row is changed by an update and its version was not incremented by the UPDATE issuer. Intended to allow ORM optimistic locking to coexist with normal DB transactional locking. Only suitable for use as a trigger on UPDATE.';

CREATE TABLE ver (
    id SERIAL PRIMARY KEY,
    counter INTEGER NOT NULL,
    oplock INTEGER NOT NULL DEFAULT(0)
);

COMMENT ON TABLE ver IS 'Test table for versioning';

CREATE TRIGGER ver_ensure_version_updated
BEFORE UPDATE ON ver
FOR EACH ROW EXECUTE PROCEDURE zz_field_version();

Two concurrent updaters that ignore the oplock column completely, like two copies of:

#!/bin/bash
for n in `seq 1 1000`; do
    psql craig -c 'update ver set counter = counter + 1'
done

can run quite happily along with an instance or two of the Hibernate test case.
The Hibernate test case, without all the tracing code and more detailed error handling, boils down to:

// Assume a pre-created record with id 0 and counter 0, an appropriately
// mapped class Ver with an @Version annotation on the `oplock' field, and
// a set-up EntityManager instance `em':
int ver_id = 0;
int successfulUpdates = 0;
while (successfulUpdates < 1000) {
    EntityTransaction tx = em.getTransaction();
    try {
        tx.begin();
        Ver v = em.find(Ver.class, ver_id);
        v.setCounter(v.getCounter() + 1);
        tx.commit();
        successfulUpdates++;
    } catch (javax.persistence.RollbackException exc) {
        // Version conflict detected; loop around and retry the update
    }
}

The value of `counter' for the test record in `ver' always ends up as the total number of updates done by psql plus `successfulUpdates' times the number of Hibernate test instances. So, if there are 2 shell updaters and 2 Hibernate updaters running, the counter always increases by exactly 4000. In other words, no updates are getting lost.

If a counter is added to report retried updates in the Hibernate tests, it becomes clear that there *are* conflicts being detected and retries being performed, even if only one Hibernate test is running along with one or more psql loop tests. The trigger is doing its job.

A version that explicitly increments the version with em.lock(v, LockMode.WRITE) and restarts the transaction if lock acquisition fails also works fine when several copies are run along with several looping manual updates.

That's hardly conclusive proof that nothing can go wrong, but it seems to be pretty solid, and I can't *see* any way it can break. Well, the trigger needs to handle version overflow, but other than that fairly minor issue it looks OK. Think I'm missing anything?

Being able to use a trigger to let existing apps coexist with the ORM row versioning will make things a *lot* easier. Doubly so if I can get rid of the relatively expensive row comparison, but I'm not sure that's safe with Hibernate-specific lock modes like LockMode.UPGRADE yet.

--
Craig Ringer