Thread: Issue NOTICE for attempt to raise lock level?
I am working on eliminating the "relation NNN modified while in use" misfeature by instead grabbing a lock on each relation at first use in a statement, and holding that lock till end of transaction. The main trick here is to make sure that the first lock grabbed is adequate --- for example, it won't do to grab AccessShareLock and then have to raise that to AccessExclusiveLock, because there will be a deadlock if two backends do this concurrently. To help debug this, I'm planning to add a little bit of code to the lock manager that detects a request for a lock on an object on which we already hold a lock of a lower level. What I'm wondering about is whether to make the report be elog(DEBUG) --- ie, send to postmaster log only --- or elog(NOTICE), so that users would see it by default. A NOTICE might be useful to users since it would complain about deadlock-prone user-level coding practices too, such as begin;select * from foo; -- grabs read locklock table foo; -- grabs exclusive lock However, it might not be *very* useful, because the lock manager isn't in a position to issue a message that's much more intelligible than this: NOTICE: Deadlock risk: raising lock level from 1 to 4 on object 85372/5732 (The lock level could be printed symbolically, but I doubt that very much can be done with the object identifier --- it's not safe for the lock manager to try to resolve relation OIDs to names, for example.) Right now I'm thinking that this sort of notice would just create more confusion than enlightenment for most users, so I'm inclined to make it a DEBUG message. But that's a judgment call, so I thought I'd throw the issue out for discussion. Any contrary opinions? regards, tom lane
Tom Lane writes: > To help debug this, I'm planning to add a little bit of code to the > lock manager that detects a request for a lock on an object on which > we already hold a lock of a lower level. What I'm wondering about is > whether to make the report be elog(DEBUG) --- ie, send to postmaster > log only --- or elog(NOTICE), so that users would see it by default. To me this seems to be a little like the much-disputed notice for adding implicit range-table entries: Either it's an error, then you abort, or it's legal, then you leave the user alone and perhaps explain failure scenarios in the documentation. At least until we have something like a user-configurable warning level. elog(DEBUG) might be okay, but only with a positive DebugLvl, IMHO. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
> -----Original Message----- > From: Tom Lane > Sent: Wednesday, November 08, 2000 1:26 AM > To: pgsql-hackers@postgreSQL.org > Subject: [HACKERS] Issue NOTICE for attempt to raise lock level? > > > I am working on eliminating the "relation NNN modified while in use" > misfeature by instead grabbing a lock on each relation at first use > in a statement, and holding that lock till end of transaction. Isn't "relation NNN modified while in use" itself coming from heap_ open(r) 's LockRelation_after_allocate sequence ? Or from a rd_refcnt leak,of cource. I'm thinking that RelationCacheInvalidate() should ignore relations which are while in use. IMHO allocate_after_lock sequence is needed for heap_open(r). > The > main trick here is to make sure that the first lock grabbed is adequate > --- for example, it won't do to grab AccessShareLock and then have to > raise that to AccessExclusiveLock, because there will be a deadlock if > two backends do this concurrently. > I object to you if it also includes parse_rewrite_plan stage. If there's a long transation it would also hold a AccessShareLock on system tables for a long time. Then vacuum for system tables would be blocked. Other transactions would be blocked...... Regards. Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> I am working on eliminating the "relation NNN modified while in use" >> misfeature by instead grabbing a lock on each relation at first use >> in a statement, and holding that lock till end of transaction. > Isn't "relation NNN modified while in use" itself coming from heap_ > open(r) 's LockRelation_after_allocate sequence ? Right. I intend to eliminate that test entirely, and simply let the relcache update happen. With appropriate start-to-end locking, it shouldn't be possible for a schema update to sneak in at an unsafe point. The only reason that there is a "modified while in use" test at all is that I put it in awhile back as a stopgap solution until we did a better job with the end-to-end locking problem. The reports coming back on 7.0.* make it clear that the stopgap answer isn't good enough, so I want to fix it right for 7.1. > I'm thinking that RelationCacheInvalidate() should ignore relations > which are while in use. Won't work unless you somehow store an "update needed" flag to make the update happen later --- you can't just discard a shared-inval notification. And if you did that, you'd have synchronization issues to contend with. Since we use SnapshotNow for reading catalogs, catalog fetches may see data that is inconsistent with the current state of the relcache. Not good. Forcing the schema update to be held off in the first place seems the right answer. > I object to you if it also includes parse_rewrite_plan stage. > If there's a long transation it would also hold a AccessShareLock > on system tables for a long time. No, I'm going to leave locking of system catalogs as-is. This basically means that we don't support concurrent alteration of schemas for system tables. Seems like an OK tradeoff to me ... regards, tom lane
Tom Lane wrote: > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > >> I am working on eliminating the "relation NNN modified while in use" > >> misfeature by instead grabbing a lock on each relation at first use > >> in a statement, and holding that lock till end of transaction. > > > Isn't "relation NNN modified while in use" itself coming from heap_ > > open(r) 's LockRelation_after_allocate sequence ? > > Right. I intend to eliminate that test entirely, and simply let the > relcache update happen. With appropriate start-to-end locking, it > shouldn't be possible for a schema update to sneak in at an unsafe > point. > > The only reason that there is a "modified while in use" test at all > is that I put it in awhile back as a stopgap solution until we did > a better job with the end-to-end locking problem. The reports coming > back on 7.0.* make it clear that the stopgap answer isn't good enough, > so I want to fix it right for 7.1. > > > I'm thinking that RelationCacheInvalidate() should ignore relations > > which are while in use. > > Won't work unless you somehow store an "update needed" flag to make the > update happen later --- you can't just discard a shared-inval > notification. What I mean is to change heap_open(r) like LockRelationId(Name) -> shared-inval-handling -> allocate the relation descriptor and increment rd_refcnt This would ensure that relations with rd_refcnt > 0 acquire some lock. Could any shared-inval-noti fication arrive for such relations under the me- chanism ? However 'reset system cache' message could arrive at any time. I've examined the error 'recursive use of cache' for some time. It seems very difficult to avoid the error if we reconstruct relation descriptors whose rd_refcnt > 0 in RelationCacheInvalidate(). Comments ? Regards. Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > What I mean is to change heap_open(r) like > LockRelationId(Name) -> shared-inval-handling -> > allocate the relation descriptor and increment rd_refcnt > This would ensure that relations with rd_refcnt > 0 > acquire some lock. Could any shared-inval-noti > fication arrive for such relations under the me- > chanism ? Yes, because the system doesn't make any attempt to ensure that relcache entries are held open throughout a statement or transaction. (If they were, we largely wouldn't have a problem.) So we can't use relcache refcount going from 0 to 1 as the sole criterion for when to acquire a lock. I did look at using the relcache to control holding locks throughout statements, but it seems that it doesn't have enough information to grab the right kind of lock. For example, I had to modify the parser to ensure that the right kind of lock is grabbed on the initial relcache access, depending on whether the table involved is accessed for plain SELECT, SELECT FOR UPDATE, or INSERT/UPDATE/DELETE. I still have to make a similar change in the rewriter for table references that are added to a query by rewrite. The code that is doing this stuff knows full well that it is making the first reference to a table, and so the relcache doesn't really have anything to contribute. > However 'reset system cache' message > could arrive at any time. I've examined the error > 'recursive use of cache' for some time. It seems > very difficult to avoid the error if we reconstruct > relation descriptors whose rd_refcnt > 0 in > RelationCacheInvalidate(). I haven't had time to look at that yet, but one possible answer is just to disable the 'recursive use of cache' test. It's only a debugging sanity-check anyway, not essential functionality. regards, tom lane