Thread: Notice: heap_open/close changes committed
I have just committed a ton of changes to make heap_open/heap_openr take an additional argument specifying the kind of lock to grab on the relation (or you can say "NoLock" to get the old, no-lock behavior). Similarly, heap_close takes a new argument giving the kind of lock to release, or "NoLock" to release no lock. (If you don't release the lock you got at open time, it's implicitly held till transaction end.) This should go a long way towards fixing problems caused by a concurrent VACUUM moving tuples in system relations. I also fixed several bugs (first spotted by Hiroshi) having to do with not getting a sufficient lock on a relation being DROPped, ALTERed, etc. There may be more of those still lurking, though. There are a couple of coding rules that ought to be pointed out: 1. If you specify a lock type to heap_open/openr, then heap_open will elog(ERROR) if the relation cannot be found --- since it can't get the lock in that case, obviously. You must use NoLock (and then lock later, if appropriate) if you want to be able to recover from no-such-relation. This allowed extra code to test for no-such-rel to be removed from many call sites. There were a lot of other call sites that neglected to test for failure return at all, so they're now a little more robust. 2. I made most opens of system relations grab AccessShareLock if read-only, or RowExclusiveLock if read-write, on the theory that these accesses correspond to an ordinary search or update of a user relation. This maximizes concurrency of access to the system tables. It should be sufficient to have AccessExclusiveLock on the user relation being modified in order to do most things like dropping/modifying relations. Note however that we release these locks as soon as we close the system relations, whereas the lock on the underlying relation needs to be held till transaction commit to ensure that other users of the relation will see whatever you did. There may be a few cases where we need a stronger lock on system relations... 3. If you are doing a SearchSysCache (any flavor) to find a tuple that you intend to modify/delete, you must open and lock the containing relation *before* the search, not after. Otherwise you may retrieve a tuple containing a stale physical location pointer (because VACUUM has just moved it), which will cause the heap_replace or heap_delete to crash and burn. There were a few places that did the heap_open after getting the tuple. Naughty naughty. I did not change locking behavior for index_open/close --- for the most part we just acquire AccessShareLock on an index during index_beginscan. I am not sure if this is good or not. I did make index_open do elog(ERROR) on failure, because most call sites weren't checking for failure... These changes do *not* include Hiroshi's recent proposal to add time qual checking in SearchSysCache. I suspect he is right but would like confirmation from someone who actually understands the time qual code ;-) regards, tom lane PS: you should do "make clean all" after pulling these changes.
Tom Lane wrote: > > 2. I made most opens of system relations grab AccessShareLock if > read-only, or RowExclusiveLock if read-write, on the theory that ^^^^^^^^^^^^^^^^ > these accesses correspond to an ordinary search or update of a user > relation. This maximizes concurrency of access to the system tables. There are problems here. In the case of normal UPDATE/DELETE (when RowExclusiveLock is acquired) Executor takes care about the-same-row writers, but other parts of system don't check is tuple read being updated concurrent transaction or not. This is the old bug (pre-6.5.X released WRITE lock just after system table was modified). I had no time to fix it and so just changed old WRITE lock with new AccessExclusiveLock. But we have to handle this in proper way (wait if t_xmax is id of an active transaction). Vadim
Vadim Mikheev <vadim@krs.ru> writes: > Tom Lane wrote: >> 2. I made most opens of system relations grab AccessShareLock if >> read-only, or RowExclusiveLock if read-write, on the theory that > ^^^^^^^^^^^^^^^^ > There are problems here. In the case of normal UPDATE/DELETE > (when RowExclusiveLock is acquired) Executor takes care about > the-same-row writers, but other parts of system don't check > is tuple read being updated concurrent transaction or not. Drat. I was afraid I might be getting in over my head :-( > This is the old bug (pre-6.5.X released WRITE lock just after > system table was modified). I had no time to fix it and so > just changed old WRITE lock with new AccessExclusiveLock. I do not think changing RowExclusiveLock back to AccessExclusiveLock will fix it unless we hold the lock till end of transaction, no? That seems like much too high a price to pay. > But we have to handle this in proper way (wait if t_xmax > is id of an active transaction). Yes. Where is the code that does this right in the regular executor? I will see what needs to be done to make the system table accesses act the same. regards, tom lane
Tom Lane wrote: > > > This is the old bug (pre-6.5.X released WRITE lock just after > > system table was modified). I had no time to fix it and so > > just changed old WRITE lock with new AccessExclusiveLock. > > I do not think changing RowExclusiveLock back to AccessExclusiveLock > will fix it unless we hold the lock till end of transaction, no? Yes. > That seems like much too high a price to pay. That's why I proposed to use Exclusive lock (it doesn't conflict with AccessShareLock used by readers). > > But we have to handle this in proper way (wait if t_xmax > > is id of an active transaction). > > Yes. Where is the code that does this right in the regular executor? > I will see what needs to be done to make the system table accesses > act the same. Sorry - I messed things up: heap_replace/heap_delete wait for concurrent update, but doesn't update/delete modified tuple. They return result code (HeapTupleMayBeUpdated etc in utils/tqual.h) and it's up to caller decide what to do if tuple modified by concurrent transaction. For _updated_ tuple TID of new tuple version is also returned (if requested)... Examples of how this is handled/used by Executor are in execMain.c (just search for HeapTupleUpdated). Vadim