Thread: deadlock in REINDEX
I noticed a pretty obscure deadlock condition with REINDEX in CVS HEAD: client1: nconway=# create table a (b int unique, c int unique); CREATE TABLE nconway=# begin; BEGIN nconway=# lock table a in access exclusive mode; LOCK TABLE client2: nconway=# reindex index a_b_key; < blocks, waiting to acquire an access exclusive lock on the heap relation "a" > client1: nconway=# reindex table a; ERROR: deadlock detected Naturally, this situation is not a very common one. But it seems to me that the practice of acquiring locks in REINDEX in an inconsistent order is asking for trouble: REINDEX TABLE locks the heap rel first, followed by any indexes of the heap rel, but REINDEX INDEX locks the target index, followed by the heap rel. Hence a deadlock condition (the explicit lock table above just serves to make the window of opportunity much larger). This should be fixed, right? I was thinking of changing reindex_index() to acquire an AccessShareLock on the index in question, find its parent rel ID, release the lock, then acquire an AccessExclusiveLock on the parent rel, followed by an AccessExclusiveLock on the index in question. Comments? Cheers, Neil P.S. I noticed this because I was browsing through REINDEX, trying to see if it would be possible to allow it to acquire less exclusive locks... -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
Neil Conway <neilc@samurai.com> writes: > Naturally, this situation is not a very common one. But it seems to me > that the practice of acquiring locks in REINDEX in an inconsistent order > is asking for trouble: REINDEX TABLE locks the heap rel first, followed > by any indexes of the heap rel, but REINDEX INDEX locks the target > index, followed by the heap rel. Hence a deadlock condition (the > explicit lock table above just serves to make the window of opportunity > much larger). > This should be fixed, right? Only if the cure isn't worse than the disease. > I was thinking of changing reindex_index() to acquire an AccessShareLock > on the index in question, find its parent rel ID, release the lock, then > acquire an AccessExclusiveLock on the parent rel, followed by an > AccessExclusiveLock on the index in question. If you release the lock then I think you are opening yourself to worse troubles than this one, having to do with someone renaming/deleting the table and/or index out from under you. The fact that REINDEX INDEX might fail if there are concurrent conflicting operations doesn't bother me a whole lot; but not holding a lock throughout the operation does. AFAICS, REINDEX INDEX is only a disaster-recovery tool anyway, and so is not likely to be run in parallel with other operations. The scenarios I can think of where you might want to do REINDEX routinely would always use REINDEX TABLE, I should think. BTW, I imagine DROP INDEX has a similar issue, and CLUSTER might depending on what it locks first (but it would be easy to fix it to lock the table first, since it has both names). regards, tom lane
On Mon, 2003-02-17 at 18:39, Tom Lane wrote: > > I was thinking of changing reindex_index() to acquire an AccessShareLock > > on the index in question, find its parent rel ID, release the lock, then > > acquire an AccessExclusiveLock on the parent rel, followed by an > > AccessExclusiveLock on the index in question. > > If you release the lock then I think you are opening yourself to worse > troubles than this one, having to do with someone renaming/deleting the > table and/or index out from under you. Presumably, the renaming/deleting operation acquires an exclusive lock and then holds it until transaction commit, right? If so, then wouldn't we still be okay: the REINDEX would lock the index in access share mode, find the OID of the heap rel, unlock the index, lock the heap rel in access exclusive mode, then try to re-open & lock the index, find that it no longer exists and then elog(ERROR). Whether or not that solution actually works, ISTM there must be *some* method of locking that is free of deadlocks -- saying "oh well, it's not a common case anyway" doesn't strike me as being satisfactory :-\ Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
Neil Conway <neilc@samurai.com> writes: > On Mon, 2003-02-17 at 18:39, Tom Lane wrote: >> If you release the lock then I think you are opening yourself to worse >> troubles than this one, having to do with someone renaming/deleting the >> table and/or index out from under you. > Presumably, the renaming/deleting operation acquires an exclusive lock > and then holds it until transaction commit, right? If so, then wouldn't > we still be okay: the REINDEX would lock the index in access share mode, > find the OID of the heap rel, unlock the index, lock the heap rel in > access exclusive mode, then try to re-open & lock the index, find that > it no longer exists and then elog(ERROR). That approach might be deadlock-free, but that doesn't mean it is surprise-free. For example, if the other guy did an ALTER TABLE RENAME on the index, it'd be possible that what you are actually reindexing is now differently named than it was before (and, perhaps, there is now some other index that has the original name and is the one the user really meant). This is not so dangerous in the REINDEX case, maybe, but it could be unhappy-making in the DROP case. > Whether or not that solution actually works, ISTM there must be *some* > method of locking that is free of deadlocks -- saying "oh well, it's not > a common case anyway" doesn't strike me as being satisfactory :-\ Basically, I'm not convinced that a deadlock failure is so much worse than any other failure that we should open ourselves to other surprises in order to avoid a deadlock. It's a judgment call though. Any other comments out there? regards, tom lane
On Mon, 17 Feb 2003, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > On Mon, 2003-02-17 at 18:39, Tom Lane wrote: > >> If you release the lock then I think you are opening yourself to worse > >> troubles than this one, having to do with someone renaming/deleting the > >> table and/or index out from under you. > > > Presumably, the renaming/deleting operation acquires an exclusive lock > > and then holds it until transaction commit, right? If so, then wouldn't > > we still be okay: the REINDEX would lock the index in access share mode, > > find the OID of the heap rel, unlock the index, lock the heap rel in > > access exclusive mode, then try to re-open & lock the index, find that > > it no longer exists and then elog(ERROR). > > That approach might be deadlock-free, but that doesn't mean it is > surprise-free. For example, if the other guy did an ALTER TABLE RENAME Perhaps the change that needs to be made is: if(IsUnderPostmaster)elog(ERROR,"You cannot run REINDEX INDEX in multi-user mode"); to ReindexIndex() or some other appropriate place (with a better error message). Gavin
On Tue, 18 Feb 2003, Gavin Sherry wrote: GS>Perhaps the change that needs to be made is: GS> GS>if(IsUnderPostmaster) GS> elog(ERROR,"You cannot run REINDEX INDEX in multi-user mode"); GS> GS>to ReindexIndex() or some other appropriate place (with a better error GS>message). GS> GS>Gavin It is incorrect. In PostgreSQL REINDEX must be routing (PostgreSQL 7.3.1 Documentation 8.3. Routine Reindexing). -- Olleg Samoylov