On Sun, Jan 29, 2012 at 5:01 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I can't see any way that situation can occur. The patch *explicitly*
> waits until all people that can see the index as usable have dropped
> their lock. So I don't think this is necessary. Having said that,
> since we are talking about the index and not the whole table, if I
> believe the above statement then I can't have any reasonable objection
> to doing as you suggest.
>
> Patch now locks index in AccessExclusiveLock in final stage of drop.
>
> v3 attached.
>
> If you have suggestions to improve grammar issues, they;re most
> welcome. Otherwise this seems good to go.
I improved the grammar issues in the attached version of the patch -
the syntax is now simpler and more consistent, IF EXISTS now works,
and RESTRICT is accepted (without changing the behavior) while CASCADE
fails with a nicer error message. I also fixed a bug in
RangeVarCallbackForDropRelation.
However, testing reveals that this doesn't really work. I found that
by playing around with a couple of concurrent sessions, I could get a
SELECT statement to hang waiting on the AccessExclusiveLock in the
second phase, because we're really still opening the relation (even
though it's invalid) and thus still locking it, and thus still waiting
for the AccessExclusiveLock. And I managed to get this:
rhaas=# select * from foo where a = 1;
ERROR: could not open relation with OID 16388
So then I tried changing the lock level back to
ShareUpdateExclusiveLock. It appears that that merely narrows the
window for errors of this type, rather than getting rid of them. I
ran this in one window:
pgbench -c 30 -j 30 -f f -n -T 180
where the file f contained this:
select * from foo where a = 1
and then in the other window I repeatedly did this:
rhaas=# create index foo_a on foo (a);
CREATE INDEX
rhaas=# drop index concurrently foo_a;
DROP INDEX
and pgbench started issuing messages like this:
Client 2 aborted in state 0: ERROR: could not open relation with OID 16394
Client 9 aborted in state 0: ERROR: could not open relation with OID 16397
Client 18 aborted in state 0: ERROR: could not open relation with OID 16397
My conclusion is that regardless of whether ShareUpdateExclusiveLock
or AccessExclusiveLock is used for the final phase of the drop, this
isn't safe. I haven't tracked down exactly where the wheels are
coming off, but the problem does not occur when using the non-current
form of DROP INDEX.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company