Thread: deadlock while doing VACUUM and DROP

deadlock while doing VACUUM and DROP

From
Jan Urbański
Date:
I got this on HEAD while doing lots of CREATE -> COPY -> DROP on table.

(...)
DEBUG:  ProcessUtility
DEBUG:  drop auto-cascades to toast table pg_toast.pg_toast_16774
DEBUG:  drop auto-cascades to type pg_toast.pg_toast_16774
DEBUG:  drop auto-cascades to index pg_toast.pg_toast_16774_index
DEBUG:  drop auto-cascades to type public.text
DEBUG:  drop auto-cascades to type public.text[]
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR, 
xid/subid/cid: 0/1/0, nestlvl: 1, children:
ERROR:  deadlock detected
DETAIL:  Process 25423 waits for AccessExclusiveLock on relation 16386 
of database 1; blocked by process 25428.        Process 25428 waits for AccessShareLock on relation 16390 of 
database 1; blocked by process 25423.        Process 25423: drop table manuale ;        Process 25428: autovacuum:
VACUUMANALYZE public.manuale
 
HINT:  See server log for query details.
STATEMENT:  drop table text;
(...)

I was then able to recreate this using:

$ while true; do psql template1 < test-lock.sql; done
and in another terminal
$ while true; do psql template1 -c 'vacuum analyze text'; done

The file test-lock.sql contained:

create table text ( a text );
\copy text from stdin ... 1000 lines of random 20 character strings ...
\.
drop table text;

Almost instantly messages like the above crop up, sometimes it's 
AccessExclusiveLock/ShareUpdateExclusiveLock, sometimes 
AccessExclusiveLock/ShareUpdateExclusiveLock.

System:
Linux 2.6.23.9 Intel Core Duo 32bit

Configure switches:
./configure --enable-debug --enable-cassert --with-libxml --with-perl 
--with-python --with-openssl --with-tcl

Cheers,
-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


Re: deadlock while doing VACUUM and DROP

From
"Pavan Deolasee"
Date:
On Fri, May 16, 2008 at 5:24 AM, Jan Urbański
<j.urbanski@students.mimuw.edu.pl> wrote:
>
> ERROR:  deadlock detected
> DETAIL:  Process 25423 waits for AccessExclusiveLock on relation 16386 of
> database 1; blocked by process 25428.
>        Process 25428 waits for AccessShareLock on relation 16390 of database
> 1; blocked by process 25423.
>        Process 25423: drop table manuale ;
>        Process 25428: autovacuum: VACUUM ANALYZE public.manuale

I looked at it briefly. ISTM that the main relation and the toast
relation is getting into a deadlock. VACUUM first vacuums the main
relation, commits the transaction but *keeps* the
ShareUpdateExclusiveLock on the main relation. It then vacuums the
toast relation which requires ShareUpdateExclusiveLock on the toast
relation.

If at the same time, another backend drops the main relation. Because
of dependency, the toast relation is dropped first. So the other
backend takes AccessExclusiveLock on the toast relation. It then tries
to drop the main relation, asks for AccessExclusiveLock on the main
relation and gets into a deadlock with the first process vacumming the
relation.

I think we can fix it by making sure that locks on the to-be-deleted
and all the dependent objects are taken first, in an order that would
prevent the deadlock. Alternatively, we can just acquire
AccessExclusiveLock on the main relation before proceeding with the
recursive deletion. That would solve this case, but may be there are
other similar deadlocks waiting to happen. Also I am not sure if the
issue is big enough to demand the change.


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: deadlock while doing VACUUM and DROP

From
Gregory Stark
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

> Alternatively, we can just acquire AccessExclusiveLock on the main relation
> before proceeding with the recursive deletion. That would solve this case,
> but may be there are other similar deadlocks waiting to happen. 

Surely we should be locking the relation before even doing the dependency scan
or else someone else can come along and add more dependencies after we've
started the scan?

> Also I am not sure if the issue is big enough to demand the change.

I think it is, effectively what we have now is "your DDL could fail randomly
for reasons that are out of your control" :(

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


Re: deadlock while doing VACUUM and DROP

From
"Pavan Deolasee"
Date:
On Fri, May 16, 2008 at 1:17 PM, Gregory Stark <stark@enterprisedb.com> wrote:
>
>
> Surely we should be locking the relation before even doing the dependency scan
> or else someone else can come along and add more dependencies after we've
> started the scan?
>

Yeah, that's indeed possible. I could make that happen the following way:

Session 1:

- CREATE TABLE test (a int);
- Attach the session to gdb
- Put a break at dependency.c:727 (just before doDeletion() call)
- DROP TABLE test;

Session 2:
- CREATE INDEX testindx ON test(a);

The CREATE INDEX in session 2 succeeds. But DROP TABLE at this point
has already scanned all the dependencies and fails to recognize the
newly added dependency. As a result, the table gets dropped but the
index remains.

Session 1:
- continue from the breakpoint
- DROP TABLE succeeds.
- But the index remains

postgres=# SELECT relname, relfilenode from pg_class WHERE relname
like '%test%'; relname  | relfilenode
-----------+-------------testindx  |       16391
(1 row)


You can't even drop the index now.

postgres=# DROP INDEX testindx;
ERROR:  could not open relation with OID 16388

If I remember correctly, we had seen a similar bug report few days
back. May be we now know the cause.

>> Also I am not sure if the issue is big enough to demand the change.
>
> I think it is, effectively what we have now is "your DDL could fail randomly
> for reasons that are out of your control" :(
>

Yeah. I think we better fix this, especially given the above mentioned scenario.


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: deadlock while doing VACUUM and DROP

From
Alvaro Herrera
Date:
Pavan Deolasee escribió:

> >> Also I am not sure if the issue is big enough to demand the change.
> >
> > I think it is, effectively what we have now is "your DDL could fail randomly
> > for reasons that are out of your control" :(
> 
> Yeah. I think we better fix this, especially given the above mentioned scenario.

The pg_shdepend code has code to grab a lock on the object being
dropped, which is also grabbed by someone who wants to add a dependency
on the object.  Perhaps the pg_depend code should do the same.

I don't think this closes the original report though, unless we ensure
that the lock taken by vacuum conflicts with that one.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: deadlock while doing VACUUM and DROP

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
>> Alternatively, we can just acquire AccessExclusiveLock on the main relation
>> before proceeding with the recursive deletion. That would solve this case,
>> but may be there are other similar deadlocks waiting to happen. 

> Surely we should be locking the relation before even doing the dependency scan

Yeah.  I think this is just another manifestation of the problem I was
noodling about a few days ago:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00301.php

As I said then, I don't want to think about it until after commitfest.
I foresee an invasive and not sanely back-patchable patch.
        regards, tom lane