Thread: deadlock while doing VACUUM and DROP
I got this on HEAD while doing lots of CREATE -> COPY -> DROP on a table.

(...)
DEBUG:  ProcessUtility
DEBUG:  drop auto-cascades to toast table pg_toast.pg_toast_16774
DEBUG:  drop auto-cascades to type pg_toast.pg_toast_16774
DEBUG:  drop auto-cascades to index pg_toast.pg_toast_16774_index
DEBUG:  drop auto-cascades to type public.text
DEBUG:  drop auto-cascades to type public.text[]
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
ERROR:  deadlock detected
DETAIL:  Process 25423 waits for AccessExclusiveLock on relation 16386 of database 1; blocked by process 25428.
        Process 25428 waits for AccessShareLock on relation 16390 of database 1; blocked by process 25423.
        Process 25423: drop table manuale ;
        Process 25428: autovacuum: VACUUM ANALYZE public.manuale
HINT:  See server log for query details.
STATEMENT:  drop table text;
(...)

I was then able to recreate this using:

$ while true; do psql template1 < test-lock.sql; done

and in another terminal:

$ while true; do psql template1 -c 'vacuum analyze text'; done

The file test-lock.sql contained:

create table text ( a text );
\copy text from stdin
... 1000 lines of random 20 character strings ...
\.
drop table text;

Almost instantly, messages like the above crop up; sometimes the pair is AccessExclusiveLock/ShareUpdateExclusiveLock, sometimes AccessExclusiveLock/AccessShareLock.

System: Linux 2.6.23.9, Intel Core Duo, 32-bit
Configure switches: ./configure --enable-debug --enable-cassert --with-libxml --with-perl --with-python --with-openssl --with-tcl

Cheers,

-- 
Jan Urbanski
GPG key ID: E583D7D2
ouden estin
On Fri, May 16, 2008 at 5:24 AM, Jan Urbański <j.urbanski@students.mimuw.edu.pl> wrote:
>
> ERROR:  deadlock detected
> DETAIL:  Process 25423 waits for AccessExclusiveLock on relation 16386 of
> database 1; blocked by process 25428.
>         Process 25428 waits for AccessShareLock on relation 16390 of database
> 1; blocked by process 25423.
>         Process 25423: drop table manuale ;
>         Process 25428: autovacuum: VACUUM ANALYZE public.manuale

I looked at it briefly. ISTM that the main relation and the toast relation are getting into a deadlock.

VACUUM first vacuums the main relation and commits that transaction, but *keeps* the ShareUpdateExclusiveLock on the main relation. It then vacuums the toast relation, which requires ShareUpdateExclusiveLock on the toast relation.

Suppose at the same time another backend drops the main relation. Because of the dependency, the toast relation is dropped first, so the other backend takes AccessExclusiveLock on the toast relation. It then tries to drop the main relation, asks for AccessExclusiveLock on it, and deadlocks with the first process vacuuming the relation.

I think we can fix this by making sure that locks on the to-be-deleted object and all its dependent objects are taken first, in an order that prevents the deadlock. Alternatively, we could just acquire AccessExclusiveLock on the main relation before proceeding with the recursive deletion. That would solve this case, but there may be other similar deadlocks waiting to happen. Also, I am not sure the issue is big enough to demand the change.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
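[The waits-for cycle Pavan describes can be sketched as a toy model. The names and data structures here are purely illustrative, not PostgreSQL code: each process holds one lock and waits for a lock held by the other, which is exactly the cycle the deadlock detector reports.]

```python
# Toy model of the VACUUM/DROP deadlock (hypothetical names, not pg internals).

# Locks currently held: lock -> holding process
held = {
    "main_rel": "vacuum",   # VACUUM keeps ShareUpdateExclusiveLock on the main rel
    "toast_rel": "drop",    # DROP has taken AccessExclusiveLock on the toast rel
}
# Lock each process is now waiting for: process -> lock
waiting_for = {
    "vacuum": "toast_rel",  # VACUUM moves on to the toast relation
    "drop": "main_rel",     # DROP moves on to the main relation
}

def find_deadlock(start):
    """Follow waits-for edges from `start`; revisiting a process means a cycle."""
    seen = []
    proc = start
    while proc not in seen:
        seen.append(proc)
        lock = waiting_for.get(proc)
        if lock is None:
            return None          # process isn't waiting: no cycle on this path
        proc = held[lock]        # the process we are transitively waiting on
    return seen[seen.index(proc):]

print(find_deadlock("vacuum"))   # -> ['vacuum', 'drop']
```

This is the same cycle the server's DETAIL lines spell out: 25423 waits on 25428, which waits on 25423.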
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

> Alternatively, we can just acquire AccessExclusiveLock on the main relation
> before proceeding with the recursive deletion. That would solve this case,
> but may be there are other similar deadlocks waiting to happen.

Surely we should be locking the relation before even doing the dependency scan, or else someone else can come along and add more dependencies after we've started the scan?

> Also I am not sure if the issue is big enough to demand the change.

I think it is. Effectively what we have now is "your DDL could fail randomly for reasons that are out of your control" :(

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!
On Fri, May 16, 2008 at 1:17 PM, Gregory Stark <stark@enterprisedb.com> wrote:
>
> Surely we should be locking the relation before even doing the dependency scan
> or else someone else can come along and add more dependencies after we've
> started the scan?

Yeah, that's indeed possible. I could make it happen the following way:

Session 1:
- CREATE TABLE test (a int);
- Attach the session to gdb
- Put a break at dependency.c:727 (just before the doDeletion() call)
- DROP TABLE test;

Session 2:
- CREATE INDEX testindx ON test(a);

The CREATE INDEX in session 2 succeeds. But DROP TABLE at this point has already scanned all the dependencies, so it fails to see the newly added one. As a result, the table gets dropped but the index remains.

Session 1:
- Continue from the breakpoint
- DROP TABLE succeeds
- But the index remains:

postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname LIKE '%test%';
 relname  | relfilenode
----------+-------------
 testindx |       16391
(1 row)

You can't even drop the index now:

postgres=# DROP INDEX testindx;
ERROR:  could not open relation with OID 16388

If I remember correctly, we saw a similar bug report a few days back. Maybe we now know the cause.

>> Also I am not sure if the issue is big enough to demand the change.
>
> I think it is, effectively what we have now is "your DDL could fail randomly
> for reasons that are out of your control" :(

Yeah. I think we had better fix this, especially given the scenario above.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
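[The fix Pavan proposed upthread, taking all the locks up front in one consistent order, can be sketched with a toy two-session model. This is an illustration of the lock-ordering idea only; the session names and lock set are assumptions, not PostgreSQL code.]

```python
# Sketch: if every backend acquires the locks it needs in the same global
# order (e.g. sorted by OID), no two backends can each hold part of the
# other's set, so the waits-for graph cannot form a cycle.
import threading

locks = {"main_rel": threading.Lock(), "toast_rel": threading.Lock()}
events = []  # (session, lock) acquisition log

def session(name):
    needed = ["main_rel", "toast_rel"]  # both VACUUM and DROP touch both rels
    # Sorting gives every session the same acquisition order.
    for lock_name in sorted(needed):
        locks[lock_name].acquire()
        events.append((name, lock_name))
    for lock_name in needed:
        locks[lock_name].release()

t1 = threading.Thread(target=session, args=("vacuum",))
t2 = threading.Thread(target=session, args=("drop",))
t1.start(); t2.start()
t1.join(); t2.join()  # both complete: one session simply waits for the other
print(sorted(events))
```

With opposite acquisition orders (as in the report, where DROP took the toast lock first while VACUUM held the main-rel lock), the equivalent program can hang forever; with a shared order it merely serializes.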
Pavan Deolasee wrote:

> >> Also I am not sure if the issue is big enough to demand the change.
> >
> > I think it is, effectively what we have now is "your DDL could fail randomly
> > for reasons that are out of your control" :(
>
> Yeah. I think we better fix this, especially given the above mentioned scenario.

The pg_shdepend code grabs a lock on the object being dropped, and the same lock is grabbed by anyone who wants to add a dependency on the object. Perhaps the pg_depend code should do the same.

I don't think this closes the original report, though, unless we ensure that the lock taken by VACUUM conflicts with that one.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
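[Alvaro's lock-before-scan pattern can be illustrated with a toy model of the orphaned-index race from Pavan's gdb session. Everything here (the dict standing in for pg_depend, the single lock) is a deliberate simplification, not PostgreSQL code.]

```python
# Toy model: if the dropper takes the object's lock *before* scanning
# dependencies, a concurrent CREATE INDEX either finishes first (and is
# seen by the scan) or waits and then errors out, because the table is
# gone. Either way, no orphaned index.
import threading

object_lock = threading.Lock()           # stands in for the heavyweight lock
dependencies = {"test": ["test_toast"]}  # pg_depend, much simplified
dropped = []

def drop_table(rel):
    with object_lock:                    # lock first ...
        deps = dependencies.pop(rel)     # ... then scan dependencies
        dropped.extend(deps)             # drop dependents, then the rel itself
        dropped.append(rel)

def create_index(rel, idx):
    with object_lock:
        if rel not in dependencies:      # table already dropped: error out
            raise LookupError(f"relation {rel} does not exist")
        dependencies[rel].append(idx)
```

In the unlocked version of this race (scan first, lock later), create_index could slip its entry in after drop_table's scan, leaving an index row pointing at a dropped table, which is exactly the undropable testindx in the session above.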
Gregory Stark <stark@enterprisedb.com> writes:
> "Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
>> Alternatively, we can just acquire AccessExclusiveLock on the main relation
>> before proceeding with the recursive deletion. That would solve this case,
>> but may be there are other similar deadlocks waiting to happen.

> Surely we should be locking the relation before even doing the dependency scan

Yeah. I think this is just another manifestation of the problem I was noodling about a few days ago:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00301.php

As I said then, I don't want to think about it until after commitfest. I foresee an invasive and not sanely back-patchable patch.

			regards, tom lane