Thread: Autovacuum deadlock - bug or not?
Don't know if this is a bug or just undocumented, but it seems as you shoul= d turn off autovacuum before you run CLUSTER, otherwise you might run into = a deadlock: NOTICE: Clustering idx_vehicle_unit_data_200407_person_information__id on = vehicle_unit_data_200407 ERROR: deadlock detected DETAIL: Process 29022 waits for AccessExclusiveLock on relation 68950 of d= atabase 16390; blocked by process 15865. Process 15865 waits for AccessShareLock on relation 68122 of database 16390= ; blocked by process 29022. /Mikael
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > Don't know if this is a bug or just undocumented, but it seems as you should turn off autovacuum before you run CLUSTER,otherwise you might run into a deadlock: Which variant of CLUSTER were you using? ISTR that some of them lock the index before the table, which is prone to deadlock against nearly all other operations on the table (not just vacuum). regards, tom lane
variant: CLUSTER indexname ON tablename Maybe there should be something about this in the docs, so that users don't= get surprised when this happens and start sending stupid emails to the pgs= ql-bugs list :) /Mikael -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: den 17 november 2005 15:51 To: Mikael Carneholm Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20 "Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > Don't know if this is a bug or just undocumented, but it seems as you sho= uld turn off autovacuum before you run CLUSTER, otherwise you might run int= o a deadlock: Which variant of CLUSTER were you using? ISTR that some of them lock the index before the table, which is prone to deadlock against nearly all other operations on the table (not just vacuum). regards, tom lane
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > variant: CLUSTER indexname ON tablename Hmph. Looking at the code, that should always lock the table first, so I don't see where the problem is. Would you look up the numbers for us --- exactly which relations were involved in the deadlock, and (if you can tell) which process was which? Also, what PG version is this exactly? regards, tom lane
Unfortunately, relfilenodes 68950 and 68122 don't exist anymore, could be t= hat they were temporary copies of tables or indexes. I know that process 15= 865 was the autovacuum pid, I looked that up when it happened (pg was resta= rted with autovacuum=3Doff afterwards, so that process is also gone) Could it be that the deadlock was caused by autovacuum trying to vacuum one= of the temp copies? -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: den 17 november 2005 16:04 To: Mikael Carneholm Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20 "Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > variant: CLUSTER indexname ON tablename Hmph. Looking at the code, that should always lock the table first, so I don't see where the problem is. Would you look up the numbers for us --- exactly which relations were involved in the deadlock, and (if you can tell) which process was which? Also, what PG version is this exactly? regards, tom lane
Forgot to mention:=20 dfol=3D> select version(); version=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20 ------------------------------------------------------------- PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC 2.96 (1 row) -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: den 17 november 2005 16:04 To: Mikael Carneholm Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20 "Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > variant: CLUSTER indexname ON tablename Hmph. Looking at the code, that should always lock the table first, so I don't see where the problem is. Would you look up the numbers for us --- exactly which relations were involved in the deadlock, and (if you can tell) which process was which? Also, what PG version is this exactly? regards, tom lane
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > Unfortunately, relfilenodes 68950 and 68122 don't exist anymore, You should be looking at pg_class.oid, not relfilenode. regards, tom lane
oops, you're right. dfol=3D> select pgc.oid, pgc.relname from pg_class pgc where pgc.oid in (68= 950, 68122); oid | relname=20=20=20=20=20=20=20=20=20=20 -------+-------------------------- 68950 | vehicle_unit_data_200407 68122 | vehicle_unit_data_200301 NOTICE: Clustering idx_vehicle_unit_data_200407_person_information__id on = vehicle_unit_data_200407 ERROR: deadlock detected DETAIL: Process 29022 waits for AccessExclusiveLock on relation 68950 of d= atabase 16390; blocked by process 15865. Process 15865 waits for AccessShareLock on relation 68122 of database 16390= ; blocked by process 29022. So it seems that it was the clustering of idx_vehicle_unit_data_200407_pers= on_information__id on vehicle_unit_data_200407 that caused the deadlock. -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: den 17 november 2005 16:53 To: Mikael Carneholm Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20 "Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > Unfortunately, relfilenodes 68950 and 68122 don't exist anymore, You should be looking at pg_class.oid, not relfilenode. regards, tom lane
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > dfol=> select pgc.oid, pgc.relname from pg_class pgc where pgc.oid in (68950, 68122); > oid | relname > -------+-------------------------- > 68950 | vehicle_unit_data_200407 > 68122 | vehicle_unit_data_200301 > NOTICE: Clustering idx_vehicle_unit_data_200407_person_information__id on vehicle_unit_data_200407 > ERROR: deadlock detected > DETAIL: Process 29022 waits for AccessExclusiveLock on relation 68950 of database 16390; blocked by process 15865. > Process 15865 waits for AccessShareLock on relation 68122 of database 16390; blocked by process 29022. > So it seems that it was the clustering of idx_vehicle_unit_data_200407_person_information__id on vehicle_unit_data_200407that caused the deadlock. Hmm, the CLUSTER on vehicle_unit_data_200407 wouldn't have taken any lock on vehicle_unit_data_200301. Were you perhaps issuing a series of CLUSTERs inside a transaction block? That would pile up exclusive locks on all the tables involved, which is certainly deadlock-prone. I'm also wondering where that NOTICE "Clustering ..." came from, because there is no such message anywhere in the 8.1 PG sources. You *sure* this is 8.1? There's something funny about 15865 too; you said that was an autovacuum process but I don't think so. VACUUM doesn't take AccessShareLock; there's a different lock type that that tries to acquire. And it doesn't take any locks at all on more than one user table at a time. regards, tom lane
>Hmm, the CLUSTER on vehicle_unit_data_200407 wouldn't have taken any >lock on vehicle_unit_data_200301. Were you perhaps issuing a series >of CLUSTERs inside a transaction block? That would pile up exclusive >locks on all the tables involved, which is certainly deadlock-prone. Ah, that explains it...the clustering is done from a function that's cluste= ring all (child) tables of a parent, and as I've come to understand, all fu= nctions have a built-in transaction (the "Clustering idx_foo on bar" notice= message also originates from that function) I'm pretty sure though that 15865 was the autovacuum process, but I'll have= to run the function with autovacuum turned on before I can verify that thi= s is (was) the case. /Mikael -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: den 17 november 2005 17:42 To: Mikael Carneholm Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20 "Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes: > dfol=3D> select pgc.oid, pgc.relname from pg_class pgc where pgc.oid in (= 68950, 68122); > oid | relname=20=20=20=20=20=20=20=20=20=20 > -------+-------------------------- > 68950 | vehicle_unit_data_200407 > 68122 | vehicle_unit_data_200301 > NOTICE: Clustering idx_vehicle_unit_data_200407_person_information__id o= n vehicle_unit_data_200407 > ERROR: deadlock detected > DETAIL: Process 29022 waits for AccessExclusiveLock on relation 68950 of= database 16390; blocked by process 15865. > Process 15865 waits for AccessShareLock on relation 68122 of database 163= 90; blocked by process 29022. > So it seems that it was the clustering of idx_vehicle_unit_data_200407_pe= rson_information__id on vehicle_unit_data_200407 that caused the deadlock. Hmm, the CLUSTER on vehicle_unit_data_200407 wouldn't have taken any lock on vehicle_unit_data_200301. Were you perhaps issuing a series of CLUSTERs inside a transaction block? That would pile up exclusive locks on all the tables involved, which is certainly deadlock-prone. I'm also wondering where that NOTICE "Clustering ..." came from, because there is no such message anywhere in the 8.1 PG sources. You *sure* this is 8.1? There's something funny about 15865 too; you said that was an autovacuum process but I don't think so. VACUUM doesn't take AccessShareLock; there's a different lock type that that tries to acquire. And it doesn't take any locks at all on more than one user table at a time. regards, tom lane