Thread: Autovacuum deadlock - bug or not?

Autovacuum deadlock - bug or not?

From
"Mikael Carneholm"
Date:
Don't know if this is a bug or just undocumented, but it seems as you shoul=
d turn off autovacuum before you run CLUSTER, otherwise you might run into =
a deadlock:

NOTICE:  Clustering idx_vehicle_unit_data_200407_person_information__id on =
vehicle_unit_data_200407
ERROR:  deadlock detected
DETAIL:  Process 29022 waits for AccessExclusiveLock on relation 68950 of d=
atabase 16390; blocked by process 15865.
Process 15865 waits for AccessShareLock on relation 68122 of database 16390=
; blocked by process 29022.

/Mikael

Re: Autovacuum deadlock - bug or not?

From
Tom Lane
Date:
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> Don't know if this is a bug or just undocumented, but it seems as you should turn off autovacuum before you run
CLUSTER,otherwise you might run into a deadlock: 

Which variant of CLUSTER were you using?  ISTR that some of them lock
the index before the table, which is prone to deadlock against nearly
all other operations on the table (not just vacuum).

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
"Mikael Carneholm"
Date:
variant: CLUSTER indexname ON tablename

Maybe there should be something about this in the docs, so that users don't=
 get surprised when this happens and start sending stupid emails to the pgs=
ql-bugs list :)

/Mikael



-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: den 17 november 2005 15:51
To: Mikael Carneholm
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20


"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> Don't know if this is a bug or just undocumented, but it seems as you sho=
uld turn off autovacuum before you run CLUSTER, otherwise you might run int=
o a deadlock:

Which variant of CLUSTER were you using?  ISTR that some of them lock
the index before the table, which is prone to deadlock against nearly
all other operations on the table (not just vacuum).

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
Tom Lane
Date:
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> variant: CLUSTER indexname ON tablename

Hmph.  Looking at the code, that should always lock the table first,
so I don't see where the problem is.  Would you look up the numbers
for us --- exactly which relations were involved in the deadlock,
and (if you can tell) which process was which?

Also, what PG version is this exactly?

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
"Mikael Carneholm"
Date:
Unfortunately, relfilenodes 68950 and 68122 don't exist anymore, could be t=
hat they were temporary copies of tables or indexes. I know that process 15=
865 was the autovacuum pid, I looked that up when it happened (pg was resta=
rted with autovacuum=3Doff afterwards, so that process is also gone)

Could it be that the deadlock was caused by autovacuum trying to vacuum one=
 of the temp copies?

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: den 17 november 2005 16:04
To: Mikael Carneholm
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20


"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> variant: CLUSTER indexname ON tablename

Hmph.  Looking at the code, that should always lock the table first,
so I don't see where the problem is.  Would you look up the numbers
for us --- exactly which relations were involved in the deadlock,
and (if you can tell) which process was which?

Also, what PG version is this exactly?

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
"Mikael Carneholm"
Date:
Forgot to mention:=20

dfol=3D> select version();
                           version=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
-------------------------------------------------------------
 PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: den 17 november 2005 16:04
To: Mikael Carneholm
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20


"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> variant: CLUSTER indexname ON tablename

Hmph.  Looking at the code, that should always lock the table first,
so I don't see where the problem is.  Would you look up the numbers
for us --- exactly which relations were involved in the deadlock,
and (if you can tell) which process was which?

Also, what PG version is this exactly?

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
Tom Lane
Date:
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> Unfortunately, relfilenodes 68950 and 68122 don't exist anymore,

You should be looking at pg_class.oid, not relfilenode.

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
"Mikael Carneholm"
Date:
oops, you're right.

dfol=3D> select pgc.oid, pgc.relname from pg_class pgc where pgc.oid in (68=
950, 68122);
  oid  |         relname=20=20=20=20=20=20=20=20=20=20
-------+--------------------------
 68950 | vehicle_unit_data_200407
 68122 | vehicle_unit_data_200301

NOTICE:  Clustering idx_vehicle_unit_data_200407_person_information__id on =
vehicle_unit_data_200407
ERROR:  deadlock detected
DETAIL:  Process 29022 waits for AccessExclusiveLock on relation 68950 of d=
atabase 16390; blocked by process 15865.
Process 15865 waits for AccessShareLock on relation 68122 of database 16390=
; blocked by process 29022.

So it seems that it was the clustering of idx_vehicle_unit_data_200407_pers=
on_information__id on vehicle_unit_data_200407 that caused the deadlock.


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: den 17 november 2005 16:53
To: Mikael Carneholm
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20


"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> Unfortunately, relfilenodes 68950 and 68122 don't exist anymore,

You should be looking at pg_class.oid, not relfilenode.

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
Tom Lane
Date:
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> dfol=> select pgc.oid, pgc.relname from pg_class pgc where pgc.oid in (68950, 68122);
>   oid  |         relname
> -------+--------------------------
>  68950 | vehicle_unit_data_200407
>  68122 | vehicle_unit_data_200301

> NOTICE:  Clustering idx_vehicle_unit_data_200407_person_information__id on vehicle_unit_data_200407
> ERROR:  deadlock detected
> DETAIL:  Process 29022 waits for AccessExclusiveLock on relation 68950 of database 16390; blocked by process 15865.
> Process 15865 waits for AccessShareLock on relation 68122 of database 16390; blocked by process 29022.

> So it seems that it was the clustering of idx_vehicle_unit_data_200407_person_information__id on
vehicle_unit_data_200407that caused the deadlock. 

Hmm, the CLUSTER on vehicle_unit_data_200407 wouldn't have taken any
lock on vehicle_unit_data_200301.  Were you perhaps issuing a series
of CLUSTERs inside a transaction block?  That would pile up exclusive
locks on all the tables involved, which is certainly deadlock-prone.

I'm also wondering where that NOTICE "Clustering ..." came from, because
there is no such message anywhere in the 8.1 PG sources.  You *sure*
this is 8.1?

There's something funny about 15865 too; you said that was an autovacuum
process but I don't think so.  VACUUM doesn't take AccessShareLock;
there's a different lock type that that tries to acquire.  And it
doesn't take any locks at all on more than one user table at a time.

            regards, tom lane

Re: Autovacuum deadlock - bug or not?

From
"Mikael Carneholm"
Date:
>Hmm, the CLUSTER on vehicle_unit_data_200407 wouldn't have taken any
>lock on vehicle_unit_data_200301.  Were you perhaps issuing a series
>of CLUSTERs inside a transaction block?  That would pile up exclusive
>locks on all the tables involved, which is certainly deadlock-prone.

Ah, that explains it...the clustering is done from a function that's cluste=
ring all (child) tables of a parent, and as I've come to understand, all fu=
nctions have a built-in transaction (the "Clustering idx_foo on bar" notice=
 message also originates from that function)

I'm pretty sure though that 15865 was the autovacuum process, but I'll have=
 to run the function with autovacuum turned on before I can verify that thi=
s is (was) the case.

/Mikael


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: den 17 november 2005 17:42
To: Mikael Carneholm
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Autovacuum deadlock - bug or not?=20


"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> dfol=3D> select pgc.oid, pgc.relname from pg_class pgc where pgc.oid in (=
68950, 68122);
>   oid  |         relname=20=20=20=20=20=20=20=20=20=20
> -------+--------------------------
>  68950 | vehicle_unit_data_200407
>  68122 | vehicle_unit_data_200301

> NOTICE:  Clustering idx_vehicle_unit_data_200407_person_information__id o=
n vehicle_unit_data_200407
> ERROR:  deadlock detected
> DETAIL:  Process 29022 waits for AccessExclusiveLock on relation 68950 of=
 database 16390; blocked by process 15865.
> Process 15865 waits for AccessShareLock on relation 68122 of database 163=
90; blocked by process 29022.

> So it seems that it was the clustering of idx_vehicle_unit_data_200407_pe=
rson_information__id on vehicle_unit_data_200407 that caused the deadlock.

Hmm, the CLUSTER on vehicle_unit_data_200407 wouldn't have taken any
lock on vehicle_unit_data_200301.  Were you perhaps issuing a series
of CLUSTERs inside a transaction block?  That would pile up exclusive
locks on all the tables involved, which is certainly deadlock-prone.

I'm also wondering where that NOTICE "Clustering ..." came from, because
there is no such message anywhere in the 8.1 PG sources.  You *sure*
this is 8.1?

There's something funny about 15865 too; you said that was an autovacuum
process but I don't think so.  VACUUM doesn't take AccessShareLock;
there's a different lock type that that tries to acquire.  And it
doesn't take any locks at all on more than one user table at a time.

            regards, tom lane