Thread: Rare deadlock failure in create_am test
conchuela just showed an unusual failure:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-09-03%2023%3A00%3A36

The core of it is a deadlock failure in create_am.sql; there's then
some follow-on noise from not having successfully dropped the test AM.
The deadlock looks like:

2020-09-04 01:05:06.904 CEST [609175:34] pg_regress/create_am LOG:  process 609175 detected deadlock while waiting for AccessExclusiveLock on relation 17347 of database 16384 after 4616.873 ms
2020-09-04 01:05:06.904 CEST [609175:35] pg_regress/create_am DETAIL:  Process holding the lock: 609183. Wait queue: .
2020-09-04 01:05:06.904 CEST [609175:36] pg_regress/create_am STATEMENT:  DROP ACCESS METHOD gist2 CASCADE;
2020-09-04 01:05:06.904 CEST [609175:37] pg_regress/create_am ERROR:  deadlock detected
2020-09-04 01:05:06.904 CEST [609175:38] pg_regress/create_am DETAIL:  Process 609175 waits for AccessExclusiveLock on relation 17347 of database 16384; blocked by process 609183.
	Process 609183 waits for RowExclusiveLock on relation 20095 of database 16384; blocked by process 609175.
	Process 609175: DROP ACCESS METHOD gist2 CASCADE;
	Process 609183: autovacuum: VACUUM ANALYZE public.fast_emp4000
2020-09-04 01:05:06.904 CEST [609175:39] pg_regress/create_am HINT:  See server log for query details.
2020-09-04 01:05:06.904 CEST [609175:40] pg_regress/create_am STATEMENT:  DROP ACCESS METHOD gist2 CASCADE;
2020-09-04 01:05:06.904 CEST [609183:11] LOG:  process 609183 acquired RowExclusiveLock on relation 20095 of database 16384 after 13377.776 ms
2020-09-04 01:04:52.895 CEST [609183:6] LOG:  automatic analyze of table "regression.public.tenk2" system usage: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.59 s

So it's not hard to understand the problem: DROP of an index AM, cascading
to an index, will need to acquire lock on the index and then lock on the
index's table.  Any other operation on the table, like say autovacuum,
is going to acquire locks in the other direction.

This is pretty rare, but not unheard of:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-24%2022%3A00%3A23

(There might be more such failures, but I only looked back six months,
and these two are all I found in that window.)

I'm inclined to think that the best fix is to put

begin;
lock table tenk2;
...
commit;

around the DROP CASCADE.  We could alternatively disable autovacuum on
tenk2, but that could have undesirable side-effects.

			regards, tom lane
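To make the lock ordering concrete, here is the cycle reconstructed from
the log excerpt above.  The modes of the held locks are inferred from
what DROP and VACUUM normally acquire (the excerpt only shows the
waited-for modes); relation 17347 is evidently fast_emp4000 and relation
20095 the index the DROP cascades to:

    Process 609175 (DROP ACCESS METHOD gist2 CASCADE):
        holds:  AccessExclusiveLock on the index (relation 20095)
        waits:  AccessExclusiveLock on the table (relation 17347)

    Process 609183 (autovacuum: VACUUM ANALYZE public.fast_emp4000):
        holds:  ShareUpdateExclusiveLock on the table (relation 17347)
        waits:  RowExclusiveLock on the index (relation 20095)

Each process holds the lock the other is waiting for, acquired in the
opposite order, hence the deadlock.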
On Fri, Sep 4, 2020 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (There might be more such failures, but I only looked back six months,
> and these two are all I found in that window.)

Right:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-09-03%2023:00:36 | HEAD | DragonflyBSD
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2020-04-08%2004:38:34 | HEAD | Debian
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2020-03-30%2013:54:05 | HEAD | AIX
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-24%2022:00:23 | HEAD | Raspbian
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2020-03-03%2016:10:06 | HEAD | Debian
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2019-12-11%2023:59:06 | HEAD | macOS
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2019-12-11%2022:09:07 | HEAD | macOS
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-12-10%2003:46:12 | REL9_6_STABLE |
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2019-07-10%2023:19:12 | REL9_6_STABLE | AIX
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi&dt=2018-06-23%2018:00:15 | HEAD | Arch

> I'm inclined to think that the best fix is to put
>
> begin;
> lock table tenk2;
> ...
> commit;

Makes sense, except s/tenk2/fast_emp4000/, no?
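Spelled out with the corrected table name, and with the DROP from the
failing test filled in, the proposal amounts to something like this (a
sketch, assuming nothing else in the test needs to sit inside the
transaction):

    begin;
    -- LOCK TABLE with no explicit mode takes ACCESS EXCLUSIVE, so the
    -- table lock is acquired before the cascaded index lock, matching
    -- autovacuum's table-then-index ordering.
    lock table fast_emp4000;
    drop access method gist2 cascade;
    commit;

With the table lock held up front, autovacuum can no longer hold a lock
on fast_emp4000 while the DROP is partway through its cascade.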
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Sep 4, 2020 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm inclined to think that the best fix is to put
>>
>> begin;
>> lock table tenk2;
>> ...
>> commit;

> Makes sense, except s/tenk2/fast_emp4000/, no?

Sorry, thinko ...

			regards, tom lane
On 2020-Sep-03, Tom Lane wrote:

> So it's not hard to understand the problem: DROP of an index AM, cascading
> to an index, will need to acquire lock on the index and then lock on the
> index's table.  Any other operation on the table, like say autovacuum,
> is going to acquire locks in the other direction.

Oh, of course.

> I'm inclined to think that the best fix is to put
>
> begin;
> lock table [fast_emp4000];
> ...
> commit;
>
> around the DROP CASCADE.

Yeah, sounds good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-03, Tom Lane wrote:
>> I'm inclined to think that the best fix is to put
>>
>> begin;
>> lock table [fast_emp4000];
>> ...
>> commit;
>>
>> around the DROP CASCADE.

> Yeah, sounds good.

Done.

			regards, tom lane