BUG #5321: Parallel restore temporarily deadlocked by autovacuum analyze - Mailing list pgsql-bugs

From Jerry Gamache
Subject BUG #5321: Parallel restore temporarily deadlocked by autovacuum analyze
Date
Msg-id 201002092155.o19LtWq9001298@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5321: Parallel restore temporarily deadlocked by autovacuum analyze  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5321
Logged by:          Jerry Gamache
Email address:      jerry.gamache@idilia.com
PostgreSQL version: 8.4.2
Operating system:   Linux
Description:        Parallel restore temporarily deadlocked by autovacuum
analyze
Details:

While restoring a database in 8.4.2, pg_restore stopped for more than 30
minutes doing nothing:

pg_restore -d database1 -j 8 -v database1.pg_dump

The parallel restore showed very impressive performance as the verbose
output went by, but the restore suddenly stopped. The output of "ps" after
15 minutes of 0% CPU activity showed 4 process in "Ss" state:

postgres: xxx database1 [local] CREATE INDEX waiting
postgres: xxx database1 [local] CREATE INDEX waiting
postgres: autovacuum worker process   database1
postgres: autovacuum worker process   database1

The output of pg_stat_activity for database1 showed:

database1=# select procpid,datname,current_query from pg_stat_activity where
datname='database1' ORDER BY procpid;
 procpid |     datname      |
 current_query
---------+------------------+-----------------------------------------------
----------------------------------------------------------------
    6146 | database1 | autovacuum: ANALYZE public.table_x
    6270 | database1 |
                            : CREATE UNIQUE INDEX index1 ON table_x USING
btree (col_a, col_b);
    6271 | database1 |
                            : CREATE UNIQUE INDEX index2 ON table_x USING
btree (col_b, col_c, col_d);
    6307 | database1 | autovacuum: ANALYZE public.table_y
    6597 | database1 | <IDLE>
    6626 | database1 | select procpid,datname,current_query from
pg_stat_activity where datname='database1' ORDER BY procpid;
(6 rows)

The output of pg_locks for the 4 stuck PIDs showed:

database1=# select * from pg_locks where pid IN (SELECT procpid FROM
pg_stat_activity WHERE datname='database1') ORDER BY pid;
  locktype  | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid  |
      mode           | granted
------------+----------+----------+------+-------+------------+-------------
--+---------+-------+----------+--------------------+------+----------------
----------+---------
 relation   |    16993 |    17597 |      |       |            |
 |         |       |          | 11/57              | 6146 | AccessShareLock
        | t
 relation   |    16993 |    17616 |      |       |            |
 |         |       |          | 11/57              | 6146 | AccessShareLock
        | t
 relation   |    16993 |    17663 |      |       |            |
 |         |       |          | 11/57              | 6146 | AccessShareLock
        | t
 virtualxid |          |          |      |       | 11/57      |
 |         |       |          | 11/57              | 6146 | ExclusiveLock
        | t
 relation   |    16993 |    17165 |      |       |            |
 |         |       |          | 11/57              | 6146 |
ShareUpdateExclusiveLock | t
 virtualxid |          |          |      |       | 13/100     |
 |         |       |          | 13/100             | 6270 | ExclusiveLock
        | t
 relation   |    16993 |    17165 |      |       |            |
 |         |       |          | 13/100             | 6270 | ShareLock
        | f
 relation   |    16993 |    17165 |      |       |            |
 |         |       |          | 14/50              | 6271 | ShareLock
        | f
 virtualxid |          |          |      |       | 14/50      |
 |         |       |          | 14/50              | 6271 | ExclusiveLock
        | t
 relation   |    16993 |    17188 |      |       |            |
 |         |       |          | 3/844              | 6307 |
ShareUpdateExclusiveLock | t
 virtualxid |          |          |      |       | 3/844      |
 |         |       |          | 3/844              | 6307 | ExclusiveLock
        | t
 relation   |    16993 |    17660 |      |       |            |
 |         |       |          | 3/844              | 6307 | AccessShareLock
        | t
 relation   |    16993 |    17590 |      |       |            |
 |         |       |          | 3/844              | 6307 | AccessShareLock
        | t
 relation   |        0 |     2677 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 relation   |        0 |     2676 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 relation   |    16993 |    11042 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 relation   |        0 |     2671 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 relation   |        0 |     2672 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 virtualxid |          |          |      |       | 7/187      |
 |         |       |          | 7/187              | 6626 | ExclusiveLock
        | t
 relation   |    16993 |    10969 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 relation   |        0 |     1262 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
 relation   |        0 |     1260 |      |       |            |
 |         |       |          | 7/187              | 6626 | AccessShareLock
        | t
(22 rows)

The restore resumed while I was writing this report, and I saw these new
entries in the logs:

ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "database1.public.table_y"
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "database1.public.table_y"

pgsql-bugs by date:

Previous
From: Fujii Masao
Date:
Subject: Re: unable to fail over to warm standby server
Next
From: Oleg Serov
Date:
Subject: Re: BUG #5314: Error in nested composite types in plpgsql.