BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table
Msg-id 16016-ed0345229ddc484c@postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16016
Logged by:          Alexey Ermakov
Email address:      alexey.ermakov@dataegret.com
PostgreSQL version: 11.5
Operating system:   Ubuntu 16.04.6 LTS
Description:

Hello,

Recently I encountered a strange situation where a replica server stopped serving
queries. Most likely it happened because the startup process was holding an
AccessExclusiveLock on pg_statistic's toast table and was waiting for some
backend, which led to a deadlock.
The situation resolved itself after the max_standby_streaming_delay timeout kicked
in and the conflicting backends' queries were canceled.
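(Those cancellations are also visible in pg_stat_database_conflicts; roughly
something like this, if it's of any use:)

-- recovery-conflict cancellations per database on the standby
select datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
from pg_stat_database_conflicts
where datname = current_database();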

More details:

1) First waiting backend in the logs:
2019-09-19 12:14:11.314 UTC 144285 user@db from [local] [vxid:21/106350189 txid:0] [BIND waiting] LOG:  process 144285 still waiting for AccessShareLock on relation 2840 of database 16389 after 1000.115 ms
2019-09-19 12:14:11.314 UTC 144285 user@db from [local] [vxid:21/106350189 txid:0] [BIND waiting] DETAIL:  Process holding the lock: 6002. Wait queue: 144285, 136649, 127349, 76417, 97350, 93019, 109010, 103163, 136300.
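(While the backends are stuck, roughly the following on the replica should show
the same queue live; just a sketch, pg_blocking_pids() is available since 9.6:)

-- which backends are blocked and by whom, as seen from pg_stat_activity
select pid,
       pg_blocking_pids(pid) as blocked_by,
       wait_event_type,
       wait_event,
       state,
       left(query, 60) as query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;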

2) PID 6002 is indeed the startup process:
ps ax | grep 6002
6002 ?        Ss   200:16 postgres: 11/main: startup   recovering 000000010000425900000094

3) The locked relation is pg_statistic's toast table:

select 2840::regclass;
        regclass
------------------------
 pg_toast.pg_toast_2619

select 2619::regclass;
   regclass
--------------
 pg_statistic
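(The same mapping can be double-checked from pg_class, for completeness:)

-- pg_statistic and its toast table
select oid::regclass as relation, reltoastrelid::regclass as toast_table
from pg_class
where oid = 'pg_statistic'::regclass;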

4) I guess autovacuum was working on the pg_statistic table on the master server
right before the problem occurred and cleaned up some pages at the end of the
relation (such truncation takes an AccessExclusiveLock, which is then replayed on
the replica).
I can't confirm it though: log_autovacuum_min_duration = 1s and there are no
entries in the logs, but the table is pretty small (~2MB), so autovacuum might
have finished in less than 1s.
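(Next time I'd check on the master with something along these lines, to see
whether autovacuum touched the table around that time:)

-- last autovacuum of pg_statistic on the master
select relname, last_autovacuum, autovacuum_count
from pg_stat_sys_tables
where relname = 'pg_statistic';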

5) According to the client, no DB migrations were performed at that time.


The same thing (again the toast table of pg_statistic) also happened twice a year
ago on version 9.6.6, on a completely different server and project.
Unfortunately I wasn't able to get a gdb backtrace of the startup process, so it
might be difficult to understand where it got stuck :(

In the year-ago case I was able to get the output of pg_locks, but there is not
much of interest there:

select * from pg_locks;
 locktype   | database | relation | page | tuple | virtualxid  | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |        mode         | granted | fastpath
------------+----------+----------+------+-------+-------------+---------------+---------+-------+----------+--------------------+--------+---------------------+---------+----------
...truncated, I can show full output if it might help...
 virtualxid |          |          |      |       | 12/19089055 |               |         |       |          | 12/19089055        |  91145 | ExclusiveLock       | t       | t
 virtualxid |          |          |      |       | 1/1         |               |         |       |          | 1/0                |  16141 | ExclusiveLock       | t       | t
 relation   |    20508 |     2840 |      |       |             |               |         |       |          | 12/19089055        |  91145 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |               |         |       |          | 24/18427740        |  94698 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |               |         |       |          | 26/15964159        |  96387 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |               |         |       |          | 27/11337343        | 107448 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |               |         |       |          | 28/9666022         |  99110 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |               |         |       |          | 1/0                |  16141 | AccessExclusiveLock | t       | f
(56 rows)

In that case pid 16141 is the startup process, and 2840 is again the OID of
pg_statistic's toast table:
16141 ?        Ss   14169:14 postgres: 9.6/main: startup process   recovering 000000020000A3E3000000DC waiting
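(If it happens again, a rough self-join on pg_locks like the one below should show
directly which granted lock the waiters are queued behind:)

-- waiters on a relation lock and the pid holding a granted lock on the same relation
select w.pid as waiting_pid, w.mode as waiting_mode,
       h.pid as holding_pid, h.mode as holding_mode,
       w.relation::regclass as relation
from pg_locks w
join pg_locks h
  on h.locktype = 'relation'
 and h.database = w.database
 and h.relation = w.relation
 and h.granted
where w.locktype = 'relation'
  and not w.granted;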

--
Thanks,
Alexey Ermakov

