BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica. - Mailing list pgsql-bugs
From | maxim.boguk@gmail.com |
---|---|
Subject | BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica. |
Date | |
Msg-id | 20151001121745.368.58359@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #13657: Some kind of undetected deadlock between query
and "startup process" on replica.
(Maxim Boguk <maxim.boguk@gmail.com>)
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 13657 Logged by: Maksym Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 9.4.4 Operating system: Linux Description: Hi, On one of 9.4.4 replica database I had found strange issue with stuck query and stuck wal replay process waiting each other but with no detectable deadlock condtions. Stuck query runs: SELECT schemaname, relname, pg_relation_size(relid) as table_size, pg_indexes_size(relid) as table_indexes_size, seq_scan, coalesce(idx_scan,0) as idx_scan, seq_tup_read, coalesce(idx_tup_fetch,0) as idx_tup_fetch, n_tup_ins as inserted, n_tup_upd as updated, n_tup_del as deleted, n_tup_hot_upd as hot_updated FROM pg_stat_user_tables In pg_stat_activity it's on waiting state. Locks for this query: postgres=# select * from pg_locks where pid=862 order by granted, relation; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+-----------------+---------+---------- relation | 16424 | 17987 | | | | | | | | 7/924904 | 862 | AccessShareLock | f | f relation | 16424 | 1259 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2610 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2615 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2662 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2663 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2678 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2679 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2684 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 2685 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 3455 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 11221 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t relation | 16424 | 11236 | | | | | | | | 7/924904 | 862 | AccessShareLock | t | t virtualxid | | | | | 7/924904 | | | | | 7/924904 | 862 | ExclusiveLock | t | t So not granted an single lock on relation 17987 (just common table inside the database). Now lets see whats going on with wal replay: ps -axuww | grep recover postgres 26786 2.5 0.0 177620 76784 ? Ss Sep30 48:27 postgres: startup process recovering 00000001000061A0000000AE waiting So wal replay waiting for something. Lets see what locks wal replay have: postgres=# select * from pg_locks where pid=26786 order by granted, relation; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+---------- relation | 16424 | 17987 | | | | | | | | 1/0 | 26786 | AccessExclusiveLock | t | f virtualxid | | | | | 1/1 | | | | | 1/0 | 26786 | ExclusiveLock | t | t (2 rows) So wal replay got AccessExclusiveLock on relation 17987 and waiting for something. And query waiting for AccessShareLock on the same relation. Once I killed waiting query, wal replay resume with no issues. So seems there are some king of undetected deadlock condition between query and wal replay. PS: situation reproduced every few hours so it's possible to debug but I have no good idea where to start. Kind Regards, Maksym
pgsql-bugs by date: