Help understanding SIReadLock growing without bound on completedtransaction - Mailing list pgsql-general

From Mike Klaas
Subject Help understanding SIReadLock growing without bound on completedtransaction
Date
Msg-id kah64cuu.911fba7b-28c9-4c00-93cc-7557777f9e58@we.are.superhuman.com
Whole thread Raw
Responses Re: Help understanding SIReadLock growing without bound on completed transaction  (Thomas Munro <thomas.munro@gmail.com>)
Re: Help understanding SIReadLock growing without bound on completed transaction  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-general
We recently experienced an issue where a transaction that was finished and no longer existed kept slowly accumulating SIReadLocks over the period of a week.  The only way we could remove the locks was by restarting postgresql.

The entries in pg_locks resembled:

mode: SIReadLock              

locktype: page         
relation::regclass::text: <table>_pkey                                      
virtualtransaction: 36/296299968
granted:t      
pid:2263461
count(1):  5559 (when grouped)

Note that this pid did not exist in pg_stat_activity.  I understand that it is normal for SSI locks to persist after a transaction is finished.  There are however two aspects to this that I don't understand:
  1. It's my understanding that these locks should be cleared when there are no conflicting transactions.  These locks had existed for > 1 week and we have no transactions that last more than a few seconds (the oldest transaction in pg_stat_activity is always < 1minute old).
  2. Why would a transaction that is finished continue accumulating locks over time?  
If it helps, here is some more information about the state of the system:
  1. There were a total of six pids in pg_locks that didn't exist in pg_stat_activity.  They held a variety of SIReadLocks, but they weren't increasing in number over time.  I'm not sure how long they were present; I only know that the problematic pid existed for a week due to its continual growth reflecting in our internal lock monitoring system.
  2. I tried finding overlapping SIReadLocks (see query below), but none were returned (I realize that the SSI conflict resolution algo is much more involved than this simple query)
  3. PG version: 9.6.17
I would appreciate any hints of what I could've done to investigate this further or how I could've resolved the issue without restarting the db (and thus experiencing downtime).

thank you in advance,
-Mike

SELECT 

    waiting.locktype           AS w_locktype,

    LEFT(waiting.relation::regclass::text,25) AS waiting_table,

    COALESCE(waiting_stm.query,'?')          AS w_query,

    waiting.page AS w_page,

    waiting.tuple AS w_tuple,

    waiting.pid                AS w_pid,

    other.locktype             AS o_locktype,

    LEFT(other.relation::regclass::text,15)   AS other_table,

    LEFT(COALESCE(other_stm.query, '?'), 50)            AS other_query,

    other.page AS o_page,

    other.tuple AS o_tuple, 

    other.pid                  AS other_pid,

    other.GRANTED              AS o_granted

FROM

    pg_catalog.pg_locks AS waiting

LEFT JOIN

    pg_catalog.pg_stat_activity AS waiting_stm

    ON        waiting_stm.pid = waiting.pid

JOIN

    pg_catalog.pg_locks AS other

    ON (

        (

            waiting."database" = other."database"

        AND waiting.relation  = other.relation

        and waiting.locktype = other.locktype

        AND ( CASE WHEN other.locktype = 'page' THEN waiting.page IS NOT DISTINCT FROM other.page

           WHEN other.locktype = 'tuple' THEN waiting.page IS NOT DISTINCT FROM other.page and waiting.tuple IS NOT DISTINCT FROM other.tuple

           ELSE true END

        )

        )

        OR waiting.transactionid = other.transactionid

    ) AND waiting.pid <> other.pid

LEFT JOIN

    pg_catalog.pg_stat_activity AS other_stm

    ON        other_stm.pid = other.pid

WHERE waiting.pid IN (2263461, 2263276, 2263283, 2263284, 2263459, 2263527 )



pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: Strategy for materialisation and centralisation of data
Next
From: Sándor Daku
Date:
Subject: Re: Table partitioning for cloud service?